Text-machina

Latest version: v0.2.12

0.2.0

The 0.2.0 release of TextMachina includes:

- New providers: `Amazon Bedrock`, `AI21`, `Azure OpenAI`, and inference servers (`vllm` and `trt`).
- Refactor the HuggingFace remote provider to handle retries through `HTTPAdapter`.
- Two new extractors for mixcase tasks: `sentence_masking` and `word_masking`. Unlike the `sentence_gap` and `word_gap` extractors, where LLMs write text between boundaries, these require LLMs to reconstruct masked spans within whole texts.
- Extend the dataset generator for mixcase tasks to consider masking extractors.
- Add config examples to learn about the extractors.
- Small refactors: colors in logger, inheritance in some tokenizers, etc.
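
As an illustration of the retry mechanism mentioned above, here is a minimal sketch of how retries are commonly wired through `HTTPAdapter` in the `requests` library. It is not TextMachina's actual provider code: the function name `build_retrying_session`, the retry count, the backoff factor, and the list of status codes are all assumptions chosen for the example.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def build_retrying_session(total_retries: int = 3, backoff: float = 0.5) -> requests.Session:
    """Return a Session whose requests are retried on transient failures.

    Hypothetical helper: the name and default values are illustrative only.
    """
    retry = Retry(
        total=total_retries,                          # maximum number of retries
        backoff_factor=backoff,                       # growing sleep between attempts
        status_forcelist=(429, 500, 502, 503, 504),   # retry on these HTTP statuses
        allowed_methods=frozenset({"GET", "POST"}),   # POST is not retried by default
    )
    adapter = HTTPAdapter(max_retries=retry)
    session = requests.Session()
    # Mount the adapter so every http(s) request goes through the retry policy.
    session.mount("https://", adapter)
    session.mount("http://", adapter)
    return session


session = build_retrying_session()
```

Mounting the adapter on the session keeps the retry policy in one place, so the code that sends requests to a remote inference endpoint does not need its own retry loop.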

0.1.0

This release of TextMachina includes:

- Allow passing parameters to the extractors outside of the prompt templates. The templates must now be used only to define placeholders.
- Add `MixCaseDatasetGenerator` to generate datasets for mixcase tasks (detection tagging). Other datasets, such as mixcase classification, can be built outside TextMachina from the datasets this generator produces.
- Add `sentence_gap` and `word_gap` extractors for mixcase tasks.
- Refactor interactive exploration. Now we have one class per task, and each one must build its own panels.
- Add exploration for mixcase datasets.
- Add a `TokenClassificationMetric` to evaluate HF models on mixcase and boundary tasks.
- Better structured and documented examples. Now we have `examples/learning` to illustrate how to use providers/tasks/extractors and `examples/use_cases` with additional config files.
- Minor quality-of-life changes: require `task_type` in the CLI to prevent potential confusion, disable `random_sample_human` on boundary detection tasks, etc.
- Document all the new code and improve existing documentation.
- Extend the README to cover mixcase tasks and include figures that visualize each type of task.
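
To make the difference between gap-style and masking-style extraction for mixcase tasks more concrete, the following toy sketch contrasts the two ideas. It is not TextMachina's implementation: the function names `word_gap_sketch` and `word_masking_sketch`, the `MASK-i` token format, and the parameters are all hypothetical and serve only to illustrate the concept.

```python
import random


def word_gap_sketch(text: str, gap_len: int = 3, seed: int = 0) -> tuple[str, str]:
    """Cut a run of words out of the text and return the two boundaries.

    An LLM would then be asked to write the text that goes between them.
    """
    words = text.split()
    rng = random.Random(seed)
    # Keep at least one word on each side of the gap.
    start = rng.randrange(1, len(words) - gap_len)
    left = " ".join(words[:start])
    right = " ".join(words[start + gap_len:])
    return left, right


def word_masking_sketch(text: str, n_masks: int = 2, seed: int = 0) -> str:
    """Replace randomly chosen words with mask tokens inside the whole text.

    An LLM would then be asked to reconstruct each masked word in place.
    """
    words = text.split()
    rng = random.Random(seed)
    positions = sorted(rng.sample(range(len(words)), n_masks))
    for i, pos in enumerate(positions):
        words[pos] = f"MASK-{i}"  # hypothetical mask token format
    return " ".join(words)


sample = "The quick brown fox jumps over the lazy dog"
left, right = word_gap_sketch(sample)
masked = word_masking_sketch(sample)
print("gap boundaries:", repr(left), "...", repr(right))
print("masked text:   ", masked)
```

In the gap variant the model only sees the surrounding boundaries, while in the masking variant it sees the full text with a few positions hidden, which is the distinction the masking extractors in 0.2.0 build on.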

0.0.10

- Updated arXiv citation in README

0.0.9

First release 🎉

First release of TextMachina that includes:

- **Dataset generators**: for detection, attribution, and boundary detection tasks.
- Five **model providers**: Anthropic, Cohere, HuggingFace (local and remote), OpenAI, and Vertex AI.
- Six **extractors** to fill prompt templates: Auxiliary, Entities, Nouns, Sentence prefix, Word prefix, and Combined.
- One **decoding constrainer**: Length constrainer.
- Five **metrics** to assess task difficulty and dataset quality: MAUVE, Perplexity, Repetition, Diversity, and baseline models.
- **Post-processing functions** to improve the quality of the datasets and prevent common biases.
- **CLI** to generate and explore datasets.
- **Configuration examples**, under the folder `etc/examples`, to test different tasks and model providers.
