Ktrain

Latest version: v0.41.4


0.31.2

new:
- N/A

changed:
- added `truncate_to` (default: 5000) and `minchars` (default: 3) arguments to the `KeywordExtractor.extract_keywords` method.
- added `score_by` argument to `KeywordExtractor.extract_keywords`. The default is `freqpos`, which means keywords are now ranked by a combination of frequency and position in the document (see the sketch below).
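
Below is a hedged sketch of the new arguments; the constructor call and the sample text are assumptions for illustration, while the argument names and defaults come from the notes above.

```python
# Hedged sketch: illustrates the truncate_to, minchars, and score_by arguments
# described above. The document text is a placeholder.
from ktrain.text.kw import KeywordExtractor

document = (
    "ktrain is a low-code library for augmented machine learning. "
    "It provides wrappers for text, vision, graph, and tabular data."
)

kwe = KeywordExtractor()
# rank keywords by combined frequency and position ("freqpos"),
# considering only the first 5000 characters and tokens of 3+ characters
keywords = kwe.extract_keywords(
    document,
    truncate_to=5000,
    minchars=3,
    score_by="freqpos",
)
print(keywords)
```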


fixed:
- N/A

0.31.1

new:
- N/A

changed:
- Allow for returning prediction probabilities when merging tokens in sequence-tagging (PR 445)
- added basic ML pipeline test to workflow using latest TensorFlow

fixed:
- N/A

0.31.0

new:
- The `text.ner.models.sequence_tagger` function now supports word embeddings from non-BERT transformer models (e.g., `roberta-base`, `openai-gpt`). Thanks to Niekvdplas.
- Custom tokenization can now be used in sequence-tagging even when using transformer word embeddings. See the `custom_tokenizer` argument to `NERPredictor.predict` and the sketch below.
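
The sketch below ties these two items together using the renamed parameters documented in the changed section; the CoNLL-formatted file names, hyperparameters, and sample sentence are assumptions for illustration only.

```python
# Hedged sketch: RoBERTa word embeddings for sequence-tagging plus a custom
# tokenizer at prediction time. File paths and hyperparameters are placeholders.
import ktrain
from ktrain import text as txt

# load a CoNLL-formatted dataset (token/tag columns); file names are hypothetical
(trn, val, preproc) = txt.entities_from_conll2003("train.txt", val_filepath="valid.txt")

# non-BERT transformer embeddings via the renamed transformer_model parameter
model = txt.sequence_tagger("bilstm-transformer", preproc,
                            transformer_model="roberta-base")
learner = ktrain.get_learner(model, train_data=trn, val_data=val)
learner.fit(0.01, 1)

# supply a custom tokenizer when predicting, even with transformer embeddings
predictor = ktrain.get_predictor(learner.model, preproc)
print(predictor.predict("Paul Newman is my favorite actor.",
                        custom_tokenizer=lambda s: s.split()))
```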

changed:
- [**breaking change**] In the `text.ner.models.sequence_tagger` function, the `bilstm-bert` model is now called `bilstm-transformer` and the `bert_model` parameter has been renamed to `transformer_model`.
- [**breaking change**] The `syntok` package is now used as the default tokenizer for `NERPredictor` (sequence-tagging prediction). To use the tokenization scheme from older versions of ktrain, you can import the `re` and `string` packages and supply this function to the `custom_tokenizer` argument: `lambda s: re.compile(f"([{string.punctuation}“”¨«»®´·º½¾¿¡§£₤‘’])").sub(r" \1 ", s).split()`.
- Code base was reformatted using [black](https://github.com/psf/black)
- **ktrain** now supports Apache Tika for text extraction in the `text.textractor.TextExtractor` package, with `use_tika=True` as the default. To use the old-style text extraction based on the `textract` package, supply `use_tika=False` to `TextExtractor` (see the sketch below).
- removed warning about sentence pair classification to avoid confusion
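
A short sketch of the new default and the opt-out; the `extract` call and the PDF path are assumptions here, not part of this release's notes.

```python
# Hedged sketch: Tika-based extraction is the new default; use_tika=False
# falls back to the older textract-based extraction. The file path is a placeholder.
from ktrain.text.textractor import TextExtractor

te_tika = TextExtractor()                  # use_tika=True by default
text = te_tika.extract("/path/to/paper.pdf")

te_legacy = TextExtractor(use_tika=False)  # old-style extraction via textract
text_legacy = te_legacy.extract("/path/to/paper.pdf")
```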

fixed:
- N/A

0.30.0

new:
- **ktrain** now supports simple, fast, and robust keyphrase extraction with the `ktrain.text.kw.KeywordExtractor` module
- **ktrain** now only issues a warning if TensorFlow is not installed, instead of halting and preventing further use. This means that pre-trained PyTorch models (e.g., `text.zsl.ZeroShotClassifier`) and sklearn models (e.g., `text.eda.TopicModel`) in **ktrain** can now be used **without** having TensorFlow installed.
- `text.qa.SimpleQA` and `text.qa.AnswerExtractor` now both support PyTorch with optional quantization (use `framework='pt'` for the PyTorch version)
- `text.qa.SimpleQA` and `text.qa.AnswerExtractor` now both support a `quantize` argument that can speed up inference
- `text.zsl.ZeroShotClassifier`, `text.translation.Translator`, and `text.translation.EnglishTranslator` all support a `quantize` argument (see the sketch below)
- pretrained image-captioning and object-detection via `transformers` are now supported
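
The sketch below exercises the `framework='pt'` and `quantize` flags from the items above; the index directory, toy document, question, and labels are placeholders, and the indexing helpers follow SimpleQA's usual workflow.

```python
# Hedged sketch: framework='pt' and quantize flags described above.
# Index directory, document, question, and labels are placeholders.
from ktrain.text.qa import SimpleQA
from ktrain.text.zsl import ZeroShotClassifier

# question-answering with the PyTorch backend and quantization
INDEXDIR = "/tmp/myindex"
SimpleQA.initialize_index(INDEXDIR)
SimpleQA.index_from_list(["Paris is the capital of France."], INDEXDIR)
qa = SimpleQA(INDEXDIR, framework="pt", quantize=True)
print(qa.ask("What is the capital of France?"))

# zero-shot classification with a quantized PyTorch model
zsl = ZeroShotClassifier(quantize=True)
print(zsl.predict("I am voting in the election tomorrow.",
                  labels=["politics", "sports"]))
```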

changed:
- reorganized imports
- localized seqeval
- The `half` parameter of `text.translation.Translator` and `text.translation.EnglishTranslator` was changed to `quantize` and now supports both CPU and GPU (see the sketch below).
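
A minimal sketch of the renamed flag; the Helsinki-NLP model name, source language, and sample sentence are example choices rather than anything prescribed by this release.

```python
# Hedged sketch: quantize replaces the older half parameter for translation.
# The model name and source language are example choices.
from ktrain.text.translation import EnglishTranslator, Translator

translator = Translator(model_name="Helsinki-NLP/opus-mt-de-en", quantize=True)
print(translator.translate("Ich bin ein Berliner."))

# or translate from a named source language into English
en_translator = EnglishTranslator(src_lang="de", quantize=True)
print(en_translator.translate("Ich bin ein Berliner."))
```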

fixed:
- N/A

0.29.3

new:
- `NERPredictor.predict` now includes a `return_offsets` parameter. If True, the results will include character offsets of predicted entities.
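
A quick sketch, assuming a previously trained and saved NER predictor; the predictor path and sample sentence are placeholders.

```python
# Hedged sketch: return_offsets adds character offsets to predicted entities.
# The predictor path is a placeholder for any saved NERPredictor.
import ktrain

predictor = ktrain.load_predictor("/tmp/my_ner_predictor")
entities = predictor.predict("Paul Newman was born in Cleveland.",
                             return_offsets=True)
print(entities)  # each predicted entity now carries its character offsets
```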

changed:
- In `eda.TopicModel`, changed `lda_max_iter` to `max_iter` and `nmf_alpha` to `alpha`
- Added `show_counts` parameter to the `TopicModel.get_topics` method (see the sketch after this list)
- Changed `qa.core._process_question` to `qa.core.process_question`
- In `qa.core`, added `remove_english_stopwords` and `and_np` parameters to `process_question`
- The `valley` learning rate suggestion is now returned in `learner.lr_estimate` and `learner.lr_plot` (when `suggest=True` is supplied to `learner.lr_plot`)
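
A hedged sketch of the renamed `TopicModel` arguments and the new `show_counts` flag; the toy corpus and the lowered `min_df`/`n_features` values are assumptions so the example runs on only a few documents.

```python
# Hedged sketch: max_iter replaces lda_max_iter (alpha likewise replaces
# nmf_alpha), and get_topics accepts show_counts. Corpus and lowered min_df
# are assumptions for a tiny toy dataset.
from ktrain.text.eda import TopicModel

texts = [
    "The economy grew faster than expected this quarter.",
    "The central bank left interest rates unchanged.",
    "The home team won the championship game last night.",
    "The striker scored twice in the second half.",
]

tm = TopicModel(texts, n_topics=2, n_features=512, min_df=1, max_iter=10)
tm.build(texts, threshold=0.25)
print(tm.get_topics(show_counts=True))
```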

fixed:
- save `TransformerEmbedding` model, tokenizer, and configuration when saving `NERPredictor` and reset `te_model` to
facilitate loading NERPredictors with BERT embeddings offline (423)
- switched from `keras2onnx` to `tf2onnx`, which supports newer versions of TensorFlow

0.29.2

new:
- N/A

changed:
- N/A

fixed:
- added `get_tokenizer` call to `TransformersPreprocessor._load_pretrained` to address issue 416
