Spark-nlp

Latest version: v5.5.1

5.1.4

========
----------------
New Features & Enhancements
----------------
* **NEW:** Introducing the `DocumentCharacterTextSplitter`, which splits large documents into smaller chunks. It takes an ordered list of separators and splits any subtext that exceeds the chunk length, with optional overlap between chunks.
* **NEW:** Introducing support for ONNX Runtime in RobertaForSequenceClassification annotator
* **NEW:** Introducing support for ONNX Runtime in RobertaForTokenClassification annotator
* **NEW:** Introducing support for ONNX Runtime in RobertaForQuestionAnswering annotator
* Add an example of loading a model directly from Azure using the `.load()` method. The example shows how to configure Spark NLP to load models from Azure
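
The splitting behavior described for `DocumentCharacterTextSplitter` (separators tried in order, oversized pieces split further, optional overlap) can be illustrated with a small pure-Python sketch. This is not Spark NLP's implementation and does not use its API. The function names `split_text` and `merge_pieces` are hypothetical, and details such as how overlap is carried between chunks are assumptions made for illustration only.

```python
from typing import List


def merge_pieces(pieces: List[str], sep: str, chunk_size: int, overlap: int) -> List[str]:
    """Greedily join pieces with `sep` into chunks of at most `chunk_size` characters."""
    chunks: List[str] = []
    current: List[str] = []
    for piece in pieces:
        if current and len(sep.join(current + [piece])) > chunk_size:
            chunks.append(sep.join(current))
            # Carry trailing pieces into the next chunk, up to `overlap` characters.
            carried: List[str] = []
            for prev in reversed(current):
                if len(sep.join([prev] + carried)) > overlap:
                    break
                carried.insert(0, prev)
            current = carried
            # Drop the carried pieces if they would push the next chunk over the limit.
            if len(sep.join(current + [piece])) > chunk_size:
                current = []
        current.append(piece)
    if current:
        chunks.append(sep.join(current))
    return chunks


def split_text(text: str, separators: List[str], chunk_size: int, overlap: int = 0) -> List[str]:
    """Split `text` by trying each (non-empty) separator in order.

    Assumption for this sketch: once no separators remain, oversized text
    is cut at fixed `chunk_size` boundaries without overlap.
    """
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators:
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    sep, rest = separators[0], separators[1:]
    chunks: List[str] = []
    small: List[str] = []
    for part in text.split(sep):
        if len(part) > chunk_size:
            # Flush the small pieces collected so far, then split the
            # oversized part with the remaining separators.
            chunks.extend(merge_pieces(small, sep, chunk_size, overlap))
            small = []
            chunks.extend(split_text(part, rest, chunk_size, overlap))
        elif part:
            small.append(part)
    chunks.extend(merge_pieces(small, sep, chunk_size, overlap))
    return chunks


print(split_text("aaa bbb ccc ddd", ["\n\n", " "], chunk_size=7, overlap=3))
# prints ['aaa bbb', 'bbb ccc', 'ccc ddd']
```

With `overlap=0` the same input yields `['aaa bbb', 'ccc ddd']`. The overlap setting only controls how much trailing text is repeated at the start of the next chunk.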

----------------
Bug Fixes
----------------
* Fix a bug in the `Whisper` annotator that prevented some models from being imported
* Fix BPE Tokenizer to include a flag controlling whether to always prepend a space before words (the previous behavior for embeddings)
* Fix BPE Tokenizer to correctly convert and tokenize non-Latin and other special characters/words
* Fix `RobertaForQuestionAnswering` to produce the same logits and indexes as the implementation in the Transformers library
* Fix the return order of logits in `BertForQuestionAnswering` and `DistilBertForQuestionAnswering` annotators


========

5.1.3

========
----------------
New Features & Enhancements
----------------
* **NEW:** Introducing support for ONNX Runtime in BertForTokenClassification annotator
* **NEW:** Introducing support for ONNX Runtime in BertForSequenceClassification annotator
* **NEW:** Introducing support for ONNX Runtime in BertForQuestionAnswering annotator
* **NEW:** Introducing support for ONNX Runtime in DistilBertForTokenClassification annotator
* **NEW:** Introducing support for ONNX Runtime in DistilBertForSequenceClassification annotator
* **NEW:** Introducing support for ONNX Runtime in DistilBertForQuestionAnswering annotator
* **NEW:** Setting ONNX configuration such as GPU device id, execution mode, etc. via Spark NLP configs
* Update Whisper documentation with minimum required version of Spark/PySpark (3.4)

----------------
Bug Fixes
----------------
* Fix `module 'sparknlp.annotator' has no attribute 'Token2Chunk'` error in Python when using `Token2Chunk` annotator inside loaded PipelineModel



========

5.1.2

========
----------------
New Features & Enhancements
----------------
* **NEW:** Introducing VisionEncoderDecoder annotator to generate captions from images
* Add missing entries in the docs and update them with the new features
* Improve beam search results in BART Transformer


========

5.1.1

========
----------------
New Features & Enhancements
----------------
* **NEW:** Introducing support for ONNX Runtime in MPNet embedding annotator
* **NEW:** Introducing support for ONNX Runtime in AlbertForTokenClassification annotator
* **NEW:** Introducing support for ONNX Runtime in AlbertForSequenceClassification annotator
* **NEW:** Introducing support for ONNX Runtime in AlbertForQuestionAnswering annotator
* Implement the `getVectors` feature in Word2VecModel, Doc2VecModel, and WordEmbeddingsModel annotators. This new feature gives access to all tokens and their vectors in the loaded model.

----------------
Bug Fixes
----------------
* Fix saving and loading of `Whisper` models
* Fix saving ONNX models on the Windows operating system


========

5.1.0

========
----------------
New Features & Enhancements
----------------
* **NEW:** Introducing WhisperForCTC annotator for Automatic Speech Recognition (ASR)
* **NEW:** Introducing OpenAICompletion and OpenAIEmbeddings annotators
* **NEW:** Introducing MPNet Text Embeddings annotators
* **NEW:** Introducing a new BART for Zero-Shot Text Classification annotator
* **NEW:** Adding ONNX support to E5 Embeddings annotator
* **NEW:** Full support for GCP and Azure distributed storage
* 150+ new MPNet models
* New Databricks 13.3 runtime support
* New EMR 6.12.0 version support


----------------
Bug Fixes
----------------
* Fix max sentence length issue in E5Embeddings

========

5.0.2

========
----------------
New Features & Enhancements
----------------
* **NEW:** Introducing support for ONNX Runtime in ALBERT, CamemBERT, and XLM-RoBERTa annotators
* **NEW:** Implement ZeroShotNerModel annotator for zero-shot NER based on XLM-RoBERTa architecture

----------------
Bug Fixes
----------------
* Fix MarianTransformers annotator breaking with `java.lang.ClassCastException` in Python
* Fix accuracy values falling outside the 0.0 to 1.0 range in SentenceDetectorDL and MultiClassifierDL annotators
* Fix a BART issue with low temperature values that occurred only when no non-infinite logits satisfied the low temperature and top_k values
* Add missing E5Embeddings and InstructorEmbeddings annotators to `annotators` in Scala for easy all-in-one import

========
