Spark-nlp

Latest version: v5.5.3

5.1.0

========
----------------
New Features & Enhancements
----------------
* **NEW:** Introducing WhisperForCTC annotator for Automatic Speech Recognition (ASR)
* **NEW:** Introducing OpenAICompletion and OpenAIEmbeddings annotators
* **NEW:** Introducing MPNet Text Embeddings annotators
* **NEW:** Introducing a new BART for Zero-Shot Text Classification annotator
* **NEW:** Adding ONNX support to E5 Embeddings annotator
* **NEW:** New full support for GCP and Azure distributed storages
* New 150+ MPNet models
* New Databricks 13.3 runtime support
* New EMR 6.12.0 version support

----------------
Bug Fixes
----------------
* Fix max sentence length issue in E5Embeddings

========

5.0.2

========
----------------
New Features & Enhancements
----------------
* **NEW:** Introducing support for ONNX Runtime in ALBERT, CamemBERT, and XLM-RoBERTa annotators
* **NEW:** Implement ZeroShotNerModel annotator for zero-shot NER based on XLM-RoBERTa architecture

----------------
Bug Fixes
----------------
* Fix MarianTransformers annotator breaking with `java.lang.ClassCastException` in Python
* Fix accuracy values falling outside the 0.0 to 1.0 range in SentenceDetectorDL and MultiClassifierDL annotators
* Fix BART issue with low temperature values that only occurred when no finite logits satisfied the low temperature and top_k values
* Add missing E5Embeddings and InstructorEmbeddings annotators to `annotators` in Scala for easy all-in-one import
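
The BART temperature fix above concerns a general sampling hazard: after temperature scaling and top-k filtering, no finite logit may remain to normalise. The sketch below illustrates that hazard in plain Python. It is not Spark NLP's code: the function name `filtered_probs` and the fallback (putting all probability on the arg-max of the original logits) are assumptions made for illustration, and the release notes do not say how the library resolved the issue.

```python
import math


def filtered_probs(logits, temperature=1.0, top_k=0):
    """Turn logits into sampling probabilities (hypothetical helper)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")

    # Temperature scaling: a very small temperature can overflow to +/-inf.
    scaled = [x / temperature for x in logits]

    # Top-k filtering: everything below the k-th largest value becomes -inf.
    if top_k > 0:
        cutoff = sorted(scaled, reverse=True)[min(top_k, len(scaled)) - 1]
        scaled = [x if x >= cutoff else -math.inf for x in scaled]

    # Guard (assumed fallback): if the surviving values cannot be normalised,
    # put all probability mass on the arg-max of the original logits.
    overflowed = any(x == math.inf or math.isnan(x) for x in scaled)
    nothing_left = all(x == -math.inf for x in scaled)
    if overflowed or nothing_left:
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]

    # Numerically stable softmax; exp(-inf) evaluates to 0.0.
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]
```

Without the guard, a softmax over values that are all infinite would compute `inf - inf` and yield `nan` probabilities, which is the kind of failure that only shows up at extreme settings.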

========

5.0.1

========
----------------
Bug Fixes & Enhancements
----------------
* Fix `multiLabel` param issue in `XXXForSequenceClassification` and `XXXForZeroShotClassification` annotators
* Add the missing `threshold` param to all `XXXForSequenceClassification` annotators in Python
* Fix issue with passing the `spark.driver.cores` config as a param to the `start()` function in Python and Scala
* Add new notebooks to export BERT, DistilBERT, RoBERTa, and DeBERTa models to ONNX format

========

5.0.0

========
----------------
New Features & Enhancements
----------------
* **NEW:** Introducing support for ONNX Runtime in Spark NLP. ONNX Runtime is a high-performance inference engine for machine learning models in the ONNX format. ONNX Runtime has proved to considerably increase the performance of inference for many models.
* **NEW:** Introducing **InstructorEmbeddings** annotator in Spark NLP 🚀. `InstructorEmbeddings` can load new state-of-the-art INSTRUCTOR Models inherited from T5 for Text Embeddings.
* **NEW:** Introducing **E5Embeddings** annotator in Spark NLP 🚀. `E5Embeddings` can load new state-of-the-art E5 Models inherited from BERT for Text Embeddings.
* **NEW:** Introducing **DocumentSimilarityRanker** annotator in Spark NLP 🚀. `DocumentSimilarityRanker` is a new annotator that uses the LSH techniques available in Spark MLlib to run approximate nearest-neighbour search on top of sentence embeddings. It aims to capture the semantic meaning of a document in a dense, continuous vector space and return it to the ranker search.
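
To give a feel for how LSH supports approximate nearest-neighbour search over embeddings, here is a minimal pure-Python sketch. It uses random-hyperplane hashing, a simple LSH family chosen here for brevity; it is not the Spark MLlib implementation and not the `DocumentSimilarityRanker` API. All function names and the toy vectors are hypothetical.

```python
import random


def make_hyperplanes(dim, num_planes, seed=0):
    """Draw random hyperplanes (normal vectors) with a fixed seed."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_planes)]


def lsh_signature(vector, hyperplanes):
    """One bit per hyperplane: which side of the plane the vector lies on."""
    return tuple(
        1 if sum(v * h for v, h in zip(vector, plane)) >= 0 else 0
        for plane in hyperplanes
    )


def build_index(embeddings, hyperplanes):
    """Group document ids into buckets keyed by their signature."""
    buckets = {}
    for doc_id, vec in embeddings.items():
        buckets.setdefault(lsh_signature(vec, hyperplanes), []).append(doc_id)
    return buckets


def candidates(query_vec, buckets, hyperplanes):
    """Return ids sharing the query's bucket (approximate neighbours)."""
    return buckets.get(lsh_signature(query_vec, hyperplanes), [])


# Toy embeddings standing in for sentence embeddings (made-up values).
planes = make_hyperplanes(dim=3, num_planes=4)
docs = {"a": [1.0, 0.0, 0.0], "b": [2.0, 0.0, 0.0], "c": [-1.0, 0.0, 0.0]}
index = build_index(docs, planes)
near_a = candidates([1.0, 0.0, 0.0], index, planes)
```

Vectors pointing in the same direction always share a bucket, so only that bucket needs an exact distance comparison. That is the trade a ranker makes: a small chance of missing a true neighbour in exchange for far fewer comparisons.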

----------------
Bug Fixes
----------------
* Fix BART issue with maxInputLength

========

4.4.4

========
----------------
New Features & Enhancements
----------------
* Add `Warmup` stage to loading all Transformers for word embeddings: ALBERT, BERT, CamemBERT, DistilBERT, RoBERTa, XLM-RoBERTa, and XLNet. This helps reduce the first inference time and also validates the import of external models from HuggingFace https://github.com/JohnSnowLabs/spark-nlp/pull/13851
* Add new notebooks to import ZeroShot Classifiers for BERT, DistilBERT, and RoBERTa fine-tuned on NLI datasets https://github.com/JohnSnowLabs/spark-nlp/pull/13845

----------------
Bug Fixes
----------------
* Fix not being able to save models from XXXForSequenceClassification and XXXForZeroShotClassification annotators https://github.com/JohnSnowLabs/spark-nlp/pull/13842

========

4.4.3

========
----------------
New Features & Enhancements
----------------
* New `multilabel` parameter to switch from multi-class to multi-label on all Classifiers in Spark NLP: AlbertForSequenceClassification, BertForSequenceClassification, DeBertaForSequenceClassification, DistilBertForSequenceClassification, LongformerForSequenceClassification, RoBertaForSequenceClassification, XlmRoBertaForSequenceClassification, XlnetForSequenceClassification, BertForZeroShotClassification, DistilBertForZeroShotClassification, and RobertaForZeroShotClassification
* Refactor protected Params and Features to avoid unwanted exceptions during runtime https://github.com/JohnSnowLabs/spark-nlp/pull/13797
* Add proper documentation and instructions for ZeroShot classifiers: BertForZeroShotClassification, DistilBertForZeroShotClassification, and RobertaForZeroShotClassification https://github.com/JohnSnowLabs/spark-nlp/pull/13798
* Extend support for downloading models/pipelines directly by given name or S3 path in ResourceDownloader https://github.com/JohnSnowLabs/spark-nlp/pull/13796
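
The difference between multi-class and multi-label output can be sketched in a few lines of plain Python. This is an illustration of the general technique, not Spark NLP's implementation: it assumes multi-class picks the single best label after a softmax, while multi-label applies an independent sigmoid to each logit and keeps every label above a threshold. The `classify` helper, its `threshold` default, and the sample logits are hypothetical.

```python
import math


def softmax(logits):
    """Normalise logits into probabilities that sum to 1."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    """Independent probability for a single logit."""
    return 1.0 / (1.0 + math.exp(-x))


def classify(logits, labels, multilabel=False, threshold=0.5):
    """Return predicted labels for one input (hypothetical helper)."""
    if multilabel:
        # Multi-label: each label is scored on its own; several may pass.
        scores = [sigmoid(x) for x in logits]
        return [lab for lab, s in zip(labels, scores) if s >= threshold]
    # Multi-class: labels compete; exactly one wins.
    scores = softmax(logits)
    best = max(range(len(labels)), key=lambda i: scores[i])
    return [labels[best]]


labels = ["sports", "politics", "weather"]
single = classify([2.0, 1.0, -3.0], labels)
multi = classify([2.0, 1.0, -3.0], labels, multilabel=True)
```

With the same logits, the multi-class call returns one label while the multi-label call returns every label whose sigmoid score reaches the threshold.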

----------------
Bug Fixes
----------------
* Fix pretrained pipelines that stopped working since 4.4.2 release on PySpark 3.0 and 3.1 versions (adding 123 new pipelines) https://github.com/JohnSnowLabs/spark-nlp/pull/13805
* Fix pretrained pipelines that stopped working since 4.4.2 release on PySpark 3.2 and 3.3 versions (adding 120 new pipelines) https://github.com/JohnSnowLabs/spark-nlp/pull/13811
* Fix Java compatibility issue caused by SystemUtils dependency https://github.com/JohnSnowLabs/spark-nlp/pull/13806

========
