Spark-nlp

Latest version: v5.5.3

Safety actively analyzes 723158 Python packages for vulnerabilities to keep your Python projects secure.

Page 1 of 23

5.5.2

========
----------------
New Features & Enhancements
----------------
* OpenVINO Support for Transformers (PR 14408):
Added OpenVINO inference support to a broad range of transformer-based annotators, including DeBertaForQuestionAnswering, DeBertaForSequenceClassification, RoBertaForTokenClassification, XlmRobertaForZeroShotClassification, BartTransformer, GPT2Transformer, and many others.
* BLIPForQuestionAnswering Transformer (PR 14422):
Introduced a new transformer BLIPForQuestionAnswering for image-based question answering tasks. The transformer processes images alongside associated questions to provide relevant answers.
* AutoGGUFEmbeddings Annotator (PR 14433):
Added AutoGGUFEmbeddings to support embeddings from AutoGGUFModels, providing rich sentence embeddings. Includes an end-to-end example notebook for usage.
* HTML Parsing into DataFrame (PR 14449):
Introduced sparknlp.read().html() to parse local or remote HTML files and convert them into structured Spark DataFrames for easier analysis.
* Email Parsing into DataFrame (PR 14455):
Added sparknlp.read().email() method to parse email files into structured DataFrames, enabling scalable analysis of email content. (Note: Dependent on 14449)
* Microsoft Word Document Parsing into DataFrame (PR 14476):
Added a new feature to parse .docx and .doc files into a Spark DataFrame, streamlining the integration of Word documents into NLP pipelines.
* Microsoft Fabric Support (PR 14467):
Introduced support for leveraging Microsoft Fabric for word embeddings storage and retrieval, enhancing scalability and efficiency.
* cuDNN Upgrade Instructions on Databricks (PR 14451):
Added instructions on upgrading cuDNN for GPU inference and cleaned up redundant Databricks installation instructions.
* ChunkEmbeddings Metadata Preservation (PR 14462):
Modified ChunkEmbeddings to preserve the original chunk’s metadata in the resulting embeddings, ensuring richer contextual information is retained.
* Default Names and Languages for Annotators (PR 14469):
Updated default names and language configurations for newly created seq2seq annotators to improve consistency and clarity.

----------------
Bug Fixes
----------------
* Spark Version Errors (PR 14467):
Resolved issues related to long Spark versions when integrating Microsoft Fabric support.

========

5.5.1

========
----------------
New Features & Enhancements
----------------
* `BertForMultipleChoice` Transformer Added. Enhanced BERT’s capabilities to handle multiple-choice tasks such as standardized test questions and survey or quiz automation.
* Integrated New Tasks and Documentation:
* Added support and documentation for the following tasks:
* Automatic Speech Recognition
* Dependency Parsing
* Image Captioning
* Image Classification
* Landing Page
* Question Answering
* Summarization
* Table Question Answering
* Text Classification
* Text Generation
* Text Preprocessing
* Token Classification
* Translation
* Zero-Shot Classification
* Zero-Shot Image Classification
* `PromptAssembler` Annotator Introduced. Introduced a new annotator that constructs prompts for LLMs using a chat template and a sequence of messages. Accepts an array of tuples with roles (“system”, “user”, “assistant”) and message texts. Utilizes llama.cpp as a backend for template parsing, supporting basic template applications.

----------------
Bug Fixes
----------------
* Resolved Pretrained Model Loading Issue on DBFS Systems.
* Fixed a bug where pretrained models were not found when running AutoGGUF model pipelines on Databricks due to incorrect path handling of gguf files.

========

5.5.0

========
----------------
New Features & Enhancements
----------------
* Introduced QWEN2Transformer (14188)
* Introduced MiniCPM (14205)
* Introduced NLLB (14209)
* Implemented Nomic embeddings (14217)
* Introduced CamemBertForZeroShotClassification annotator (14354)
* Implemented Mxbai Embeddings (14355)
* Introduced AlbertForZeroShotClassification (14361)
* Introduced Phi-3 (14373)
* Implemented Starcoder2 for causal language modeling (14358)
* Integrated llama.cpp (14364)
* Implemented SnowFlake (14353)
* Introduced ONNX support to vision annotators (14356)
* Introduced ONNX and OpenVINO support to Missing Annotators (14359)
* Added OpenVINO install instructions (14382)
* Exported notebooks for release candidate (14393)

========

5.4.2

========
----------------
New Features & Enhancements
----------------
* Added demo notebook for Image Classification Annotators
* Added aggressiveMatching parameter to DateMatcher and MultiDateMatcher annotators
* Added aggressiveMatching parameter to DocumentSimilarityRanker annotator

========

5.4.1

========
----------------
New Features & Enhancements
----------------
* Added support for loading duplicate models in Spark NLP, allowing multiple models from the same annotator to be loaded simultaneously.
* Updated the README for better coherence and added new pages to the website.
* Added support for a stop IDs list to halt text generation in Phi, Mistral, and Llama annotators.

----------------
Bug Fixes
----------------
* Fixed the default model names for Phi2 and Mistral AI annotators.

========

5.4.0

========
----------------
New Features & Enhancements
----------------
* Added OpenVINO Runtime integration for various models, enabling enhanced inference performance. (14246)
* Added Python APIs to incorporate OpenVINO support. (14242)
* Introduced support for ONNX models and average pooling in ONNX-based annotators. (14245)
* Implemented MPNet for token classification. (14244)
* Added support for MistralAI LLM and LLAMA2. (14243)
* Improved caching mechanisms in Streamlit demos. (14241)
* Enhanced models' card and README documentation for Models Hub. (14240)
* Added OpenVINO GPU dependencies. (14236)
* Locked macOS version for runners and added missing SBT setup. (14235)

----------------
Bug Fixes
----------------
* Fixed bugs in Colab notebooks. (14239)
* Resolved issues with BERT backend and broken annotators. (14238)
* Corrected LLAMA2 position ID and generation bug. (14237)

========

Page 1 of 23

Releases

Has known vulnerabilities

Spark-nlp

Page 1 of 23

5.5.2

5.5.1

5.5.0

5.4.2

5.4.1

5.4.0

Page 1 of 23

Links

Releases