Spark-nlp

Latest version: v5.5.1


Page 9 of 23

3.3.2

Not secure
========
----------------
New Features
----------------
* Comet.ml integration with Spark NLP
* Introducing BertForSequenceClassification annotator

----------------
Bug Fixes
----------------
* Fix EntityRulerApproach name from import
* Fix missing EntityRulerModel in ResourceDownloader
* Fix NerDLApproach logs format on Databricks
* Fix a missing batchSize param in NerDLModel that degraded GPU performance


========

3.3.1

Not secure
========
----------------
New Features
----------------
* Introducing the EntityRuler annotator, which takes a JSON or CSV ontology file that maps entities to patterns. With EntityRuler you can implement a purely rule-based entity recognition system; it can be saved as a Model and reused in other pipelines to annotate your documents against your knowledge base.
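
As a rough illustration of the ontology file described above, the sketch below writes a small JSON patterns file using only Python's standard library. The field names (`id`, `label`, `patterns`), the file name, and the sample entries are assumptions made for illustration; check the EntityRuler documentation for the exact schema your Spark NLP version expects.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical ontology: each entry maps one entity label to its patterns.
# The keys "id", "label" and "patterns" are an assumed schema, not something
# these release notes confirm.
ontology = [
    {"id": "person-names", "label": "PERSON", "patterns": ["John Snow", "Jon"]},
    {"id": "locations", "label": "LOCATION", "patterns": ["Winterfell"]},
]


def write_patterns(entries, path):
    """Serialize the ontology as JSON and return the path that was written."""
    path = Path(path)
    path.write_text(json.dumps(entries, indent=2), encoding="utf-8")
    return path


patterns_file = write_patterns(ontology, Path(tempfile.gettempdir()) / "patterns.json")
print(patterns_file.read_text(encoding="utf-8"))
```

A file like this would then be handed to the EntityRuler annotator when it is fitted, and the fitted model saved for reuse in other pipelines.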

----------------
Bug Fixes
----------------
* Fix compatibility issue between NerOverwriter and AlbertForTokenClassification, BertForTokenClassification, DistilBertForTokenClassification, LongformerForTokenClassification, RoBertaForTokenClassification, XlmRoBertaForTokenClassification, XlnetForTokenClassification annotators
* Fix a bug in ContextSpellCheckerApproach annotator failing to find an appropriate TF graph
* Fix a bug in ContextSpellCheckerModel not being able to load a trained model
* Fix token alignment with token pieces in BertEmbeddings resulting in missing vectors with Unicode characters
* Add the missing pretrained NER models for the XlmRoBertaForTokenClassification annotator
* Add the missing pretrained NER models for the LongformerForTokenClassification annotator

----------------
Backward compatibility
----------------
* Renaming YakeModel to YakeKeywordExtraction to represent the actual purpose of this annotator more clearly.
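
Code that must run against releases from both before and after this rename could look the class up by name. The sketch below is one possible approach, not an official migration path: `first_available` is a hypothetical helper, and the stand-in objects replace the real annotator module (assumed here to be `sparknlp.annotator`) so the example runs without Spark NLP installed.

```python
from types import SimpleNamespace


def first_available(module, names):
    """Return the first attribute of `module` whose name appears in `names`."""
    for name in names:
        found = getattr(module, name, None)
        if found is not None:
            return found
    raise ImportError(f"none of {names} found in {module!r}")


# Stand-ins for the annotator module before and after the rename; in a real
# project `module` would be the imported `sparknlp.annotator` (assumption).
old_release = SimpleNamespace(YakeModel="old-class")
new_release = SimpleNamespace(YakeKeywordExtraction="new-class")

# Prefer the new name, fall back to the old one.
preferred = ["YakeKeywordExtraction", "YakeModel"]
print(first_available(old_release, preferred))
print(first_available(new_release, preferred))
```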


========

3.3.0

Not secure
========

----------------
Major features and improvements
----------------
* **NEW:** Starting with Spark NLP 3.3.0, there is no size limit when you import TensorFlow models! You can now import TF Hub & HuggingFace models larger than 2 GB.
* **NEW:** Up to 50x faster when saving Spark NLP models and pipelines! 🚀 We have improved the way we package the TensorFlow SavedModel while saving Spark NLP models & pipelines. For instance, it used to take up to 10 minutes to save the `xlm_roberta_base` model prior to Spark NLP 3.3.0, and now it takes at most 15 seconds!
* **NEW:** Introducing **AlbertForTokenClassification** annotator in Spark NLP 🚀. `AlbertForTokenClassification` can load ALBERT Models with a token classification head on top (a linear layer on top of the hidden-states output) e.g. for Named-Entity-Recognition (NER) tasks. This annotator is compatible with all the models trained/fine-tuned by using `AlbertForTokenClassification` or `TFAlbertForTokenClassification` in HuggingFace 🤗
* **NEW:** Introducing **XlnetForTokenClassification** annotator in Spark NLP 🚀. `XlnetForTokenClassification` can load XLNet Models with a token classification head on top (a linear layer on top of the hidden-states output) e.g. for Named-Entity-Recognition (NER) tasks. This annotator is compatible with all the models trained/fine-tuned by using `XLNetForTokenClassification` or `TFXLNetForTokenClassification` in HuggingFace 🤗
* **NEW:** Introducing **RoBertaForTokenClassification** annotator in Spark NLP 🚀. `RoBertaForTokenClassification` can load RoBERTa Models with a token classification head on top (a linear layer on top of the hidden-states output) e.g. for Named-Entity-Recognition (NER) tasks. This annotator is compatible with all the models trained/fine-tuned by using `RobertaForTokenClassification` or `TFRobertaForTokenClassification` in HuggingFace 🤗
* **NEW:** Introducing **XlmRoBertaForTokenClassification** annotator in Spark NLP 🚀. `XlmRoBertaForTokenClassification` can load XLM-RoBERTa Models with a token classification head on top (a linear layer on top of the hidden-states output) e.g. for Named-Entity-Recognition (NER) tasks. This annotator is compatible with all the models trained/fine-tuned by using `XLMRobertaForTokenClassification` or `TFXLMRobertaForTokenClassification` in HuggingFace 🤗
* **NEW:** Introducing **LongformerForTokenClassification** annotator in Spark NLP 🚀. `LongformerForTokenClassification` can load Longformer Models with a token classification head on top (a linear layer on top of the hidden-states output) e.g. for Named-Entity-Recognition (NER) tasks. This annotator is compatible with all the models trained/fine-tuned by using `LongformerForTokenClassification` or `TFLongformerForTokenClassification` in HuggingFace 🤗
* **NEW:** Introducing new ResourceDownloader functions to easily look for pretrained models & pipelines inside Spark NLP (Python and Scala). You can filter models or pipelines via `language`, `version`, or the name of the `annotator`
* Welcoming [Databricks Runtime 9.1 LTS](https://docs.databricks.com/release-notes/runtime/9.1.html), 9.1 ML, and 9.1 ML with GPU
* Fix the wrong version being returned by sparknlp.version()
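
The token classification annotators above all describe "a linear layer on top of the hidden-states output". The toy sketch below shows what that phrase means in plain Python: each token vector is multiplied by a weight matrix, a bias is added, and the highest-scoring label wins. The function name, labels, and all numbers are invented for illustration; this is not how Spark NLP or HuggingFace implement the head internally.

```python
def token_classification_head(hidden_states, weights, bias, labels):
    """Apply a linear layer to each token vector and pick the best label.

    hidden_states: one vector per token (list of lists of floats)
    weights: one row per label, each row as long as a token vector
    bias: one float per label
    """
    predictions = []
    for vector in hidden_states:
        # logits[k] = dot(weights[k], vector) + bias[k]
        logits = [
            sum(w * h for w, h in zip(row, vector)) + b
            for row, b in zip(weights, bias)
        ]
        best = max(range(len(logits)), key=logits.__getitem__)
        predictions.append(labels[best])
    return predictions


# Toy numbers, invented for illustration: 3 tokens, hidden size 2, 3 labels.
labels = ["O", "B-PER", "I-PER"]
hidden_states = [[0.1, 0.0], [2.0, 0.1], [0.2, 1.9]]
weights = [[-1.0, -1.0], [1.0, 0.0], [0.0, 1.0]]
bias = [0.5, 0.0, 0.0]

print(token_classification_head(hidden_states, weights, bias, labels))
# prints ['O', 'B-PER', 'I-PER']
```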

----------------
Bug Fixes
----------------
* Fix a bug in RoBertaEmbeddings when all special tokens were identical
* Fix a bug in RoBertaEmbeddings when special token contained valid regex
* Fix a bug that led to a memory leak inside the NorvigSweeting spell checker. This caused problems with pretrained pipelines such as `explain_document_ml` and `explain_document_dl` on some inputs
* Fix the wrong types being assigned to `minCount` and `classCount` in Python for `ContextSpellCheckerApproach` annotator
* Fix `explain_document_ml` pretrained pipeline for Spark NLP 3.x on Apache Spark 2.x


========

3.2.3

Not secure
========

----------------
Bug Fixes & Enhancements
----------------
* Add delimiter feature to CoNLL() class to support other delimiters in CoNLL files https://github.com/JohnSnowLabs/spark-nlp/pull/5934
* Add support for IOB in addition to IOB2 format in GraphExtraction https://github.com/JohnSnowLabs/spark-nlp/pull/6101
* Change YakeModel output type from KEYWORD to CHUNK to have more available features after the YakeModel annotator such as Chunk2Doc or ChunkEmbeddings https://github.com/JohnSnowLabs/spark-nlp/pull/6065
* Fix the default language for XlmRoBertaSentenceEmbeddings pretrained model in Python https://github.com/JohnSnowLabs/spark-nlp/pull/6057
* Fix an issue in SentenceEmbeddings that concatenated sentences instead of handling each corresponding sentence https://github.com/JohnSnowLabs/spark-nlp/pull/6060
* Fix GraphExtraction usage in LightPipeline https://github.com/JohnSnowLabs/spark-nlp/pull/6101
* Fix compatibility issue in `explain_document_ml` pipeline
* Better import process for corrupted merges file in Longformer tokenizer https://github.com/JohnSnowLabs/spark-nlp/pull/6083
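
To make the delimiter and IOB/IOB2 items above concrete, here is a small standalone sketch. In IOB (often called IOB1) an entity may start with an `I-` tag, and `B-` appears only when one entity directly follows another of the same type; in IOB2 every entity starts with `B-`. Both helper functions and the sample rows are hypothetical and do not reflect Spark NLP's own implementation.

```python
def parse_conll_line(line, delimiter=" "):
    """Split one CoNLL token line into its columns using the given delimiter."""
    return line.rstrip("\n").split(delimiter)


def iob1_to_iob2(tags):
    """Convert IOB (IOB1) tags to IOB2 so every entity starts with a B- tag."""
    converted = []
    previous = "O"
    for tag in tags:
        if tag.startswith("I-"):
            entity = tag[2:]
            # An I- tag opens a new entity unless it continues one of the same type.
            continues = previous != "O" and previous[2:] == entity
            converted.append(tag if continues else "B-" + entity)
        else:
            converted.append(tag)
        previous = tag
    return converted


# Tab-delimited sample rows (word, POS, chunk, NER), invented for illustration.
lines = [
    "John\tNNP\tI-NP\tI-PER",
    "Snow\tNNP\tI-NP\tI-PER",
    "visited\tVBD\tI-VP\tO",
    "Paris\tNNP\tI-NP\tI-LOC",
]
ner_tags = [parse_conll_line(line, delimiter="\t")[-1] for line in lines]
print(iob1_to_iob2(ner_tags))
# prints ['B-PER', 'I-PER', 'O', 'B-LOC']
```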


========

3.2.2

Not secure
========

----------------
New Features
----------------
* A new RoBertaSentenceEmbeddings annotator for sentence embeddings used in SentimentDL, ClassifierDL, and MultiClassifierDL annotators
* A new XlmRoBertaSentenceEmbeddings annotator for sentence embeddings used in SentimentDL, ClassifierDL, and MultiClassifierDL annotators
* Add support for AWS MFA via Spark NLP configuration
* Add new AWS configs to Spark NLP configuration when using a private S3 bucket to store logs for training models or access TF graphs needed in NerDLApproach:
  * spark.jsl.settings.aws.credentials.access_key_id
  * spark.jsl.settings.aws.credentials.secret_access_key
  * spark.jsl.settings.aws.credentials.session_token
  * spark.jsl.settings.aws.s3_bucket
  * spark.jsl.settings.aws.region
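
One way these keys might be supplied is through the Spark session builder. The sketch below collects values from environment variables and applies them with `builder.config(key, value)`. The configuration keys come from the list above; the environment variable names (in particular `SPARK_NLP_S3_BUCKET`, which is hypothetical) and both helper functions are assumptions for illustration.

```python
import os

# Spark NLP config key -> environment variable to read it from.
# The variable names are illustrative choices, not mandated by Spark NLP.
AWS_SETTINGS = {
    "spark.jsl.settings.aws.credentials.access_key_id": "AWS_ACCESS_KEY_ID",
    "spark.jsl.settings.aws.credentials.secret_access_key": "AWS_SECRET_ACCESS_KEY",
    "spark.jsl.settings.aws.credentials.session_token": "AWS_SESSION_TOKEN",
    "spark.jsl.settings.aws.s3_bucket": "SPARK_NLP_S3_BUCKET",  # hypothetical name
    "spark.jsl.settings.aws.region": "AWS_REGION",
}


def collect_aws_conf(environ):
    """Map the Spark NLP AWS keys to values found in `environ`, skipping unset ones."""
    return {key: environ[var] for key, var in AWS_SETTINGS.items() if environ.get(var)}


def apply_conf(builder, conf):
    """Call builder.config(key, value) for every entry and return the builder."""
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder


# Example with a fake environment so no real credentials are needed.
example_env = {"AWS_ACCESS_KEY_ID": "EXAMPLE-KEY-ID", "AWS_REGION": "us-east-1"}
print(collect_aws_conf(example_env))
```

With PySpark installed, the same helpers could be used as `apply_conf(SparkSession.builder, collect_aws_conf(os.environ)).getOrCreate()`. Reading credentials from the environment keeps them out of source code.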

----------------
Bug Fixes & Enhancements
----------------
* Improve loading merges file for RoBERTa tokenizer
* Remove batchSize param from broadcast in XlmRoBertaEmbeddings so that it can be set after the model is created
* Preserve previously generated metadata in BertSentenceEmbeddings annotator
* Set `elmo` as a default poolingLayer in ElmoEmbeddings
* Fix special tokens ids in XlmRoBertaEmbeddings annotator
* Fix distilbert_base_token_classifier_ontonotes model
* Fix distilbert_base_token_classifier_conll03 model
* Fix distilbert_base_token_classifier_few_nerd model
* Fix distilbert_token_classifier_persian_ner model
* Fix ner_conll_longformer_base_4096 model


========

3.2.1

Not secure
========
----------------
Patch release
----------------
* Fix "unsupported model" error in pretrained function for LongformerEmbeddings, BertForTokenClassification, and DistilBertForTokenClassification


========

