Spark-nlp

Latest version: v5.5.1

Safety actively analyzes 685670 Python packages for vulnerabilities to keep your Python projects secure.

Scan your dependencies

Page 13 of 23

2.6.2

Not secure
========
---------------
New Features
---------------
* Introducing a new SentenceDetectorDL

----------------
Enhancements
----------------
* Improved BioBERT models quality for BertEmbeddings (it achieves higher accuracy in sequence classification)
* Improved Sentence BioBERT models quality for BertSentenceEmbeddings (it achieves higher accuracy in text classification)
* Add unit test to MultiClassifierDL annotator
* Better error handling in SentimentDLApproach
* Improve loadSavedModel in BertEmbeddings and BertSentenceEmbeddings

----------------
Bugfixes
----------------
* Fix BERT LaBSE model for BertSentenceEmbeddings
* Fix loadSavedModel for BertSentenceEmbeddings in Python

---------------
Deprecations
---------------
* DeepSentenceDetector is deprecated in favor of SentenceDetectorDL

========

2.6.1

Not secure
========
----------------
Bugfixes
----------------
* Fix a bug in ClassifierDL that resulted in low accuracy during the training

========

2.6.0

Not secure
========
------------------------------
Major features and improvements
------------------------------

* **NEW:** A new MultiClassifierDL annotator for multi-label text classification
* **NEW:** A new BertSentenceEmbeddings annotator with 41 available pre-trained models for sentence embeddings used in SentimentDL, ClassifierDL, and MultiClassifierDL annotators
* **NEW:** A new YakeModel annotator for an unsupervised, corpus-independent, domain, and language-independent and single-document keyword extraction algorithm
* Integrate 24 new Small BERT models where the smallest model is 24x times smaller and 28x times faster compare to BERT base models
* Add 3 new ELECTRA small, base, and large models
* Add 4 new Finnish BERT models for BertEmbeddings and BertSentenceEmbeddings
* Improve BertEmbeddings memory consumption by 30%
* Improve BertEmbeddings performance by more than 70% with a new built-in dynamic shape inputs
* Remove the poolingLayer parameter in BertEmbeddings in favor of sequence_output that is provided by TF Hub models for new BERT models
* Add validation loss, validation accuracy, validation F1, and validation True Positive Rate during the training in MultiClassifierDL
* Add parameter to enable/disable list detection in SentenceDetector
* Unify the loggings in ClassifierDL and SentimentDL during training

----------------
Bugfixes
----------------
* Fix Tokenization bug with Bigrams in the exception list
* Fix the versioning error in second SBT projects causing models not being found via pretrained function
* Fix logging to file in NerDLApproach, ClassifierDL, SentimentDL, and MultiClassifierDL on HDFS
* Fix ignored modified tokens in BertEmbeddings, now it will consider modified tokens instead of originals

========

2.5.5

Not secure
========
---------------
New Features
---------------
- Add getClasses() function to NerDLModel
- Add getClasses() function to ClassifierDLModel
- Add getClasses() function to SentimentDLModel

---------------------
Enhancements
---------------------
- Improve max sequence length calculation in BertEmbeddings and XlnetEmbeddings

----------------
Bugfixes
----------------
- Fix a bug in RegexTokenizer in Python
- Fix StopWordsCleaner exception in Python when pretrained() is used
- Fix max sequence length issue in AlbertEmbeddings and SentencePiece generation
- Fix HDFS support for setGaphFolder param in NerDLApproach

========

2.5.4

Not secure
========
---------------
New Features
---------------
* Add support for Apache Spark 2.3.x including new Maven artifacts and full support of all pre-trained models/pipelines
* Add 43 new pre-trained models in 43 languages to StopWordsCleaner annotator
* Introduce a new RegexTokenizer to split text by regex pattern

---------------------
Enhancements
---------------------
* Retrained 6 new BioBERT and ClinicalBERT models
* Add a new param to `start()` function to start the session for Apache Spark 2.3.x

----------------
Bugfixes
----------------
* Add missing library for SentencePiece used by AlbertEmbeddings and XlnetEmbeddings on Windows
* Fix ModuleNotFoundError in LanguageDetectorDL pipelines in Python


========

2.5.3

Not secure
========
---------------
New Features
---------------
* TextMatcher now can construct the chunks from tokens instead of the original documents via buildFromTokens param
* CoNLLGenerator now is accessible in Python


----------------
Bugfixes
----------------
* Fix a bug in ContextSpellChecker resulting in IllegalArgumentException

---------------------
Enhancements
---------------------
* Improve RocksDB connection to support different storage capabilities
* Improve parameters naming convention in ContextSpellChecker

---------------------
Enhancements
---------------------
* Add NerConverter to documentation
* Fix multi-language tabs in documentation


========

Page 13 of 23

© 2024 Safety CLI Cybersecurity Inc. All Rights Reserved.