Spark-nlp

Latest version: v5.5.3

Safety actively analyzes 723929 Python packages for vulnerabilities to keep your Python projects secure.

Page 14 of 23

2.5.4

Not secure

========
---------------
New Features
---------------
* Add support for Apache Spark 2.3.x including new Maven artifacts and full support of all pre-trained models/pipelines
* Add 43 new pre-trained models in 43 languages to StopWordsCleaner annotator
* Introduce a new RegexTokenizer to split text by regex pattern

---------------------
Enhancements
---------------------
* Retrained 6 new BioBERT and ClinicalBERT models
* Add a new param to `start()` function to start the session for Apache Spark 2.3.x

----------------
Bugfixes
----------------
* Add missing library for SentencePiece used by AlbertEmbeddings and XlnetEmbeddings on Windows
* Fix ModuleNotFoundError in LanguageDetectorDL pipelines in Python

========

2.5.3

Not secure

========
---------------
New Features
---------------
* TextMatcher now can construct the chunks from tokens instead of the original documents via buildFromTokens param
* CoNLLGenerator now is accessible in Python

----------------
Bugfixes
----------------
* Fix a bug in ContextSpellChecker resulting in IllegalArgumentException

---------------------
Enhancements
---------------------
* Improve RocksDB connection to support different storage capabilities
* Improve parameters naming convention in ContextSpellChecker

---------------------
Enhancements
---------------------
* Add NerConverter to documentation
* Fix multi-language tabs in documentation

========

2.5.2

Not secure

========
---------------
New Features
---------------
* Introducing a new LanguageDetectorDL state-of-the-art annotator to detect and identify languages in documents and sentences
* Add a new param entityValue to TextMatcher to add custom value inside metadata. Useful in post-processing when there are multiple TextMatcher annotators with multiple dictionaries https://github.com/JohnSnowLabs/spark-nlp/issues/920

----------------
Bugfixes
----------------
* Add missing TensorFlow graphs to train ContextSpellChecker annotator https://github.com/JohnSnowLabs/spark-nlp/issues/912
* Fix misspelled param in classThreshold param in ContextSpellChecker annotator https://github.com/JohnSnowLabs/spark-nlp/issues/911
* Fix a bug where setGraphFolder in NerDLApproach annotator couldn't find a graph on Databricks (DBFS) https://github.com/JohnSnowLabs/spark-nlp/issues/739
* Fix a bug in NerDLApproach when includeConfidence was set to true https://github.com/JohnSnowLabs/spark-nlp/issues/917
* Fix a bug in BertEmbeddings https://github.com/JohnSnowLabs/spark-nlp/issues/906 https://github.com/JohnSnowLabs/spark-nlp/issues/918

---------------------
Enhancements
---------------------
* Improve TF backend in ContextSpellChecker annotator

========

2.5.1

Not secure

========
---------------
New Features
---------------
* Add Python support for PubTator reader to convert automatic annotations of the biomedical datasets into DataFrame
* Add 6 new pre-trained BERT models from BioBERT and ClinicalBERT

---------------------
Enhancements
---------------------
* Add unit tests for XlnetEmbeddings
* Add unit tests for AlbertEmbeddings
* Add unit tests for ContextSpellChecker

========

2.5.0

Not secure

========
---------------
New Features
---------------
* A new AlbertEmbeddings annotator with 4 available pre-trained models
* A new XlnetEmbeddings annotator with 2 available pre-trained models
* A new ContextSpellChecker annotator, the state-of-the-art annotator for spell checking
* A new SentimentDL annotator for multi-class sentiment analysis. This annotator comes with 2 available pre-trained models trained on IMDB and Twitter datasets
* Add new PubTator reader to convert automatic annotations of the biomedical datasets into DataFrame
* Introducing a new outputLogsPath param for NerDLApproach, ClassifierDLApproach and SentimentDLApproach annotators
* Refactored CoNLLGenerator to actually use NER labels from the DataFrame
* Unified params in NerDLModel in both Scala and Python
* Extend and complete Scaladoc APIs for all the annotators

----------------
Bugfixes
----------------
* Fix position of tokens in Normalizer
* Fix Lemmatizer exception on a bad input
* Fix annotator logs failing on object storage file systems like DBFS

----------------
Documentation
----------------
* Update documentation for release of Spark NLP 2.5.x
* Update the entire [spark-nlp-workshop](https://github.com/JohnSnowLabs/spark-nlp-models) notebooks for Spark NLP 2.5.x
* Update the entire [spark-nlp-models](https://github.com/JohnSnowLabs/spark-nlp-workshop) repository with new pre-trained models and pipelines

========

2.4.5

Not secure

========
---------------
Overview
---------------
We are very excited to extend Spark NLP support to 6 new Databricks runtimes and add support to Cloudera and EMR YARN cluster-mode.
As always, we thank our community for their feedback and questions in our Slack channel.

---------------
New Features
---------------
* Extend Spark NLP support for Databricks runtimes:
* 6.2
* 6.2 ML
* 6.3
* 6.3 ML
* 6.4
* 6.4 ML
* 6.5
* 6.5 ML
* Add support for cluster-mode in Cloudera and EMR YARN clusters
* New splitPattern param in Tokenizer to split tokens by regex rules

----------------
Bugfixes
----------------
* Fix ClassifierDLModel save and load in Python
* Fix ClassifierDL TensorFlow session reuse
* Fix Normalizer positions of new tokens

----------------
Documentation
----------------
* Update documentation for release of Spark NLP 2.4.x
* Update the entire [spark-nlp-workshop](https://github.com/JohnSnowLabs/spark-nlp-models) notebooks for Spark NLP 2.4.x
* Update the entire [spark-nlp-models](https://github.com/JohnSnowLabs/spark-nlp-workshop) repository with new pre-trained models and pipelines

========

Page 14 of 23

Releases

Has known vulnerabilities

Previous Next

Spark-nlp

Page 14 of 23

2.5.4

2.5.3

2.5.2

2.5.1

2.5.0

2.4.5

Page 14 of 23

Links

Releases