Spark-nlp

Latest version: v5.5.3

Safety actively analyzes 723607 Python packages for vulnerabilities to keep your Python projects secure.

Page 18 of 23

2.0.7

Not secure

========
---------------
Overview
---------------
This release addresses bugs related to cluster support, improving error messages and fixing various potential bugs depending
on the cluster configuration, such as Kryo Serialization or non default FS systems

---------------
Bugfixes
---------------
* Fixed a bug introduced in 2.0.5 that caused NerDL not to work in clusters with Kryo serialization enabled
* NerDLModel was not properly reading user provided config proto bytes during prediction
* Improved cluster embeddings message to hit user of cluster mode without shared filesystems
* Removed lazy model downloading on PretrainedPipeline to download the model at instantiation
* Fixed URI construction for cluster embeddings on non defaultFS configurations, improves cluster compatibility

========

2.0.6

Not secure

========
---------------
Overview
---------------
Following the 2.0.5 (read notes below), this release fixes a bug when disabling contrib param in NerDLApproach on non-windows OS

---------------
Bugfixes
---------------
* Fixed NerDLApproach failing when training with setUseContrib(false)

========

2.0.5

Not secure

========
---------------
Overview
---------------
This release bumps Spark NLP by default to Apache Spark 2.4.3. Spark has been undergoing testing with Scala 2.12 and they are back in 2.11 now, so this should be a working release.
In this version, we fixed a series of Pretrained models, as well as focused on improving the flexibility of NerDL annotator, which is, if not, the most popular one based on user feedback.
Users can point to graphs they create without having to re-compile the library, graph options as well whether to use Tensorflow contrib is now user defined.
Particular thanks to CyborgDroid because of reporting importantly and well-reported bugs that helped us improve Spark NLP.
Thank you for reporting issues and feedback, and we always welcome more. Join us on Slack!

---------------
Enhancements
---------------
* ViveknSentiment annotator now includes confidence score in metadata
* NerDL now has setGraphFolder to allow a path to folder with custom generated graphs using python/tensorflow code
* NerDL now has setConfigProtoBytes to allow users submit his own ConfigProto (serialized) to the graph settings
* NerDLApproach now has setUseContrib to let training user decide whether or not to use contrib. Contrib LSTM Cells are proved to return more accurate results, but does not work in Windows yet.
* Updated default tensorflow settings to include GPU allow_growth by default, disabled log device placement spamming message
* Spark version bumped to 2.4.3

---------------
Bugfixes
---------------
* Fixed contrib NerDL models not work properly in clusters such as Databricks (Thanks CyborgDroid)
* Fixed sparknlp.start(include_ocr=True) missing dependencies for OCR
* Fixed DependencyParser pretrained models not working properly in Python

---------------
Models and Pipelines
---------------
* NerDL will download noncontrib model if windows is detected, for better compatibility
* noncontrib version of pipelines with NerDL have been uploaded, as well as new models. Check documentation for complete list
* Improved error message when user is under windows and trying to load a contrib NerDL model
* Fixed ViveknSentimentModel not working properly (Thanks CyborgDroid)

---------------
Developer API
---------------
* Embeddings in python moved to annotator module for consistency
* SourceStream ResourceHelper class now properly handles cluster files for Dependency Parser
* Metadata model reader now ignores empty lines instead of failing
* Unified lang instead of language attribute name in pretrained API

========

2.0.4

Not secure

========
---------------
Overview
---------------
We are excited about Spark NLP workshop (spark-nlp-workshop repository) being so useful for many users.
Now we also made a step forward by moving website's documentation to an easy to maintain Wiki!. Spark NLP library received key bug fixes
on this release. Thanks to the community for reporting issues on GitHub. Much more to come, as always.

---------------
Bugfixes
---------------
* Fixed DependencyParser and TypedDependencyParser working inaccurately
* Fixed a bug preventing the load of WordEmbeddingsModel class from python
* Fixed wrong pretrained model names preventing some pretrained models to work properly
* Fixed BertEmbeddings not being capable of loading from file due a reader exception

---------------
Documentation
---------------
* Website documentation migrated to GitHub wiki page (WIP)

---------------
Developer API
---------------
* OcrHelper now reports failed file name when throwing exceptions (Thanks kgeis)
* Fixed Annotation function explodeAnnotations to consider replacing output column scenarios
* Fixed TRAVIS CI unit tests

========

2.0.3

Not secure

========
---------------
Overview
---------------
Short after 2.0.2, a hotfix release was made to address two bugs that prevented users from using pretrained tensorflow models in clusters.
Please read release notes for 2.0.2 to catch up!

---------------
Bugfixes
---------------
* Fixed logger serializable, causing issues in executors to serialize TensorflowWrapper
* Fixed contrib loading in cluster, when retrieving a Tensorflow session

========

2.0.2

Not secure

========
---------------
Overview
---------------
Thank you for joining us in this exciting Spark NLP year!. We continue to make progress towards a better performing library, both in speed and in accuracy.
This release focuses strongly in the quality and stability of the library, making sure it works well in most cluster environments
and improving the compatibility across systems. Word Embeddings continue to be improved for better performance and lower memory blueprint.
Context Spell Checker continues to receive enhancements in concurrency and usage of spark. Finally, tensorflow based annotators
have been significantly improved by refactoring the serialization design. Help us with feedback and we'll welcome any issue reports!

---------------
New Features
---------------
* NerCrf annotator has now includeConfidence param that includes confidence scores for predictions in metadata

---------------
Enhancements
---------------
* Cluster mode performance improved in tensorflow annotators by serializing to bytes internal information
* Doc2Chunk annotator added new params startCol, startColByTokenIndex, failOnMissing and lowerCase allows better chunking of documents
* All annotations that derive from sentence or chunk types now contain metadata information referring to the sentence or chunk ID they belong to
* ContextSpellChecker now creates a window around the token to improve computation performance
* Improved WordEmbeddings matching accuracy by trying alternative case sensitive tokens
* WordEmbeddings won't load twice if already loaded
* WordEmbeddings can use embeddingsRef if source was not provided, improving reutilization of embeddings in a pipeline
* WordEmbeddings new param includeEmbeddings allow annotators not to save entire embeddings source along them
* Contrib tensorflow dependencies now only load if necessary

---------------
Bugfixes
---------------
* Added missing Symmetric delete pretrained model
* Fixed a broken param name in Normalizer (thanks RobertSassen)
* Fixed Cloudera cluster support
* Fixed concurrent access in ContextSpellChecker in high partition number use cases and LightPipelines
* Fixed POS dataset creator to better handle corrupted pairs
* Fixed a bug in Word Embeddings not matching exact case sensitive tokens in some scenarios
* Fixed OCR Tess4J initialization problems in concurrent scenarios

---------------
Models and Pipelines
---------------
* Renaming of models and pipelines (work in progress)
* Better output column naming in pipelines

---------------
Developer API
---------------
* Unified more WordEmbeddings interface with dimension params and individual setters
* Improved unit tests for better compatibility on Windows
* Python embeddings moved to sparknlp.embeddings

========

Page 18 of 23

Releases

Has known vulnerabilities

Previous Next

Spark-nlp

Page 18 of 23

2.0.7

2.0.6

2.0.5

2.0.4

2.0.3

2.0.2

Page 18 of 23

Links

Releases