Spark-nlp

Latest version: v5.5.1

Page 21 of 23

1.5.4

========
---------------
Overview
---------------
This release improves several annotators (Normalizer, SymmetricDelete, TextMatcher, DocumentAssembler and Finisher)
so that they cover more use cases raised in our Slack channel. We also fixed two important bugs.
Finally, this is our first release with pip support for the Python sparknlp package, for users working entirely in Python.

---------------
Enhancements
---------------
* Normalizer now allows multiple to-delete regex patterns.
* Normalizer slangDictionary param allows converting tokens into something else (e.g. 'lol' into 'laughing out loud') from a dictionary file
* SymmetricDelete spell checker may now be trained from the dataset passed to fit if no external corpus is provided
* Improved SymmetricDelete spell checker training and prediction performance
* Finisher param includeMetadata now outputs annotation metadata content in either Array or String format
* DocumentAssembler may now read from Array[String] column if provided. This improves compatibility for some SparkML transformers
* TextMatcher now includes identifier name in metadata
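
To illustrate the two Normalizer enhancements above, here is a minimal pure-Python sketch. It is not the library's implementation and does not call Spark NLP: the function name `normalize_tokens`, its parameters and the sample patterns are hypothetical. It only shows the idea of applying several to-delete regex patterns to each token and then looking the result up in a slang dictionary.

```python
import re


def normalize_tokens(tokens, delete_patterns, slang_dict=None):
    """Apply every to-delete regex to each token, then expand slang.

    Hypothetical helper for illustration only; it does not use Spark NLP.
    """
    slang_dict = slang_dict or {}
    compiled = [re.compile(p) for p in delete_patterns]
    result = []
    for token in tokens:
        # Remove every substring matched by any of the patterns, in order.
        for pattern in compiled:
            token = pattern.sub("", token)
        if not token:
            continue  # the token was deleted entirely
        # Replace slang with its expansion; an expansion may be several words.
        expansion = slang_dict.get(token.lower())
        if expansion is not None:
            result.extend(expansion.split())
        else:
            result.append(token)
    return result


tokens = ["lol", "that's", "gr8!!", "42"]
cleaned = normalize_tokens(
    tokens,
    delete_patterns=[r"[^A-Za-z0-9]", r"\d+"],  # two to-delete patterns
    slang_dict={"lol": "laughing out loud"},
)
print(cleaned)
```

In this sketch the patterns are applied one after another, so a token such as "42" disappears once the second pattern removes its digits. Whether the real annotator drops empty tokens in the same way is an assumption.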

---------------
Bug fixes
---------------
* Fixed a bug introduced in 1.5.3 that prevented spark-nlp from working in Python 2 (thanks surendralalwani)
* Fixed the wrong annotator type in SymmetricDeleteApproach

---------------
Other
---------------
* setup.py for pip support (instructions will be added to the readme and website). The spark-nlp jar is still required in the SparkSession classpath.

========

1.5.3

========
---------------
Overview
---------------
This quick release is a hotfix for issues found in 1.5.2 after its release. Thanks to the users who quickly tested it.
It fixes the Symmetric spell checker being unable to read the pretrained model, adds a missing SentenceDetector default value and adds retroactive version matching to the downloader.

---------------
Bug fixes
---------------
* Fixed a bug causing the library to fail when trying to save or read an annotator with an unset Feature without default
* Added missing default Param value to SentenceDetector. Thanks superman24-7
* Symmetric spell checker now utilizes List instead of ListBuffer on its prediction layer
* Fixed Vivekn Sentiment Analysis failing when training with a sentiment column

---------------
Models
---------------
* Symmetric Spell Checker pretrained model now works well and may be downloaded
* Vivekn Sentiment pretrained model now defaults to "token" input column instead of "spell"

---------------
Other
---------------
* Downloader now works retroactively when a newer version finds a model of a previous release
* Renamed the downloader's folder argument, which caused confusion, to remote_loc (the remote location). Thanks AtulSehgal
* Added new Scala example in example folder, also available on website
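
One way to picture the retroactive downloader behaviour mentioned above is the following pure-Python sketch. It is an assumption about the general idea (pick the newest published model that is not newer than the running library), not the downloader's actual code. The function names, the `(name, version)` data layout and the model names are all hypothetical.

```python
def parse_version(text):
    """Turn a version string such as '1.5.3' into a comparable tuple (1, 5, 3)."""
    return tuple(int(part) for part in text.split("."))


def resolve_model(available, name, library_version):
    """Return the newest version of `name` that is not newer than the library.

    `available` is a list of (model_name, model_version) pairs. This layout
    is an assumption made for the example, not the real metadata format.
    """
    lib = parse_version(library_version)
    candidates = [
        parse_version(version)
        for model_name, version in available
        if model_name == name and parse_version(version) <= lib
    ]
    if not candidates:
        return None  # nothing compatible has been published
    return ".".join(str(part) for part in max(candidates))


# Hypothetical catalogue of published models.
available = [
    ("spell_sd", "1.5.2"),
    ("ner_crf", "1.5.0"),
    ("spell_sd", "1.6.0"),
]

print(resolve_model(available, "ner_crf", "1.5.3"))
print(resolve_model(available, "spell_sd", "1.5.3"))
```

Under these assumptions, a library at 1.5.3 still finds the `ner_crf` model published for 1.5.0 and ignores the `spell_sd` build published for a later release.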

========

1.5.2

========
---------------
Overview
---------------
This release focuses on improving model downloader stability, fixing word embedding reading issues and integrating
properly with the Spark ecosystem's filesystem configuration by using Spark's configured default filesystem, so that the
library works properly on clusters and multi-node environments. This includes Databricks cloud clusters and Amazon EMR YARN HDFS nodes.

Aside from that, we bring exciting new features, including a brand new spell checker with higher accuracy, inspired by the
Symmetric Delete algorithm.

Finally, Assertion Status can now be trained and predicted on top of NER output; previously
this only worked by providing start and end boundaries for the target to assert.

---------------
New Features
---------------
* Assertion status annotators can now be trained on and predict against NER output instead of start and end boundaries. Entities can now be asserted directly
* Brand new Symmetric Delete annotator (SymmetricDeleteApproach) with close to state-of-the-art accuracy (80%)
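
For readers unfamiliar with the symmetric delete idea behind the new annotator, the following is a minimal pure-Python sketch of the general technique. It is not SymmetricDeleteApproach itself: the function names, the tiny vocabulary and the `max_distance` parameter are hypothetical. The idea is to precompute deletion variants of every dictionary word, then generate deletion variants of the input word and look them up, so that only deletions are ever generated on either side.

```python
from collections import defaultdict
from itertools import combinations


def deletes(word, max_distance=1):
    """Return the word plus every variant with up to max_distance characters removed."""
    variants = {word}
    for k in range(1, min(max_distance, len(word)) + 1):
        # Choose which character positions to keep; the rest are deleted.
        for keep in combinations(range(len(word)), len(word) - k):
            variants.add("".join(word[i] for i in keep))
    return variants


def build_index(vocabulary, max_distance=1):
    """Map every deletion variant to the dictionary words that produce it."""
    index = defaultdict(set)
    for word in vocabulary:
        for variant in deletes(word, max_distance):
            index[variant].add(word)
    return index


def suggest(word, index, max_distance=1):
    """Return dictionary words sharing at least one deletion variant with `word`."""
    candidates = set()
    for variant in deletes(word, max_distance):
        candidates |= index.get(variant, set())
    return sorted(candidates)


vocabulary = ["hello", "help", "world"]  # hypothetical toy dictionary
index = build_index(vocabulary)

print(suggest("helo", index))
print(suggest("wrld", index))
```

A full spell checker would normally verify each candidate with a real edit distance and rank candidates by word frequency; this sketch stops at candidate generation.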

---------------
Enhancements
---------------
* Model downloader now uses the proper Spark filesystem. It works seamlessly with distributed storage, Databricks cloud clusters and Amazon EMR
* Fixed several race conditions when loading word embeddings from disk or downloading resources; the library is more stable
* Improved several assertion status validations and error messages

---------------
Bug fixes
---------------
* Standalone Annotator models are now properly read from disk in Python

---------------
Models
---------------
* New Symmetric Delete Spell checker pretrained model
* Vivekn Sentiment annotator may now be downloaded standalone with pretrained()

========

1.5.1

========
---------------
Overview
---------------
This is an enhancement release on top of 1.5.0 which includes improved downloader properties and better annotator defaults.
Pretrained assertion status models are now also included; they are trained on top of Stanford GloVe word embeddings.

---------------
Enhancements
---------------
* SentenceDetector now has a useCustomOnly param which enforces using only the custom bounds provided (thanks atomobianco)
* Normalizer now defaults to not lowercasing words, which leads to better implicit accuracy in pipelines (thanks marek.modry)
* SpellChecker now defaults to being case sensitive, which leads to better accuracy
* Improved DateMatcher speed
* com.johnsnowlabs.annotator._ in Scala now also includes RecursivePipelines and LightPipelines for easier imports
* ModelDownloader has been improved with better directory management

---------------
Models
---------------
* New Assertion Status (LogisticRegression and DeepLearning) pretrained models now available
* Improved accuracy of the Vivekn, Basic and Advanced pretrained Pipelines (thanks marek.modry)

---------------
Other
---------------
* S3 library dependencies updated

========

1.5.0

========
---------------
Overview
---------------
We are proud to announce what may be the biggest release in terms of content in Spark-NLP!
This release makes the library far easier to use for newcomers, with easier-to-import
annotators and extended use of the model downloader for pretrained models and pipelines.
It also includes two new annotators that use deep learning algorithms with TensorFlow graphs, which
is a first for the library.
Apart from this, we include new Light Pipelines that are 10x faster when working with datasets smaller than about
50,000 rows.
Finally, we included several bug fixes across the library, from the algorithms to the developer API.
We'll gladly welcome any feedback! The website has been extensively updated.

---------------
New features
---------------
* Light Pipelines are Annotator Pipelines created from SparkML pipelines that run more than 10x faster in small datasets
* Deep Learning NER based on Bi-LSTM and Convolutional Neural Networks from word embeddings datasets
* Deep Learning Assertion Status model based on LSTM to compute status identification from word embeddings
* Easier to use Spark-NLP:
  1. Imports have been made easy in the Scala API (com.johnsnowlabs.annotator._) to bring in all annotators
  2. BasicPipeline and AdvancedPipeline downloadable pipelines created for quick annotation of text
  3. Light Pipelines are easy to use and accept simple strings to annotate with a Spark ML Pipeline without Spark datasets
* New Downloadable models: CRF NER, Lemmatizer, POS and Spell checker
* New Downloadable pipelines: Vivekn Sentiment analysis, BasicPipeline and AdvancedPipeline

---------------
Enhancements
---------------
* Model downloader significantly improved in terms of usability

---------------
Documentation
---------------
* Website widely improved
* Added invite to our first slack chat channel

---------------
Bugfixes
---------------
* Fixed positional index wrong value when creating Annotations from constructor
* Fixed hamming distance calculation in spell checker
* Fixed Downloadable NER model failing sporadically due to missing temporary files
* Fixed SearchTrie algorithm used in TextMatcher (formerly EntityExtractor); thanks avenka11 for reporting and proposing a solution
* Fixed some model deserialization issues happening on Windows

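
As background for the Hamming distance fix listed above, here is the textbook definition as a short Python sketch: the number of positions at which two equal-length strings differ. The function name is hypothetical, and how the spell checker itself treats words of different lengths is not stated in these notes, so the sketch simply rejects them.

```python
def hamming_distance(a, b):
    """Count the positions at which two equal-length strings differ."""
    if len(a) != len(b):
        # Assumption for this sketch: unequal lengths are treated as an error.
        raise ValueError("Hamming distance is only defined for equal-length strings")
    return sum(1 for x, y in zip(a, b) if x != y)


print(hamming_distance("karolin", "kathrin"))
print(hamming_distance("spark", "spork"))
```
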
---------------
Other
---------------
* Thanks to showy we have TravisCI automatic integration testing
* Finisher now outputs to array by default
* Training example resources removed in favor of using the model downloader

========

1.4.2

========
---------------
Bugfixes
---------------
* Filesystem protocols are now properly read across the library; fixed use case for the S3:// protocol (thanks avenka11)
* Library now works properly in Windows environments
* PySpark annotator param getters now work properly when retrieving default values
* Fixed stemmer serialization due to misspelled param name
* Fixed Tokenizer infixPattern param name to infixPatterns, leading to broken pyspark serialization of such param
* Added missing addInfixPattern() function to PySpark, to allow adding patterns to current value
* Model Downloader clearCache now properly removes both .zip files and extracted content
* Model Downloader is now capable of reading all types of models properly
* Added missing clearCache function into PySpark

---------------
Developer API
---------------
* Function names in the model downloader code have been refactored consistently

---------------
Other
---------------
* RocksDB rolled back to previous version to support Windows
* NerCRF unittest modified to reduce time to test
* Removed training scripts from repository
* Updated the Spark and Scala versions used in the build

========