Spark-nlp

Latest version: v5.5.1

Page 16 of 23

2.3.3

Not secure

========
---------------
Overview
---------------
We are very glad to announce this release, which ended up much bigger than we expected.
Thanks to community feedback, we fixed many bugs. We also spent some time building
models for the TextMatcher, so it received various improvements and bugfixes for handling empty sentences and cleaned-up tokens.
We also added UDF-ready functions in Python to make working with Annotations easier. Finally, we fixed a few bugs when loading models from disk.
Thank you very much for the constant feedback on Slack.

---------------
New Features
---------------
* TextMatcher new param `mergeOverlapping` allows for handling overlapping output chunks when matching entities share keywords
* NER overwriter annotator allows for overwriting NER output with custom entities
* Added `map_annotations`, `map_annotations_strict`, `map_annotations_col`, `filter_by_annotations_col` and `explode_annotations_col` functions to the Python API to make working with Annotations easier.
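
As a rough illustration of what handling overlapping chunks means for the `mergeOverlapping` item above, the sketch below merges character spans that overlap into a single chunk. It is plain Python and not the TextMatcher implementation; the `(begin, end, text)` tuple layout with an inclusive end offset and the function name `merge_overlapping` are assumptions made for this example.

```python
from typing import List, Tuple

# Assumed layout for illustration: (begin, end, text) with an inclusive end offset.
Chunk = Tuple[int, int, str]


def merge_overlapping(chunks: List[Chunk], source: str) -> List[Chunk]:
    """Merge chunks whose character ranges overlap into a single chunk."""
    merged: List[Tuple[int, int]] = []
    for begin, end, _ in sorted(chunks, key=lambda c: (c[0], c[1])):
        if merged and begin <= merged[-1][1]:
            # Overlaps the previous span: extend it instead of adding a new one.
            prev_begin, prev_end = merged[-1]
            merged[-1] = (prev_begin, max(prev_end, end))
        else:
            merged.append((begin, end))
    # Rebuild the chunk text from the source string for each merged span.
    return [(b, e, source[b:e + 1]) for b, e in merged]


text = "New York City Hall is closed"
# Two matched entities that share the keyword "City".
matches = [(0, 12, "New York City"), (9, 17, "City Hall")]
print(merge_overlapping(matches, text))
```

With the two sample matches sharing the keyword "City", the sketch returns one chunk covering "New York City Hall"; spans that do not overlap are returned unchanged.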

---------------
Enhancements
---------------
* Made ChunkEmbeddings output compatible with SentenceEmbeddings for better flexibility in pipelines

---------------
Bugfixes
---------------
* Fixed BertEmbeddings crashing on empty input sentences
* Fixed missing load API and import shortcuts on the new Embeddings annotators
* Added missing metadata fields in ChunkEmbeddings
* Fixed wrong sentence IDs in sentences or tokens that got a cleanup during the pipeline
* Fixed typos in docs. Thanks marcinic
* Fixed incorrect Python classpath for the deprecated OCR and SpellChecker

========

2.3.2

Not secure

========
---------------
Overview
---------------
This release includes multiple bug fixes and an enhancement to memory consumption in the BertEmbeddings annotator.
Thanks for your feedback and reports!

---------------
Bugfixes
---------------
* Fix missing EmbeddingsFinisher in Scala and Python
* Reverted embeddings move to copy due to CRC issue
* Fix IndexOutOfBoundsException in SentenceEmbeddings

---------------
Enhancements
---------------
* Optimize BertEmbeddings memory consumption

========

2.3.1

Not secure

========
---------------
Overview
---------------
This quick release fixes a bug in the Lemmatizer load and pretrained functions that prevented them from working in 2.3.0.
We took the chance to include a feature that did not make it into 2.3.0 and slightly changed protected variables for
a better Java API, also including a pretrained function compatible with Java. Thanks again for the quick feedback on this issue!

---------------
New Features
---------------
* New EmbeddingsFinisher specializes in dealing with the output of embedding annotators. The traditional Finisher still behaves the same as in 2.3.0

---------------
Bugfixes
---------------
* Fixed a bug in the previous release that prevented LemmatizerModel from being loaded from disk or through pretrained()
* Fixed pretrained() function to return proper type in Java

---------------
Developer API
---------------
* defaultModelName, defaultLang and defaultLoc static pretrained properties are now public

========

2.3.0

Not secure

========
---------------
Overview
---------------
Thanks for your contributions and feedback on Slack. This release comes with many new features in the embeddings scope,
allowing pipeline builders to retrieve embeddings for specific bodies of text in any form, from sentences to chunks or n-grams.
We also worked a lot on making sure Spark NLP works as intended on Java. Finally, we improved AWS profile compatibility for frameworks
that use multiple credential profiles. Unfortunately, we have deprecated Eval and OCR due to internal patents on some of the latest improvements
that John Snow Labs has contributed.

---------------
New Features
---------------
* New SentenceEmbeddings annotator utilizes WordEmbeddings or BertEmbeddings to generate sentence or document embeddings
* New ChunkEmbeddings annotator utilizes WordEmbeddings or BertEmbeddings to generate chunk embeddings from Chunker or NGramGenerator outputs
* New StopWordsCleaner integrates the Spark ML StopWordsRemover function into the Spark NLP pipeline
* New NGramGenerator annotator integrates the Spark ML NGram function into Spark NLP, with a new cumulative feature that also generates ranges of n-grams, as the scikit-learn library does
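
To make the cumulative n-gram idea in the NGramGenerator item concrete, here is a minimal plain-Python sketch. It is not the annotator's code: the function name `ngrams` and the `cumulative` flag are assumptions for this example. With `cumulative=False` only n-grams of exactly size `n` are produced; with `cumulative=True` every size from 1 up to `n` is produced, similar in spirit to an n-gram range in scikit-learn.

```python
from typing import List


def ngrams(tokens: List[str], n: int, cumulative: bool = False) -> List[str]:
    """Return n-grams joined by spaces; with cumulative=True, sizes 1..n."""
    sizes = range(1, n + 1) if cumulative else [n]
    out: List[str] = []
    for size in sizes:
        # Slide a window of `size` tokens across the sentence.
        for i in range(len(tokens) - size + 1):
            out.append(" ".join(tokens[i:i + size]))
    return out


tokens = ["spark", "nlp", "rocks"]
print(ngrams(tokens, 2))                   # bigrams only
print(ngrams(tokens, 2, cumulative=True))  # unigrams followed by bigrams
```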

---------------
Enhancements
---------------
* Improved Java interoperability of the Pretrained and LightPipeline APIs. Examples added.
* Finisher and LightPipelines Parse Embeddings Vector flag allows for optional vector processing to save memory and improve performance
* setInputCols in Python now accepts column names as *args
* New param enableScore in SentimentDetector switches the output type between confidence score and result (Thanks maxwellpaulm)
* Uses the spark_nlp profile name by default in the AWS config, so downloads work in setups with multiple credential profiles
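
The `*args` change to `setInputCols` listed above can be pictured with a small sketch. `ExampleAnnotator` is a hypothetical stand-in written for this example and is not a Spark NLP class; it only shows how a setter might accept either a single list or several positional column names.

```python
class ExampleAnnotator:
    """Hypothetical stand-in for an annotator; not part of Spark NLP."""

    def __init__(self):
        self.input_cols = []

    def setInputCols(self, *value):
        # Accept either setInputCols(["a", "b"]) or setInputCols("a", "b").
        if len(value) == 1 and isinstance(value[0], (list, tuple)):
            self.input_cols = list(value[0])
        else:
            self.input_cols = list(value)
        return self  # returning self allows call chaining


as_list = ExampleAnnotator().setInputCols(["document", "token"])
as_args = ExampleAnnotator().setInputCols("document", "token")
print(as_list.input_cols == as_args.input_cols)
```

Both call styles end up storing the same list of column names, so existing code that passes a list keeps working.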

---------------
Bugfixes
---------------
* Fixed POS training dataset creator to improve performance

---------------
Deprecations
---------------
* OCR Module dropped from open source support
* Eval Module dropped from open source support

========

2.2.2

Not secure

========
---------------
Overview
---------------
Thank you again for all your feedback and questions in our Slack channel. Feedback from users and contributors
(thank you Stuart Lynn, sllynn) helped us find several Python module bugs. We also fixed and improved OCR support
for extracting page coordinates and fixed the NerDL evaluator in Python.

---------------
Enhancements
---------------
* Added a create_models.py Python script to generate graphs for NerDL without the need for Jupyter
* Added a new annotator Token2Chunk to convert all tokens to chunk types (useful for extracting token coordinates from OCR)
* Added OCR Page Dimensions
* Python setInputCols now accepts *args, so there is no need to pass a list

---------------
Bugfixes
---------------
* Fixed Python support for NerDL evaluation not taking all params appropriately
* Fixed a bug in case-sensitive matching of the embeddings format in Python (Thanks sllynn)
* Fixed a bug in the Python DateMatcher where the dateFormat param did not work (Thanks sllynn)
* Fixed a bug in PositionFinder reporting duplicate coordinate elements

---------------
Developer API
---------------
* Renamed trainValidationProp to validationSplit in NerDLApproach

---------------
Documentation
---------------
* Added documentation for several missing annotators to the docs page

========

2.2.1

Not secure

========
---------------
Overview
---------------
This short release addresses a few issues uncovered in the previous 2.2.0 release. Thank you all for the quick feedback.

---------------
Enhancements
---------------
* NerDLApproach new param includeValidationProp allows partitioning the training set and excluding a fraction
* NerDLApproach trainValidationProp now randomly samples the data as opposed to head first
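
The difference between taking the validation rows from the head of the data and sampling them at random can be sketched in plain Python. This is only an illustration of the idea behind the change above, not NerDLApproach code; the function names, the `validation_fraction` argument and the `seed` argument are assumptions for this example.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def head_split(rows: Sequence[T], validation_fraction: float) -> Tuple[List[T], List[T]]:
    """Head-first split: the first k rows become the validation set."""
    k = int(len(rows) * validation_fraction)
    return list(rows[k:]), list(rows[:k])


def random_split(rows: Sequence[T], validation_fraction: float, seed: int = 0) -> Tuple[List[T], List[T]]:
    """Random split: k rows chosen at random become the validation set."""
    k = int(len(rows) * validation_fraction)
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)  # seeded so the split is repeatable
    held_out = set(indices[:k])
    train = [r for i, r in enumerate(rows) if i not in held_out]
    validation = [r for i, r in enumerate(rows) if i in held_out]
    return train, validation


rows = list(range(10))
print(head_split(rows, 0.2))    # validation is always the first two rows
print(random_split(rows, 0.2))  # validation is two rows picked at random
```

A head-first split can be biased when the dataset is ordered, for example by document or by label, which is one reason random sampling is often preferred.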

---------------
Bugfixes
---------------
* Fixed a bug in ResourceHelper causing folder resources to fail when a folder is empty (affects various annotators)
* Fixed a bug in Python where the embeddings format was not converted to upper case
* Fixed a bug in Python that made it impossible to load PipelineModels after loading embeddings

========
