Spark-nlp

Latest version: v5.3.3

Page 15 of 22

2.3.5

Not secure
========
---------------
Overview
---------------
We would like to thank you all for your valuable feedback via our Slack channels and our GitHub repositories.

2.3.4

Not secure
========
---------------
Overview
---------------
Thank you, as always, for the feedback given on Slack and in our repos. The most important part of this release
is how we have started organizing models internally. We'll be publishing our model news at
https://github.com/JohnSnowLabs/spark-nlp-models . The models repo will be kept up to date.

As for this release, it improves various internal API functionalities, with positive side effects across
the library. As an important enhancement, we have added UDFs and functions for both Scala and Python users
to easily manipulate annotations on DataFrames. Finally, we have fixed various bugs in embeddings
metadata so that other annotators receive accurate offset information and can consume it successfully.

---------------
Enhancements
---------------
* Revamped functions in Scala and Python to help users deal with annotations from DataFrames or in UDF form, such as `map_annotations` and `filter_by_annotations`
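
To illustrate the idea behind these helpers, here is a minimal pure-Python sketch of mapping and filtering over the annotations of a single row. The `Annotation` dataclass and the `map_annotations_sketch` and `filter_annotations_sketch` functions are hypothetical stand-ins written for this example. They are not the Spark NLP implementation, whose functions operate on DataFrame columns or as UDFs.

```python
from dataclasses import dataclass, field, replace
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Annotation:
    """Hypothetical, simplified stand-in for a Spark NLP annotation."""
    annotator_type: str
    begin: int
    end: int
    result: str
    metadata: Dict[str, str] = field(default_factory=dict)


def map_annotations_sketch(
    f: Callable[[Annotation], Annotation], annotations: List[Annotation]
) -> List[Annotation]:
    """Apply f to every annotation and return the transformed list."""
    return [f(a) for a in annotations]


def filter_annotations_sketch(
    keep: Callable[[Annotation], bool], annotations: List[Annotation]
) -> List[Annotation]:
    """Keep only the annotations for which the predicate is true."""
    return [a for a in annotations if keep(a)]


# Illustrative tokens for the text "Spark NLP!" (offsets are inclusive).
tokens = [
    Annotation("token", 0, 4, "Spark"),
    Annotation("token", 6, 8, "NLP"),
    Annotation("token", 9, 9, "!"),
]

# Lowercase every token, then drop tokens that are pure punctuation.
lowercased = map_annotations_sketch(
    lambda a: replace(a, result=a.result.lower()), tokens
)
words_only = filter_annotations_sketch(lambda a: a.result.isalnum(), lowercased)
print([a.result for a in words_only])
```

The mapping step leaves offsets and metadata untouched, so downstream consumers still see where each token came from.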

---------------
Bugfixes
---------------
* Fixed bugs in ChunkEmbeddings and SentenceEmbeddings causing them to report wrong metadata and offset values
* Fixed a nested import issue in Python causing LightPipelines not to work in some environments

---------------
Developer API
---------------
* downloadModel is now flexible as to which inner downloader class is being used to access AnnotatorModel reference
* pretrained API now treats defaultModelName as an Option to allow non-default pretrained models

---------------
Other
---------------
* version() now returns the version string instead of just printing it

========

2.3.3

Not secure
========
---------------
Overview
---------------
We are very glad to announce this release, which ended up much bigger than we expected.
Thanks to community feedback, it includes many bugfixes. We also spent some time building
models for the TextMatcher, so it received various improvements and bugfixes for handling empty sentences and cleaned-up tokens.
We also added UDF-ready functions in Python to easily deal with Annotations. Finally, we fixed a few bugs when loading models from disk.
Thank you very much for your constant feedback on Slack.

---------------
New Features
---------------
* TextMatcher new param `mergeOverlapping` allows for handling overlapping output chunks when matching entities share keywords
* NER overwriter annotator allows for overwriting NER output with custom entities
* Added `map_annotations`, `map_annotations_strict`, `map_annotations_col`, `filter_by_annotations_col` and `explode_annotations_col` functions to the Python side, making it easier to work with Annotations.
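
As a rough illustration of what merging overlapping chunks means for the `mergeOverlapping` param above, the following pure-Python sketch collapses matched spans that share characters into a single chunk. The function name `merge_overlapping_chunks` and the tuple layout are assumptions made for this example. The release notes do not describe the exact merging rule used by TextMatcher, so treat this as one plausible interpretation.

```python
from typing import List, Tuple

# (begin, end, text) with an inclusive end offset; a hypothetical layout.
Chunk = Tuple[int, int, str]


def merge_overlapping_chunks(chunks: List[Chunk], text: str) -> List[Chunk]:
    """Merge chunks whose character spans overlap into one longer chunk."""
    spans: List[List[int]] = []
    for begin, end in sorted((b, e) for b, e, _ in chunks):
        if spans and begin <= spans[-1][1]:
            # Overlaps the previous span: extend it instead of adding a new one.
            spans[-1][1] = max(spans[-1][1], end)
        else:
            spans.append([begin, end])
    # Rebuild the chunk text from the original document using the merged offsets.
    return [(b, e, text[b:e + 1]) for b, e in spans]


text = "She visited New York City last year"
matches = [
    (12, 19, "New York"),   # shares the keyword "York" with the next match
    (16, 24, "York City"),
    (31, 34, "year"),
]

merged = merge_overlapping_chunks(matches, text)
print(merged)
```

Here the two matches that share "York" become a single chunk, while the unrelated match is left as is.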

---------------
Enhancement
---------------
* Made ChunkEmbeddings output to be compatible with SentenceEmbeddings for better flexibility in pipelines

---------------
Bugfixes
---------------
* Fixed BertEmbeddings crashing on empty input sentences
* Fixed missing load API and import shortcuts on the new Embeddings annotators
* Added missing metadata fields in ChunkEmbeddings
* Fixed wrong sentence IDs in sentences or tokens that got a cleanup during the pipeline
* Fixed typos in docs. Thanks marcinic
* Fixed incorrect Python classpath for the deprecated OCR and SpellChecker classes

========

2.3.2

Not secure
========
---------------
Overview
---------------
This release includes multiple bug fixes and some enhancements to memory consumption in the BertEmbeddings annotator.
Thanks for your feedback and reports!

---------------
Bugfixes
---------------
* Fix missing EmbeddingsFinisher in Scala and Python
* Reverted embeddings handling from move back to copy due to a CRC issue
* Fix IndexOutOfBoundsException in SentenceEmbeddings

---------------
Enhancement
---------------
* Optimize BertEmbeddings memory consumption

========

2.3.1

Not secure
========
---------------
Overview
---------------
This quick release addresses a bug in the Lemmatizer loading/pretrained function that prevented it from working in 2.3.0.
We took the chance to include a feature that did not make it into base 2.3.0 and slightly changed protected variables for
a better Java API, also including a Java-compatible pretrained function. Thanks again for the quick issue feedback!

---------------
New Features
---------------
* New EmbeddingsFinisher specializes in dealing with embedding annotators output. Traditional finisher still behaves the same as 2.3.0

---------------
Bugfixes
---------------
* Fixed a bug in the previous release that prevented LemmatizerModel from being loaded from disk or through the pretrained function
* Fixed pretrained() function to return proper type in Java

---------------
Developer API
---------------
* defaultModelName, defaultLang and defaultLoc static pretrained properties are now public

========

2.3.0

Not secure
========
---------------
Overview
---------------
Thanks for your contributions and feedback on Slack. This amazing release comes with many new features in the embeddings scope,
allowing pipeline builders to retrieve embeddings for specific bodies of text in any given form, from sentences to chunks or n-grams.
We also worked a lot on making sure Spark NLP works as intended on Java. Finally, we improved AWS profile compatibility for frameworks
that utilize multiple credential profiles. Unfortunately, we have deprecated Eval and OCR due to internal patents on some of the latest improvements
John Snow Labs has contributed to.

---------------
New Features
---------------
* New SentenceEmbeddings annotator utilizes WordEmbeddings or BertEmbeddings to generate sentence or document embeddings
* New ChunkEmbeddings annotator utilizes WordEmbeddings or BertEmbeddings to generate chunk embeddings from Chunker or NGramGenerator outputs
* New StopWordsCleaner integrates the Spark ML StopWordsRemover function into the Spark NLP pipeline
* New NGramGenerator annotator integrates the Spark ML NGram function into Spark NLP with a new cumulative feature to also generate range n-grams like the scikit-learn library
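
To make the combination of n-gram generation and chunk embeddings more concrete, here is a small pure-Python sketch. It generates cumulative n-grams (all sizes from 1 up to n) and then pools word vectors into one vector per chunk. Everything in it is an assumption for illustration: the helper names `ngrams` and `average_pool`, the toy 2-dimensional vectors, the output ordering, and the choice of averaging as the pooling method. The release notes do not state how the annotators pool vectors, and this is not the Spark NLP implementation.

```python
from typing import Dict, List


def ngrams(tokens: List[str], n: int, cumulative: bool = False) -> List[List[str]]:
    """Return n-grams of size n, or of every size from 1 to n when cumulative."""
    sizes = range(1, n + 1) if cumulative else [n]
    return [
        tokens[i:i + size]
        for size in sizes
        for i in range(len(tokens) - size + 1)
    ]


def average_pool(vectors: List[List[float]]) -> List[float]:
    """Average a non-empty list of equal-length vectors, one dimension at a time."""
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


# Made-up 2-dimensional word vectors standing in for real word embeddings.
word_vectors: Dict[str, List[float]] = {
    "deep": [1.0, 0.0],
    "learning": [0.0, 1.0],
    "works": [1.0, 1.0],
}
tokens = ["deep", "learning", "works"]

# Cumulative n-grams up to size 2: all unigrams followed by all bigrams.
chunks = ngrams(tokens, 2, cumulative=True)

# One pooled vector per chunk, keyed by the chunk text.
chunk_embeddings = {
    " ".join(chunk): average_pool([word_vectors[t] for t in chunk])
    for chunk in chunks
}
print(chunk_embeddings["deep learning"])
```

Pooling over all tokens of a sentence instead of over each chunk gives a sentence-level vector in the same way.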

---------------
Enhancements
---------------
* Improved Java intercompatibility on Pretrained and LightPipeline APIs. Examples added.
* Finisher and LightPipelines Parse Embeddings Vector flag allows for optional vector processing to save memory and improve performance
* setInputCols in Python can be passed as *args
* New param enableScore in SentimentDetector to switch output types between confidence score and results (Thanks maxwellpaulm)
* The spark_nlp profile name is used by default in the AWS config, making downloads compatible with multiple-profile setups

---------------
Bugfixes
---------------
* Fixed POS training dataset creator to improve performance

---------------
Deprecations
---------------
* OCR Module dropped from open source support
* Eval Module dropped from open source support

========
