Spark-nlp

Latest version: v5.3.3

Safety actively analyzes 638430 Python packages for vulnerabilities to keep your Python projects secure.

Scan your dependencies

Page 11 of 22

2.7.4

Not secure
========
----------------
Bugfixes
----------------
* Fix Tensors with a 0 dimension issue in ClassifierDL and SentimentDL
* Fix index error in TokenAssembler
* Fix MatchError in DateMatcher and MultiDateMatcher annotators
* Fix setOutputAsArray and its default value for valueSplitSymbol in Finisher annotator

----------------
Enhancements
----------------
* Implement missing frequencyThreshold and ambiguityThreshold params in WordSegmenterApproach annotator
* Downgrade Hadoop from 3.2 to 2.7 which caused an issue with S3
* Update Apache HTTP Client


========

2.7.3

Not secure
========
---------------
New Features
---------------
* Add anchorDateYear, anchorDateMonth, and anchorDateDay to DateMatcher and MultiDateMatcher to be used for relative dates extraction

----------------
Bugfixes
----------------
* Fix the default value for action parameter in Python wrapper for DocumentNormalizer annotator
* Fix Lemmatizer pretrained models published in 2021

----------------
Enhancements
----------------
* Improve T5Transformer performance on documents with many sentences


========

2.7.2

Not secure
========
----------------
Bugfixes
----------------
* Fix casual mask calculations resulting in bad translation in MarianTransformer
* Fix Serialization issue in the cluster while training ContextSpellChecker
* Fix calculating CHUNK spans based on the sentences' boundaries in RegexMatcher

----------------
Enhancements
----------------
* Add GPU support for training ContextSpellChecker
* Adding Scalatest ability to control tests by tags


========

2.7.1

Not secure
========
----------------
Bugfixes
----------------
* Fix default pretrained model T5Transformer
* Fix default pretrained model WordSegmenter
* Fix missing reference to WordSegmenter in ResourceDwonloader
* Fix T5Transformer models crashing due to unknown task
* Fix the issue of saving and reading ClassifierDL, SentimentDL, and MultiClassifierDL models introduced in the 2.7.0 release

----------------
Enhancements
----------------
* Export new T5 models with optimized Encoder/Decoder
* Add support for alternative tagging with the positional parser in RegexTokenizer
* Refactor AssertAnnotations

----------------
Backward compatibility
----------------
* In order to fix the issue of Classifiers in the clusters, we had to export new TF models and change the read/write functions of these annotators. This caused any model trained prior to the 2.7.0 release not to be compatible with 2.7.1 and require retraining including pre-trained models. (we are re-training all the existing text classification models with 2.7.1)


========

2.7.0

Not secure
========
------------------------------
Major features and improvements
------------------------------
* Introducing MarianTransformer annotator for machine translation based on MarianNMT models. Marian is an efficient, free Neural Machine Translation framework mainly being developed by the Microsoft Translator team (646+ pretrained models & pipelines in 192+ languages)
* Introducing T5Transformer annotator for Text-To-Text Transfer Transformer (Google T5) models to achieve state-of-the-art results on multiple NLP tasks such as Translation, Summarization, Question Answering, Sentence Similarity, and so on
* Introducing brand new and refactored language detection and identification models. The new LanguageDetectorDL is faster, more accurate, and supports up to 375 languages
* Introducing WordSegmenter annotator, a trainable annotator for word segmentation of languages without any rule-based tokenization such as Chinese, Japanese, or Korean
* Introducing DocumentNormalizer annotator cleaning content from HTML or XML documents, applying either data cleansing using an arbitrary number of custom regular expressions either data extraction following the different parameters
* [Spark NLP Display](https://github.com/JohnSnowLabs/spark-nlp-display) for visualization of different types of annotations
* Add support for new multi-lingual models in UniversalSentenceEncoder annotator
* Add support to Lemmatizer to be trained directly from a DataFrame instead of a text file
* Add training helper to transform CoNLL-U into Spark NLP annotator type columns

----------------
Bugfixes and Enhancements
----------------
* Fix all the known issues in ClassifierDL, SentimentDL, and MultiClassifierDL annotators in a Cluster
* NerDL enhancements for memory optimization and logging during the training with the test dataset
* SentenceEmbeddings annotator now reuses the storageRef of any embeddings used in prior
* Fix dropout in SentenceDetectorDL models for more deterministic results. Both English and Multi-lingual models are retrained for the 2.7.0 release
* Fix Python dataType Annotation
* Upgrade to Apache Spark 2.4.7

========

2.6.5

Not secure
========
----------------
Bugfixes
----------------
* Fix a bug in batching sentences in BertSentenceEmbeddings
* Fix AttributeError when trying to load a saved EmbeddingsFinisher in Python

----------------
Enhancements
----------------
* Improve handling exceptions in DocumentAssmbler when user uses a corrupted DataFrame

========

Page 11 of 22

© 2024 Safety CLI Cybersecurity Inc. All Rights Reserved.