Spark-nlp

Latest version: v5.5.3



4.4.2

========
----------------
New Features & Enhancements
----------------
* Implement a new zero-shot text classification annotator for RoBERTa called `RobertaForZeroShotClassification`
* Support Apache Spark 3.4
* Optimize BART models for memory efficiency
* Introducing `cache` feature in BartTransformer
* Improve error handling for max sequence length for transformers in Python
* Improve `MultiDateMatcher` annotator to return multiple dates

----------------
Bug Fixes
----------------
* Fix a bug in Tapas due to exceeding the maximum rank value
* Fix loading Transformer models via loadSavedModel() method from DBFS on Databricks

========

4.4.1

========
----------------
New Features & Enhancements
----------------

* Implement a new zero-shot text classification annotator for DistilBERT called `DistilBertForZeroShotClassification`
* Add a `threshold` param to the `AlbertForSequenceClassification`, `BertForSequenceClassification`, `BertForZeroShotClassification`, `DistilBertForSequenceClassification`, `CamemBertForSequenceClassification`, `DeBertaForSequenceClassification`, `LongformerForSequenceClassification`, `RoBertaForQuestionAnswering`, `XlmRoBertaForSequenceClassification`, and `XlnetForSequenceClassification` annotators
* Add new notebooks to import models for `SwinForImageClassification` and `ConvNextForImageClassification` annotators for Image Classification
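
The `threshold` param listed above is not described further in these notes. As a rough illustration of the general idea only, the sketch below shows how a score threshold can decide which labels a multi-label classifier keeps. The function names, the sigmoid activation, and the 0.5 default are assumptions made for this example, not Spark NLP's implementation.

```python
import math


def sigmoid(x):
    """Map a raw logit to a score between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))


def labels_above_threshold(logits, labels, threshold=0.5):
    """Keep every label whose sigmoid score reaches the threshold.

    Hypothetical helper for illustration; the 0.5 default is an assumption.
    """
    return [
        label
        for label, logit in zip(labels, logits)
        if sigmoid(logit) >= threshold
    ]


# Made-up logits for three made-up labels.
example_labels = ["sports", "politics", "tech"]
example_logits = [2.0, -1.0, 0.3]

lenient = labels_above_threshold(example_logits, example_labels, threshold=0.5)
strict = labels_above_threshold(example_logits, example_labels, threshold=0.7)
```

Raising the threshold trades recall for precision: fewer labels pass, but those that do carry higher scores.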

========

4.4.0

========
----------------
New Features
----------------
* Implement a new zero-shot text classification annotator for BERT called `BertForZeroShotClassification`
* Implement a new ConvNextForImageClassification annotator
* Introducing BART Transformer for text-to-text generation tasks like translation and summarization
* Set a custom entity name in Date2Chunk via the `setEntityName` param
* Add a new `nerHasNoSchema` param to NerConverter for cases where labels coming from NerDLModel and NerCrfModel don't have any schema

----------------
Bug Fixes & Enhancements
----------------
* Fix a bug in `WordEmbeddingsModel` when loading a model from S3 via the `cache_folder` config
* Fix `WordEmbeddingsModel` bug failing when it's used with `setEnableInMemoryStorage` set to `True` and LightPipeline
* Remove deprecated parameter enablePatternRegex from EntityRulerApproach & EntityRulerModel
* Deprecate Python 3.6
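
For readers who want to try the `BertForZeroShotClassification` annotator introduced in this release, the following is a minimal sketch of how it might be wired into a pipeline. The column names and candidate labels are assumptions chosen for the example, and `pretrained()` with no arguments is assumed to fetch a default model, which needs network access. The imports are done lazily inside the function so the snippet can be loaded without Spark installed.

```python
# Example labels only; any list of strings can be supplied.
CANDIDATE_LABELS = ["urgent", "mobile", "travel", "movie"]


def build_zero_shot_pipeline(candidate_labels):
    """Assemble a pipeline ending in BertForZeroShotClassification.

    Requires pyspark and spark-nlp, and an active session
    (for example one created with sparknlp.start()).
    """
    from pyspark.ml import Pipeline
    from sparknlp.base import DocumentAssembler
    from sparknlp.annotator import Tokenizer, BertForZeroShotClassification

    # Column names ("text", "document", "token", "class") are assumptions.
    document = DocumentAssembler().setInputCol("text").setOutputCol("document")
    tokens = Tokenizer().setInputCols(["document"]).setOutputCol("token")
    classifier = (
        BertForZeroShotClassification.pretrained()  # downloads a default model
        .setInputCols(["document", "token"])
        .setOutputCol("class")
        .setCandidateLabels(candidate_labels)
    )
    return Pipeline(stages=[document, tokens, classifier])
```

With a session running, `build_zero_shot_pipeline(CANDIDATE_LABELS).fit(df).transform(df)` on a DataFrame with a `text` column would add a `class` column holding the predicted label for each row.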

========

4.3.2

========
----------------
New Features & Enhancements
----------------
* Add S3 support for CoNLL(), POS(), CoNLLU() training classes https://github.com/JohnSnowLabs/spark-nlp/pull/13596
* Add support for non-schema NER (`I-` or `B-`) tags in NerConverter annotator https://github.com/JohnSnowLabs/spark-nlp/pull/13642
* Improve self-hosted examples with better documentation, Docker examples, no broken links, and more https://github.com/JohnSnowLabs/spark-nlp/pull/13575
* Improve error handling for validation evaluation in ClassifierDL and MultiClassifierDL trainable annotators https://github.com/JohnSnowLabs/spark-nlp/pull/13615
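
To make the non-schema NER item above more concrete: when tags carry no `B-` or `I-` prefix, one plausible way to build chunks is to merge consecutive tokens that share the same label. The sketch below illustrates that idea in plain Python. The function name, the `"O"` outside label, and the merging rule are assumptions for illustration, not a description of how NerConverter works internally.

```python
def group_plain_tags(tokens, tags, outside="O"):
    """Merge runs of consecutive tokens that share the same non-outside tag.

    Hypothetical helper: tags are plain labels such as "PER" or "LOC",
    with no B-/I- prefix to mark where an entity begins.
    """
    chunks = []
    current_tokens, current_tag = [], None
    for token, tag in zip(tokens, tags):
        if tag == current_tag and tag != outside:
            # Same entity label as the previous token: extend the chunk.
            current_tokens.append(token)
            continue
        if current_tokens:
            # The label changed: close the chunk collected so far.
            chunks.append((" ".join(current_tokens), current_tag))
        if tag == outside:
            current_tokens, current_tag = [], None
        else:
            current_tokens, current_tag = [token], tag
    if current_tokens:
        chunks.append((" ".join(current_tokens), current_tag))
    return chunks


example_chunks = group_plain_tags(
    ["John", "Smith", "visited", "New", "York"],
    ["PER", "PER", "O", "LOC", "LOC"],
)
```

A limitation of this rule is that two adjacent entities with the same label are merged into one chunk, which is exactly the ambiguity that `B-`/`I-` prefixes exist to resolve.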

----------------
Bug Fixes
----------------
* Fix `Date2Chunk` and `Chunk2Doc` annotators compatibility with PipelineModel https://github.com/JohnSnowLabs/spark-nlp/pull/13609
* Fix `DependencyParserModel` predicting all Chunks as `<no-type>` https://github.com/JohnSnowLabs/spark-nlp/pull/13620
* Remove the nonexistent `calculationsCol` parameter from MultiDocumentAssembler in Python https://github.com/JohnSnowLabs/spark-nlp/pull/13594

========

4.3.1

========
----------------
New Features
----------------
* Easily use external Tokenizers such as spaCy in Spark NLP pipeline
* Implement `params` parameter which can supply custom configurations to the SparkSession
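
As a minimal sketch of the `params` item above, the snippet below passes a dictionary of Spark configuration keys to `sparknlp.start()`. The specific keys and values are example settings chosen for illustration, not recommendations from these release notes, and the import is done lazily so the snippet loads without Spark NLP installed.

```python
# Example Spark settings; the keys are standard Spark configuration names,
# the values are assumptions to be tuned for your environment.
SPARK_PARAMS = {
    "spark.driver.memory": "16G",
    "spark.kryoserializer.buffer.max": "2000M",
    "spark.driver.maxResultSize": "2000M",
}


def start_session(params=None):
    """Start a Spark NLP session with custom SparkSession configuration.

    Requires the spark-nlp and pyspark packages to be installed.
    """
    import sparknlp

    return sparknlp.start(params=params if params is not None else SPARK_PARAMS)
```

Calling `start_session()` returns the SparkSession; individual settings can then be checked with `spark.conf.get(...)`.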

----------------
Bug Fixes & Enhancements
----------------
* Add `entity` field to the metadata in Date2Chunk
* Fix ViT models & pipelines examples in Models Hub

========

4.3.0

========
----------------
New Features
----------------
* Implement HubertForCTC annotator for automatic speech recognition
* Implement SwinForImageClassification annotator for Image Classification
* Introducing CamemBERT for Question Answering annotator
* Implement ZeroShotNerModel annotator for zero-shot NER based on RoBERTa architecture
* Implement Date2Chunk annotator
* Enable params argument in spark_nlp start() function
* Allow reading doc_id from CoNLL file datasets

----------------
Bug Fixes & Enhancements
----------------
* Relocating all notebooks back to examples directory
* Improve downloading/loading models & pipelines from AWS and GCP. When the `cache_pretrained` directory is set to an AWS or GCP location, existing models/pipelines are no longer copied
* Improve GitHub templates for bug reports, documentation, and feature requests
* Add documentation to ResourceDownloader
* Refactor `ml` package to allow another DL engine in future
* Apache Spark 3.3.1 is now the base version of Spark NLP
* Spark NLP supports M2 in addition to M1. Therefore, we are renaming `spark-nlp-m1` to `spark-nlp-silicon` on Maven
* Fix calculating delimiter id in CamemBERT
* Fix loadSavedModel for private buckets

========
