Sadedegel

Latest version: v0.21.2



0.16.2.1

In one month's time we have added a lot to the sadedegel library.

News

* We have resolved an old and major issue caused by improper `from transformers import AutoTokenizer` calls scattered around the code base, and we now lazy load the sentence boundary detector (sbd). To give an idea:
  * The `sadedegel config` CLI call, which shows the sadedegel configuration, took **11 sec** in the 0.16.1.1 release versus **2 sec** in 0.16.2.1+
  * The `from sadedegel import Doc` call (usually the first step when working with sadedegel) took **9.5 sec** in the 0.16.1.1 release versus **1 sec** in 0.16.2.1+
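
One common way to get this kind of start-up speed-up is to defer an expensive import until it is first used. The following is a minimal sketch of that pattern, not sadedegel's actual implementation: `LazyModule` is a hypothetical helper, and the standard-library `json` module stands in for a heavy dependency such as `transformers`.

```python
import importlib


class LazyModule:
    """Hypothetical helper: import a module only when an attribute is first used."""

    def __init__(self, name):
        self._name = name
        self._module = None

    @property
    def loaded(self):
        # True once the real import has happened.
        return self._module is not None

    def __getattr__(self, attr):
        # Called only for attributes not found on the wrapper itself,
        # i.e. for members of the wrapped module.
        if self._module is None:
            self._module = importlib.import_module(self._name)
        return getattr(self._module, attr)


# `json` is a stand-in for a slow-to-import dependency.
heavy = LazyModule("json")
```

Creating `heavy` costs nothing; the import is paid on the first call such as `heavy.dumps(...)`, so commands that never touch the dependency start quickly.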

Feature Drop & Deprecation

* Old configuration capabilities are deprecated (unfortunately, this time without prior warnings in earlier releases)
  * A `DeprecationWarning` indicates that you are accessing one of these APIs, which will be completely removed by `0.18`
  * Please use the new `config_context` API (`tf_context` and `idf_context` are just simplified wrappers)
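
To find out whether your own code still touches a deprecated API, you can record `DeprecationWarning`s with the standard `warnings` module. The sketch below assumes a hypothetical `legacy_set_config` function standing in for an old configuration call; it is not a real sadedegel function.

```python
import warnings


def legacy_set_config(key, value):
    """Hypothetical deprecated function (illustration only, not sadedegel API)."""
    warnings.warn(
        "legacy_set_config is deprecated and will be removed by 0.18; "
        "use config_context instead",
        DeprecationWarning,
        stacklevel=2,
    )
    return {key: value}


def deprecation_messages(func, *args, **kwargs):
    """Run func and return the messages of any DeprecationWarning it emitted."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # make sure no warning is filtered out
        func(*args, **kwargs)
    return [
        str(w.message)
        for w in caught
        if issubclass(w.category, DeprecationWarning)
    ]
```

Running a test suite with `python -W error::DeprecationWarning` is another option: it turns every such warning into an exception.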

Documentation

* [CONFIG.md](CONFIG.md) details the configuration of sadedegel.

Others

* `__getitem__` function to access any token of a `Sentence`
* Iterator on `Sentence` yields all `Token`s in order.
* The default tf method is now `log_norm` instead of `binary`, thanks to dafajon's most recent summarizer experiments.
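
To illustrate the difference between the two tf methods, here is a small sketch that assumes the common textbook definitions: binary tf is 1 for every term that occurs, and log-normalized tf is `log(1 + count)`. sadedegel's exact formulas may differ; the function names are hypothetical.

```python
import math
from collections import Counter


def tf_binary(tokens):
    """1.0 for every term present, regardless of how often it occurs."""
    return {term: 1.0 for term in set(tokens)}


def tf_log_norm(tokens):
    """log(1 + raw count): repeated terms weigh more, but sub-linearly."""
    return {term: math.log(1 + count) for term, count in Counter(tokens).items()}
```

With binary tf a term repeated ten times counts the same as a term seen once, whereas the log-normalized variant keeps some of the frequency signal without letting frequent terms dominate.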

0.16.0.1

This release is mainly devoted to centralized configuration. A lot has changed; hopefully nothing is broken, but something may be (always feel free to open an issue).

New Capabilities

* New command for the sadedegel CLI, `sadedegel config`, to retrieve all possible configurations.
* Default values (`sadedegel/default.ini`) are shipped with sadedegel and can be overridden by creating a user-defined config file at `~/.sadedegel/user.ini` (overridden values are indicated in the `sadedegel config` output).
* Configurable `tf` and `idf` vectors per `Sentence` are ready with the new configuration model.
* We have finally implemented the `forward` version of `BandSummarizer` explained in the [sadedegel Presentation](https://sadedegel.ai/detail/)
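
The default-plus-user-override model described above can be sketched with the standard `configparser` module. This is an illustration only: the section and option names are made up, and sadedegel's real loader may work differently.

```python
import configparser

# Hypothetical contents of a shipped default.ini.
DEFAULTS = """
[tf]
method = log_norm

[idf]
method = smooth
"""

# Hypothetical contents of a user file such as ~/.sadedegel/user.ini.
USER = """
[tf]
method = binary
"""


def load_config(default_text, user_text=""):
    """Return (merged config, set of (section, option) pairs the user overrode)."""
    default = configparser.ConfigParser()
    default.read_string(default_text)

    merged = configparser.ConfigParser()
    merged.read_string(default_text)
    merged.read_string(user_text)  # later reads win over earlier ones

    overridden = set()
    for section in merged.sections():
        for option in merged[section]:
            if (not default.has_option(section, option)
                    or default[section][option] != merged[section][option]):
                overridden.add((section, option))
    return merged, overridden
```

With real files, `ConfigParser.read([default_path, user_path])` gives the same precedence, and the `overridden` set is what a command like `sadedegel config` could use to mark changed values.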

Internal Update

* Previously, `sadedegel.Doc` was a Python class initialized with a document (string). We have seen some caveats in this approach, so `sadedegel.Doc` is now an instance of `sadedegel.DocumentBuilder`. This should not change the end-user experience (hopefully!): calling `Doc(...)` now triggers the `__call__` function, which returns a `sadedegel.Document` instance.
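
The builder pattern described above can be sketched as follows. The class bodies are toy stand-ins written for this note (the naive sentence split and the `tokenizer` keyword are assumptions), not sadedegel's code.

```python
class Document:
    """Toy document: keeps the raw text and a naive sentence split."""

    def __init__(self, raw, builder):
        self.raw = raw
        self.builder = builder  # gives the document access to builder settings
        self.sentences = [s.strip() for s in raw.split(".") if s.strip()]


class DocumentBuilder:
    """Callable object that produces Document instances."""

    def __init__(self, **config):
        self.config = config

    def __call__(self, raw):
        return Document(raw, self)


# The module-level name is an instance, yet it is used like a class.
Doc = DocumentBuilder(tokenizer="simple")
```

Because `Doc("...")` still looks like a constructor call, existing user code keeps working, while the builder instance can carry configuration that a plain class could not.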

0.15.2.4

In one month's time we have added a lot to the sadedegel library.

News

* We have [doruktiktiklar](https://github.com/doruktiktiklar) as the first code contributor from outside the Global Maksimum AI team.


New Capabilities

* **ADD**: Addition of Vocabulary and Token concepts to the library
  * `Token`: singleton per word (case sensitive) that stores unique token features (lower form, shape, document frequency, etc.)
  * New `sadedegel-build-vocabulary` command to manage **sadedegel** vocabularies.
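
A "singleton per word" can be implemented by caching instances in `__new__`. The sketch below is one possible approach with hypothetical names (`TokenSketch`, `is_title`), not the library's `Token` class.

```python
class TokenSketch:
    """One shared instance per distinct (case-sensitive) word."""

    _cache = {}

    def __new__(cls, word):
        token = cls._cache.get(word)
        if token is None:
            token = super().__new__(cls)
            # Features are computed once and shared by every occurrence.
            token.word = word
            token.lower = word.lower()
            token.is_title = word.istitle()
            cls._cache[word] = token
        return token
```

Every occurrence of the same word then shares one object, so per-word features such as document frequency are stored only once.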

New Summarizers
* **ADD**: TextRank Summarizer
  The TextRank summarizer uses Google's PageRank algorithm over a distance/similarity defined by the cosine distance/similarity of BERT embeddings (as of this release; more to come)
* **ADD**: TFIDF Summarizer
  The TFIDF summarizer uses the sum of the elements of a sentence's tfidf vector as the relevance score of that sentence in a document.
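
As a rough sketch of the TextRank idea, the function below runs PageRank by power iteration over a sentence similarity matrix. It is an illustration under assumptions: the matrix is written by hand (in practice it would come from cosine similarities of sentence embeddings), and the damping factor and iteration count are conventional defaults, not values taken from sadedegel.

```python
def pagerank(similarity, damping=0.85, iterations=50):
    """Score nodes of a weighted graph given as a square similarity matrix."""
    n = len(similarity)

    # Row-normalize, ignoring self-similarity, so each sentence
    # distributes its score to the others in proportion to similarity.
    transition = []
    for i, row in enumerate(similarity):
        weights = [w if j != i else 0.0 for j, w in enumerate(row)]
        total = sum(weights)
        transition.append(
            [w / total if total else 1.0 / n for w in weights]
        )

    scores = [1.0 / n] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) / n
            + damping * sum(scores[i] * transition[i][j] for i in range(n))
            for j in range(n)
        ]
    return scores


# Hand-written similarities: sentences 0 and 1 are close, 2 is an outlier.
similarity = [
    [1.0, 0.9, 0.1],
    [0.9, 1.0, 0.2],
    [0.1, 0.2, 1.0],
]
scores = pagerank(similarity)
```

Sentences that are similar to many other sentences accumulate higher scores, and the top-scoring ones form the extractive summary.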

Others

* **UPDATE**: Some annotator consensus issues on summary corpus.
* **UPDATE**: A better command-line interface for summarizer evaluation. See `sadedegel-summarize evaluate` for more.
* **ADD**: Sentence-level `tf`, `idf` and `tfidf` embeddings
* **ADD**: `Doc` has `tfidf_embeddings` property similar to `bert_embeddings` property.
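
To make the sentence-level tf-idf vectors and the sum-based relevance score concrete, here is a small sketch. It assumes raw counts for tf and a smoothed idf of the form `log((1 + N) / (1 + df)) + 1`; both are assumptions for illustration, and the function names are hypothetical.

```python
import math
from collections import Counter


def idf_from_corpus(corpus):
    """Smoothed inverse document frequency from a list of tokenized documents."""
    n_docs = len(corpus)
    df = Counter(term for doc in corpus for term in set(doc))
    return {
        term: math.log((1 + n_docs) / (1 + count)) + 1
        for term, count in df.items()
    }


def tfidf_vector(sentence, vocabulary, idf):
    """tf-idf vector of one tokenized sentence, aligned with `vocabulary`."""
    counts = Counter(sentence)
    return [counts[term] * idf.get(term, 0.0) for term in vocabulary]


def relevance(sentence, vocabulary, idf):
    """Sum of the vector's elements, used as the sentence's relevance score."""
    return sum(tfidf_vector(sentence, vocabulary, idf))
```

Each sentence gets one vector with a fixed position per vocabulary term, so vectors from different sentences can be compared or summed directly.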


Documentation

* **ADD**: Youtube webinar videos (in Turkish) on [sadedeGel YouTube Channel](https://www.youtube.com/channel/UCyNG1Mehl44XWZ8LzkColuw)

Contribution Guidelines
* **ADD**: Commit Guidelines
* **ADD**: **New Feature** checklist

Feature Drop & Deprecation

* **DROP**: Code quality guidelines are removed since [Code Inspector](https://www.code-inspector.com) limits the number of lines per open source project. We might continue with other providers in the future.

* **DEPRECATED**: `Doc.sents` will be removed by version `0.17`
  * Use `[i]` to access the **i**th sentence of a document
  * The `Doc` object now implements `__iter__` to let you iterate over all sentences of a document.
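
Indexing and iteration of this kind come from Python's container protocol. The toy class below (a hypothetical `ToyDoc`, not sadedegel's `Doc`) shows which special methods are involved.

```python
class ToyDoc:
    """Toy container that exposes its sentences via indexing and iteration."""

    def __init__(self, sentences):
        self._sents = list(sentences)

    def __len__(self):
        return len(self._sents)

    def __getitem__(self, index):
        # Enables doc[i] (and slices, since lists support them).
        return self._sents[index]

    def __iter__(self):
        # Enables `for sentence in doc:`.
        return iter(self._sents)
```

With these methods in place, callers no longer need a separate attribute such as `sents` to reach the sentences.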

Bugfix
* Properly handle empty documents, e.g. `Doc("")` or `Doc('')`

0.14.1

* **ADD**: We have initialized the sadedegel web page on Jekyll: [SadedeGel WebSite](https://globalmaksimum.github.io/sadedegel/)
* **ADD**: Add **hotfix** contribution process for sadedegel into `CONTRIBUTING.md`
* **ADD**: Sadedegel Slack channel.
* **ADD**: Evaluation scores of new experimental tokenizer (Simple Tokenizer)

0.13.5

* **ADD**: The major change of this release is the Simple word tokenizer implementation by dafajon, written after seeing the issues with the BERT Tokenizer. Note that the simple tokenizer is still experimental and not compatible with all summarizers (cluster-based summarizers automatically switch to the BERT Tokenizer in order to utilize BERT embeddings).
* **ADD**: Introduction of `sadedegel.set_config` to modify some sadedegel configurations, such as the word tokenizer.
* **ADD**: `tags` are added to `ExtractiveSummarizer` in order to filter them out (in evaluation etc.) easily.
* **ADD**: Thanks to [Code Inspector](https://code-inspector.com), `sadedeGel` is under constant code quality monitoring with an initial grade of **A** (Score **94**). We will keep it as high as we can as the capabilities of the library grow.
* **CHANGE**: Downgrade sklearn dependency back to `0.23.1` to prevent serialization compatibility warnings.
* **CHANGE**: Score normalization of summarizers is pushed up to the parent abstract class `ExtractiveSummarizer`, improving code quality by reducing repetitive code blocks.
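
Pushing normalization into the parent class is an instance of the template method pattern. The sketch below shows the idea with hypothetical names (`ExtractiveSummarizerSketch`, `_predict`, `normalize`) and assumes sum-to-one normalization; the real `ExtractiveSummarizer` may differ.

```python
from abc import ABC, abstractmethod


class ExtractiveSummarizerSketch(ABC):
    """Parent class owns normalization; subclasses only produce raw scores."""

    def __init__(self, normalize=True):
        self.normalize = normalize

    @abstractmethod
    def _predict(self, sentences):
        """Return one raw relevance score per sentence."""

    def predict(self, sentences):
        scores = list(self._predict(sentences))
        total = sum(scores)
        if self.normalize and total > 0:
            scores = [s / total for s in scores]  # scores now sum to 1
        return scores


class LengthSummarizer(ExtractiveSummarizerSketch):
    """Toy summarizer: longer sentences score higher."""

    def _predict(self, sentences):
        return [len(s.split()) for s in sentences]
```

A new summarizer only implements `_predict`, so the normalization logic exists in exactly one place.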

0.12

* :warning: **CHANGE**: We have changed the `Doc` constructor. Use the new `from_sentences` class method to construct a new `Doc` object from a list of strings (representing sentences) **resolves**: 47

* **CHANGE**: The `Sentences` object now holds a reference to the originating `Doc` object (previously a reference to `Doc.sents`) for more flexibility.

* **CHANGE**: We have significantly standardized our summarizers (specifically cluster-based summarizers) **resolves**: 59
  Summarizers now allow the following parameter types on the `predict` and `__call__` functions:
  * `Doc`
  * `List[Sentences]`
  * `List[str]` (each element is taken as a sentence)

* **ADD**: We have completed the documentation of the `sadedegel*` command-line entry points
  * `sadedegel`
  * `sadedegel-dataset`
  * `sadedegel-dataset-extended`
  * `sadedegel-summarize`
  * `sadedegel-sbd`
  * `sadedegel-server`

* **FIX**: `sadedegel info` returns **Heroku Application** address properly.
* **FIX**: Fix memoization bug on `Sentences.tokens_with_special_symbols` providing 10% faster `Sentences.tokens` calls.
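
For readers unfamiliar with memoizing a property, the sketch below shows one way to do it with `functools.cached_property` (Python 3.8+). It illustrates the technique only and is not the actual fix: the class name, the call counter, and the `[CLS]`/`[SEP]` symbols are assumptions.

```python
from functools import cached_property


class SentenceSketch:
    """Toy sentence whose expensive tokenization runs at most once."""

    def __init__(self, text):
        self.text = text
        self.tokenize_calls = 0  # counts how often tokenization really ran

    @cached_property
    def tokens_with_special_symbols(self):
        # Computed on first access, then stored on the instance.
        self.tokenize_calls += 1
        return ["[CLS]"] + self.text.split() + ["[SEP]"]

    @property
    def tokens(self):
        # Reuses the cached list and strips the special symbols.
        return self.tokens_with_special_symbols[1:-1]
```

If the cache is missing or keyed incorrectly, every `tokens` access repeats the tokenization, which is the kind of overhead a memoization fix removes.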
