
Latest version: v0.12.1




Pipeline details

| | Vectors | Tokenizer | Sentencizer | Tagger | Parser | Lemmatizer |
| -- | -- | -- | -- | -- | -- | -- |
| Model | Word2Vec CBOW `dim=300` `minfreq=10` | Rule-based, implemented in spaCy | Rule-based | Multi-task CNN | Multi-task CNN | Lemmy (CST-like) |
| Training data | Wikipedia dump (2017-04-21) and the Hungarian Webcorpus | - | - | CONLL'17 training data | CONLL'17 training data | UD-converted Szeged Korpusz |
| Test data | Hungarian analogical questions | CONLL'17 test data | CONLL'17 test data | CONLL'17 test data | CONLL'17 test data | CONLL'17 test data |
| Accuracy | `ACC` 20.95 | `F1` 99.88 | `F1` 96.64 | `ACC` 95.11 | `UAS` 77.52 `LAS` 68.45 | `ACC` 95.60 |
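
The parser column reports `UAS` and `LAS`. As a minimal sketch of how these attachment scores are conventionally defined (not the evaluation script used for the figures above), UAS counts tokens whose predicted head is correct, and LAS additionally requires the dependency label to match. The function name `attachment_scores` and the four-token sentence are hypothetical and serve only as illustration.

```python
from typing import List, Tuple

Arc = Tuple[int, str]  # (head index, dependency label) for one token


def attachment_scores(gold: List[Arc], pred: List[Arc]) -> Tuple[float, float]:
    """Return (UAS, LAS) as percentages over all tokens."""
    if len(gold) != len(pred):
        raise ValueError("gold and predicted parses must cover the same tokens")
    if not gold:
        raise ValueError("cannot score an empty parse")
    # UAS: only the head has to match.
    head_hits = sum(g[0] == p[0] for g, p in zip(gold, pred))
    # LAS: head and label both have to match.
    labeled_hits = sum(g == p for g, p in zip(gold, pred))
    n = len(gold)
    return 100.0 * head_hits / n, 100.0 * labeled_hits / n


# Hypothetical 4-token sentence; head 0 marks the root.
gold = [(2, "nsubj"), (0, "root"), (4, "det"), (2, "obj")]
pred = [(2, "nsubj"), (0, "root"), (2, "det"), (2, "obl")]

uas, las = attachment_scores(gold, pred)
print(f"UAS {uas:.2f} LAS {las:.2f}")  # UAS 75.00 LAS 50.00
```

Because every labeled hit is also an unlabeled hit, LAS can never exceed UAS, which matches the ordering of the two figures in the table.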

Baseline tagger and parser from Universal Dependencies, plus a vocabulary and word vector model generated from the Hungarian Webcorpus and Wikipedia

Feature | Description
------- | ------------
**Tagger** | 98.23 ACC trained/tested on the Szeged Corpus (Universal Morphology transcript)
**Word vectors** | word2vec bow with 150 dimensions, generated from the Hungarian Webcorpus and Wikipedia
**Brown clusters** | 1024 clusters generated from the Hungarian Webcorpus and Wikipedia

Baseline tagger and parser from Universal Dependencies, plus a vocabulary and word vector model generated from the Hungarian Webcorpus and Wikipedia

Feature | Description
------- | ------------
**Tagger** | 93.95 ACC trained/tested on the Universal Dependencies corpus
**Parser** | 75.12 UAS and 64.85 LAS trained/tested on the Universal Dependencies corpus
**Word vectors** | word2vec bow with 150 dimensions, generated from the Hungarian Webcorpus and Wikipedia
**Brown clusters** | 1024 clusters generated from the Hungarian Webcorpus and Wikipedia

Vocabulary and word vector model trained on the Hungarian Webcorpus and Wikipedia

Feature | Description
------- | ------------
**Corpora** | Hungarian Webcorpus, Hungarian Wikipedia
**Word vectors** | 150 dimensions, word2vec
**Brown clusters** | 1024

Vocabulary and word vector model trained on the Hungarian Webcorpus and Wikipedia

Feature | Description
------- | ------------
**Corpora** | Hungarian Webcorpus, Hungarian Wikipedia
**Word vectors** | 300 dimensions, word2vec
**Brown clusters** | 1024
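
Word vector models of this kind are commonly scored on analogy questions of the form "a is to b as c is to ?". The sketch below shows one conventional way to do that with vector offsets and cosine similarity. It is an illustration under stated assumptions, not the evaluation behind any figure on this page: the two-dimensional toy vectors, the English placeholder words, and the function names `solve_analogy` and `analogy_accuracy` are all hypothetical.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def solve_analogy(vectors: Dict[str, Vector], a: str, b: str, c: str) -> str:
    """Answer 'a : b :: c : ?' with the word closest to b - a + c."""
    target = [vb - va + vc for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    # The three question words are excluded from the candidate answers.
    candidates = (w for w in vectors if w not in (a, b, c))
    return max(candidates, key=lambda w: cosine(vectors[w], target))


def analogy_accuracy(
    vectors: Dict[str, Vector], questions: List[Tuple[str, str, str, str]]
) -> float:
    """Percentage of questions whose predicted word equals the expected one."""
    correct = sum(solve_analogy(vectors, a, b, c) == d for a, b, c, d in questions)
    return 100.0 * correct / len(questions)


# Hypothetical toy vectors (real models use 150 or 300 dimensions).
toy = {
    "king": [1.0, 1.0],
    "queen": [1.0, -1.0],
    "man": [0.2, 1.0],
    "woman": [0.2, -1.0],
    "apple": [-1.0, 0.1],
}
questions = [
    ("man", "woman", "king", "queen"),
    ("man", "king", "woman", "queen"),
]
print(analogy_accuracy(toy, questions))
```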


Feature | Description
------- | ------------
Model size | 1360 MB
Pipeline | tokenizer, sentencizer, tagger, parser, lemmatizer, ner
Vectors | 1140008 unique vectors (300 dimensions)
Sources | Universal Dependencies, Szeged Corpus, Web Corpus, Wikipedia, Hunnerwiki, Szeged NER corpora
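
As a rough sanity check on the listed model size, the memory needed for the vector table can be estimated from the vector count and dimensionality. The sketch below assumes each value is stored as a 4-byte float32, which the listing does not state; the function name `vector_table_bytes` is hypothetical.

```python
def vector_table_bytes(n_vectors: int, n_dims: int, bytes_per_value: int = 4) -> int:
    """Raw size of a dense vector table, ignoring any file-format overhead."""
    return n_vectors * n_dims * bytes_per_value


# Figures taken from the listing: 1140008 unique vectors, 300 dimensions.
size = vector_table_bytes(1_140_008, 300)  # assumes float32 storage

print(size)                # 1368009600 bytes
print(size / 1e6)          # size in decimal megabytes
print(size / 2**20)        # size in binary mebibytes
```

Under that assumption the table alone comes to about 1.37 GB, which suggests the vectors account for most of the listed model size.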


Feature | Description
------- | ------------
Model size | 1360 MB
Pipeline | tokenizer, sentencizer, tagger, parser, lemmatizer, ner
Vectors | 1140008 unique vectors (300 dimensions)
Sources | Universal Dependencies, Szeged Corpus, Web Corpus, Wikipedia, Hunnerwiki, Szeged NER corpora


Feature | Description
------- | ------------
Model size | 1360 MB
Pipeline | tokenizer, sentencizer, tagger, parser, lemmatizer
Vectors | 1140008 unique vectors (300 dimensions)
Sources | Universal Dependencies, Szeged Corpus, Web Corpus, Wikipedia


Feature | Description
------- | ------------
Model size | 1350 MB
Pipeline | tokenizer, sentencizer, tagger, parser, lemmatizer
Vectors | 1140008 unique vectors (300 dimensions)
Sources | Universal Dependencies, Szeged Corpus, Web Corpus, Wikipedia



- Added support for new models (`hu_core_news_md-v3.5.2`, `hu_core_news_lg-v3.5.2`, `hu_core_news_trf_xl-v3.5.2`)
- Updated documentation to cover `benepar` usage and noun chunking

