Stanza

Latest version: v1.10.1

1.5.0

Ssurgeon interface

Headlining this release is the initial release of Ssurgeon, a rule-based dependency graph editing tool. Along with the existing Semgrex integration with CoreNLP, Ssurgeon allows for rewriting of dependencies such as in the UD datasets. More information is in the GURT 2023 paper, https://aclanthology.org/2023.tlt-1.7/

Beyond Ssurgeon, this release includes two other CoreNLP integrations, a long list of bugfixes, a few other minor features, and a long list of constituency parser experiments whose results ranged from "ineffective" to "small improvements"; these are available for people to experiment with.

CoreNLP integration:
- Ssurgeon interface! New interface allows for editing of dependency graphs using Semgrex patterns and Ssurgeon rules. https://github.com/stanfordnlp/stanza/pull/1205 https://aclanthology.org/2023.tlt-1.7/
- English Morphology class (deterministic English lemmatizer) https://github.com/stanfordnlp/stanza/commit/6aed177731e883ce92057be7e78abdce3141a862
- English constituency -> dependency converter https://github.com/stanfordnlp/stanza/commit/0987794c9e960b32ed75d5804dd5c586466ae061

Bugfixes:
- Bugfix for older versions of torch: https://github.com/stanfordnlp/stanza/commit/376d7ea76248131a96d23e236ab165e7d5a544bb
- Bugfix for training (integration with new scoring script) https://github.com/stanfordnlp/stanza/issues/1167 https://github.com/stanfordnlp/stanza/commit/9c39636c438cbeb00ab7a7e8d9caa0bcd31ccc44
- Demo was showing constituency parser along with dependency parsing, even with conparse off: https://github.com/stanfordnlp/stanza/commit/cbc13b0219281f2c27e89ccf2914e13f8aa2bb1b
- Replace absurdly long characters with UNK (thank you khughitt) https://github.com/stanfordnlp/stanza/issues/1137 https://github.com/stanfordnlp/stanza/pull/1140
- Package all relevant pretrains into default.zip - otherwise pretrains used by NER models which are not the default pretrain were being missed. https://github.com/stanfordnlp/stanza/commit/435685f875766e0b9b2b9b1d4792db1c452f9722
- stanza-train NER training bugfix (wrong pretrain): https://github.com/stanfordnlp/stanza/commit/2757cb40edf7a4bf9f62e31eec4b3632ac5ebcb9
- Pass around device everywhere instead of calling cuda(). This should fix models occasionally being split over multiple devices. It would also allow for use of MPS, but the current torch implementation for MPS is buggy https://github.com/stanfordnlp/stanza/issues/1209 https://github.com/stanfordnlp/stanza/pull/1159
- Fix error in preparing tokenizer datasets (thanks dvzubarev): https://github.com/stanfordnlp/stanza/pull/1161
- Fix unnecessary slowness in preparing tokenizer datasets (again, thanks dvzubarev): https://github.com/stanfordnlp/stanza/pull/1162
- Fix using the correct pretrain when rebuilding POS tags for a Depparse dataset (again, thanks dvzubarev): https://github.com/stanfordnlp/stanza/pull/1170
- When using the tregex interface to corenlp, add parse if it isn't already there (again, depparse was being confused with parse): https://github.com/stanfordnlp/stanza/commit/b118473604d50d678c2857c0f39f59ba0cd9c2a3
- Update use of emoji to match latest releases: https://github.com/stanfordnlp/stanza/issues/1195 https://github.com/stanfordnlp/stanza/commit/ea345a88f8916c2ab2cd2e6260caa7831dfe2f23

Features:
- Mechanism for resplitting tokens into MWT https://github.com/stanfordnlp/stanza/issues/95 https://github.com/stanfordnlp/stanza/commit/8fac17f625173b2c2bf1cecf611deecb37399322
- CLI for tokenizing text into one paragraph per line, whitespace separated (useful for Glove, for example) https://github.com/stanfordnlp/stanza/commit/cfd44d17f806703b7ed6719993501366a52afbb1
- `detach().cpu()` speeds things up significantly in some cases https://github.com/stanfordnlp/stanza/commit/ccfbc56b3b312fdde1350104a0d0d5645c9c80cc
- Potentially use a constituency model as a classifier - WIP research project https://github.com/stanfordnlp/stanza/pull/1190
- Add an output format `"{:C}"` for document objects which prints out documents as CoNLL: https://github.com/stanfordnlp/stanza/pull/1169
- If a constituency tree is available, include it when outputting conll format for documents: https://github.com/stanfordnlp/stanza/pull/1171
- Same with sentiment: https://github.com/stanfordnlp/stanza/commit/abb581945a70fec335dbfadd71bf8c457fa908eb
- Additional language code coverage (thank you juanro49) https://github.com/stanfordnlp/stanza/commit/5802b10882026c4694a4d966e4200c48c5469b1b https://github.com/stanfordnlp/stanza/commit/f06bf86b566772ea6551c663835ddb9a6f5584ff https://github.com/stanfordnlp/stanza/commit/32f83fa2f2333f42925323c4ac9da059dffdf1dc https://github.com/stanfordnlp/stanza/commit/34505758c9d8de4ca70bfbe5418448ad54af088f
- Allow loading a pipeline for new languages (useful when developing a new suite of models) https://github.com/stanfordnlp/stanza/commit/e7fcd262a6c5f3f71b339fe989bcaa177fb378f1
- Script to count the work done by annotators on aws sagemaker private workforce: https://github.com/stanfordnlp/stanza/pull/1186
- Streaming interface which batch processes items in the stream: https://github.com/stanfordnlp/stanza/commit/2c9fe3dad434b271fa23c20a9cf8ccaf63991f16 https://github.com/stanfordnlp/stanza/issues/550
- Can pass a defaultdict to MultilingualPipeline, useful for specifying the processors for each language at once: https://github.com/stanfordnlp/stanza/commit/70fd2fdc94575dec79c4994ea2dc66a719768ab0 https://github.com/stanfordnlp/stanza/issues/1199
- Transformer at bottom layer of POS - currently only available in English as the `en_combined_bert` model, others to come https://github.com/stanfordnlp/stanza/pull/1132
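
The `"{:C}"` output format listed above relies on Python's `__format__` hook. The following is a minimal sketch of that mechanism only: `TinyDoc` and `to_conll` are hypothetical stand-ins invented for illustration, not Stanza's document class, and the ten tab-separated columns follow the general CoNLL-U layout with most fields left as `_`.

```python
class TinyDoc:
    """Hypothetical stand-in for a document object (not Stanza's class)."""

    def __init__(self, sentences):
        # Each sentence is a list of (form, upos) pairs.
        self.sentences = sentences

    def to_conll(self):
        blocks = []
        for sentence in self.sentences:
            rows = []
            for index, (form, upos) in enumerate(sentence, start=1):
                # ID, FORM, LEMMA, UPOS, then six unspecified columns.
                rows.append("\t".join([str(index), form, "_", upos] + ["_"] * 6))
            blocks.append("\n".join(rows))
        # Sentences are separated by a blank line.
        return "\n\n".join(blocks)

    def __format__(self, spec):
        # "C" selects CoNLL output; any other spec falls back to str formatting.
        if spec == "C":
            return self.to_conll()
        return format(str(self), spec)


doc = TinyDoc([[("Hello", "INTJ"), ("world", "NOUN")]])
text = "{:C}".format(doc)
```

With a real Stanza document the same call shape, `"{:C}".format(doc)`, is what the release note describes.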

New models:
- Armenian NER model using an NER labeling of armtdp (thanks to ShakeHakobyan): https://github.com/myavrum/ArmTDP-NER https://github.com/stanfordnlp/stanza/issues/1206 https://github.com/stanfordnlp/stanza/pull/1212
- Sindhi tokenization from ISRA https://github.com/stanfordnlp/stanza/pull/1117
- Sindhi NER from SiNER: https://github.com/stanfordnlp/stanza/commit/2a8ded4b0c327761b047caf433128f13b1ad14bf
- Erzya from UD 2.11 https://github.com/stanfordnlp/stanza/commit/0344ac34b5df602a49da25d58655a24a0ffcd208

Conparser experiments:
- Transformer stack (initial implementation did not help) https://arxiv.org/abs/2010.10669 https://github.com/stanfordnlp/stanza/commit/110031e29259b34be6f958fd6d67d4774d6b084a
- TREE_LSTM constituent composition method (didn't beat MAX) https://github.com/stanfordnlp/stanza/commit/2f722c828fa1364131b670da5b925082e9aa336a
- Learned weighting between bert layers (this did help a little) https://github.com/stanfordnlp/stanza/commit/2d0c69ee449501155225efc2afb53b4ba6eeefe7
- Silver trees: train 10 models, use those models to vote on good trees, then use those trees to train new models. Helps smaller treebanks such as IT and VI, but has no effect on EN https://github.com/stanfordnlp/stanza/pull/1148
- New in_order_compound transition scheme: no improvement https://github.com/stanfordnlp/stanza/commit/f560b08902cf9f9e20656697c367500389115057
- Multistage training with madgrad or adamw: definite improvement. madgrad included as optional dependency https://github.com/stanfordnlp/stanza/commit/2706c4b100285e50f3d9a69e51ca5955e15ba41d https://github.com/stanfordnlp/stanza/commit/f500936b5ca4ba2305a028241996e5d198afd94b
- Report the scores of tags when retagging (does not affect the conparser training) https://github.com/stanfordnlp/stanza/commit/766341942962e5a5a0aa0cda3dd170ac098ac6f9
- FocalLoss on the transitions using optional dependency: didn't help https://arxiv.org/abs/1708.02002 https://github.com/stanfordnlp/stanza/commit/90a8337083f0dc057ea2a9ee794595a6b292850f
- LargeMarginSoftmax: didn't help https://github.com/tk1980/LargeMarginInSoftmax https://github.com/stanfordnlp/stanza/commit/5edd7242073720aff94f07904009ce0cad47b7ff
- Maxout layer: didn't help https://arxiv.org/abs/1302.4389 https://github.com/stanfordnlp/stanza/commit/c708ce7736ffb021f9a0065f2bedaa8b73de52ba
- Reverse parsing: not expected to help, potentially can be useful when building silver treebanks. May also be useful as a two step parser in the future. https://github.com/stanfordnlp/stanza/commit/4954845ba4b16240e6acf8d45d83161a0dec8d33
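
The silver-tree idea above (several models vote, and only agreed-upon trees are kept for retraining) can be sketched as follows. The release note does not say how votes are counted, so the agreement threshold `min_votes` and the function name `vote_on_trees` are assumptions made purely for illustration.

```python
from collections import Counter


def vote_on_trees(predictions, min_votes):
    """Keep, per sentence, the most common tree if enough models agree on it.

    predictions: one list of bracketed tree strings per model, aligned by sentence.
    min_votes: hypothetical agreement threshold (not taken from the release notes).
    """
    silver = []
    for candidates in zip(*predictions):
        tree, votes = Counter(candidates).most_common(1)[0]
        if votes >= min_votes:
            silver.append(tree)
    return silver


predictions = [
    ["(S (NP a) (VP b))", "(S (X c))"],
    ["(S (NP a) (VP b))", "(S (Y c))"],
    ["(S (NP a) (VP b))", "(S (Z c))"],
]
silver_trees = vote_on_trees(predictions, min_votes=2)
```

Here the three models agree on the first sentence, so its tree is kept, while the second sentence is dropped because no tree reaches two votes.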

1.4.2

- Pipeline cache in Multilingual is a single OrderedDict
https://github.com/stanfordnlp/stanza/issues/1115#issuecomment-1239759362
https://github.com/stanfordnlp/stanza/commit/ba3f64d5f571b1dc70121551364fc89d103ca1cd

- Don't require `pytest` for all installations unless needed for testing
https://github.com/stanfordnlp/stanza/issues/1120
https://github.com/stanfordnlp/stanza/commit/8c1d9d80e2e12729f60f05b81e88e113fbdd3482

- Hide SiLU and Mish imports if the installed version of torch doesn't have those nonlinearities
https://github.com/stanfordnlp/stanza/issues/1120
https://github.com/stanfordnlp/stanza/commit/6a90ad4bacf923c88438da53219c48355b847ed3

- Reorder & normalize installations in setup.py
https://github.com/stanfordnlp/stanza/pull/1124
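
To illustrate why a single `OrderedDict` is a convenient container for a pipeline cache, here is a minimal sketch of a bounded, least-recently-used cache. The class name `PipelineCache`, the `build` callback, and the eviction policy are all assumptions for illustration; the release note only states that the cache is a single `OrderedDict`.

```python
from collections import OrderedDict


class PipelineCache:
    """Hypothetical bounded cache keyed by language code."""

    def __init__(self, max_size, build):
        self.max_size = max_size
        self.build = build          # callable that creates a pipeline for a language
        self.cache = OrderedDict()

    def get(self, lang):
        if lang in self.cache:
            # Mark this language as the most recently used.
            self.cache.move_to_end(lang)
            return self.cache[lang]
        pipeline = self.build(lang)
        self.cache[lang] = pipeline
        if len(self.cache) > self.max_size:
            # Evict the least recently used entry (the oldest key).
            self.cache.popitem(last=False)
        return pipeline


built = []


def fake_build(lang):
    # Stand-in for an expensive pipeline constructor.
    built.append(lang)
    return "pipeline-" + lang


cache = PipelineCache(max_size=2, build=fake_build)
for lang in ["en", "fr", "en", "de"]:
    cache.get(lang)
```

Keeping both the entries and their recency order in one structure avoids the bookkeeping errors that can arise when the two are tracked separately.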

1.4.1

Overview

We improve the quality of the POS, constituency, and sentiment models, add an integration to displaCy, and add new models for a variety of languages.

New NER models

- New Polish NER model based on NKJP from Karol Saputa and ryszardtuora
https://github.com/stanfordnlp/stanza/issues/1070
https://github.com/stanfordnlp/stanza/pull/1110

- Make GermEval2014 the default German NER model, including an optional Bert version
https://github.com/stanfordnlp/stanza/issues/1018
https://github.com/stanfordnlp/stanza/pull/1022

- Japanese conversion of GSD by Megagon
https://github.com/stanfordnlp/stanza/pull/1038

- Marathi NER dataset from L3Cube. Includes a Sentiment model as well
https://github.com/stanfordnlp/stanza/pull/1043

- Thai conversion of LST20
https://github.com/stanfordnlp/stanza/commit/555fc0342decad70f36f501a7ea1e29fa0c5b317

- Kazakh conversion of KazNERD
https://github.com/stanfordnlp/stanza/pull/1091/commits/de6cd25c2e5b936bc4ad2764b7b67751d0b862d7

Other new models

- Sentiment conversion of Tass2020 for Spanish
https://github.com/stanfordnlp/stanza/pull/1104

- VIT constituency dataset for Italian
https://github.com/stanfordnlp/stanza/pull/1091/commits/149f1440dc32d47fbabcc498cfcd316e53aca0c6
... and many subsequent updates

- Combined UD models for Hebrew
https://github.com/stanfordnlp/stanza/issues/1109
https://github.com/stanfordnlp/stanza/commit/e4fcf003feb984f535371fb91c9e380dd187fd12

- For UD models with small train dataset & larger test dataset, flip the datasets
UD_Buryat-BDT UD_Kazakh-KTB UD_Kurmanji-MG UD_Ligurian-GLT UD_Upper_Sorbian-UFAL
https://github.com/stanfordnlp/stanza/issues/1030
https://github.com/stanfordnlp/stanza/commit/9618d60d63c49ec1bfff7416e3f1ad87300c7073

- Spanish conparse model from multiple sources - AnCora, LDC-NW, LDC-DF
https://github.com/stanfordnlp/stanza/commit/47740c6252a6717f12ef1fde875cf19fa1cd67cc

Model improvements

- Pretrained charlm integrated into POS. Gives a small to decent gain for most languages without much additional cost
https://github.com/stanfordnlp/stanza/pull/1086

- Pretrained charlm integrated into Sentiment. Improves English, others not so much
https://github.com/stanfordnlp/stanza/pull/1025

- LSTM, 2d maxpool as optional items in the Sentiment
from the paper `Text Classification Improved by Integrating Bidirectional LSTM with Two-dimensional Max Pooling`
https://github.com/stanfordnlp/stanza/pull/1098

- First learn with AdaDelta, then with another optimizer in conparse training. Very helpful
https://github.com/stanfordnlp/stanza/commit/b1d10d3bdd892c7f68d2da7f4ba68a6ae3087f52

- Grad clipping in conparse training
https://github.com/stanfordnlp/stanza/commit/365066add019096332bcba0da4a626f68b70d303

Pipeline interface improvements

- GPU memory savings: charlm reused between different processors in the same pipeline
https://github.com/stanfordnlp/stanza/pull/1028

- Word vectors not saved in the NER models. Saves bandwidth & disk space
https://github.com/stanfordnlp/stanza/pull/1033

- Functions to return tagsets for NER and conparse models
https://github.com/stanfordnlp/stanza/issues/1066
https://github.com/stanfordnlp/stanza/pull/1073
https://github.com/stanfordnlp/stanza/commit/36b84db71f19e37b36119e2ec63f89d1e509acb0
https://github.com/stanfordnlp/stanza/commit/2db43c834bc8adbb8b096cf135f0fab8b8d886cb

- displaCy integration with NER and dependency trees
https://github.com/stanfordnlp/stanza/commit/20714137d81e5e63d2bcee420b22c4fd2a871306

Bugfixes

- Fix extremely slow tokenization of a single long token (catastrophic backtracking in regex)
TY to Sk Adnan Hassan (VT) and Zainab Aamir (Stony Brook)
https://github.com/stanfordnlp/stanza/pull/1056

- Starting a new corenlp client w/o server shouldn't wait for the server to be available
TY to Mariano Crosetti
https://github.com/stanfordnlp/stanza/issues/1059
https://github.com/stanfordnlp/stanza/pull/1061

- Read raw glove word vectors (they have no header information)
https://github.com/stanfordnlp/stanza/pull/1074

- Ensure that illegal languages are not chosen by the LangID model
https://github.com/stanfordnlp/stanza/issues/1076
https://github.com/stanfordnlp/stanza/pull/1077

- Fix cache in Multilingual pipeline
https://github.com/stanfordnlp/stanza/issues/1115
https://github.com/stanfordnlp/stanza/commit/cdf18d8b19c92b0cfbbf987e82b0080ea7b4db32

- Fix loading of previously unseen languages in Multilingual pipeline
https://github.com/stanfordnlp/stanza/issues/1101
https://github.com/stanfordnlp/stanza/commit/e551ebe60a4d818bc5ba8880dda741cc8bd1aed7

- Fix conparse occasionally training to NaN early in training
https://github.com/stanfordnlp/stanza/commit/c4d785729e42ac90f298e0ef4ab487d14fa35591
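
For readers unfamiliar with the catastrophic backtracking mentioned in the tokenizer bugfix above, the following sketch shows the general phenomenon. The patterns are a textbook example chosen for illustration and are not the expressions used in Stanza's tokenizer.

```python
import re
import time

# Nested quantifiers: on a near-miss input the engine tries exponentially
# many ways of splitting the run of "a" characters before giving up.
NESTED = re.compile(r"^(a+)+$")
# Accepts exactly the same strings, but without the nesting.
FLAT = re.compile(r"^a+$")


def time_match(pattern, text):
    """Return (matched, elapsed_seconds) for a single match attempt."""
    start = time.perf_counter()
    matched = pattern.match(text) is not None
    return matched, time.perf_counter() - start


# A long run of "a" followed by one character that makes the match fail.
almost = "a" * 18 + "!"
nested_result, nested_seconds = time_match(NESTED, almost)
flat_result, flat_seconds = time_match(FLAT, almost)
```

Both patterns reject the input, but the nested one does far more work, and each additional `a` roughly doubles it. Rewriting the pattern so that a given run of characters can be matched in only one way is the usual remedy.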

Improved training tools

- W&B integration for all models: can be activated with --wandb flag in the training scripts
https://github.com/stanfordnlp/stanza/pull/1040

- New webpages for building charlm, NER, and Sentiment
https://stanfordnlp.github.io/stanza/new_language_charlm.html
https://stanfordnlp.github.io/stanza/new_language_ner.html
https://stanfordnlp.github.io/stanza/new_language_sentiment.html

- Script to download Oscar 2019 data for charlm from HF (requires `datasets` module)
https://github.com/stanfordnlp/stanza/pull/1014

- Unify sentiment training into a Python script, replacing the old shell script
https://github.com/stanfordnlp/stanza/pull/1021
https://github.com/stanfordnlp/stanza/pull/1023

- Convert sentiment to use .json inputs. In particular, this helps with languages with spaces in words such as Vietnamese
https://github.com/stanfordnlp/stanza/pull/1024

- Slightly faster charlm training
https://github.com/stanfordnlp/stanza/pull/1026

- Data conversion of WikiNER generalized for retraining / add new WikiNER models
https://github.com/stanfordnlp/stanza/pull/1039

- XPOS factory now determined at start of POS training. Makes addition of new languages easier
https://github.com/stanfordnlp/stanza/pull/1082

- Checkpointing and continued training for charlm, conparse, sentiment
https://github.com/stanfordnlp/stanza/pull/1090
https://github.com/stanfordnlp/stanza/commit/0e6de808eacf14cd64622415eeaeeac2d60faab2
https://github.com/stanfordnlp/stanza/commit/e5793c9dd5359f7e8f4fe82bf318a2f8fd190f54

- Option to write the results of a NER model to a file
https://github.com/stanfordnlp/stanza/pull/1108

- Add fake dependencies to a conllu formatted dataset for better integration with evaluation tools
https://github.com/stanfordnlp/stanza/commit/6544ef3fa5e4f1b7f06dbcc5521fbf9b1264197a

- Convert an AMT NER result to Stanza .json
https://github.com/stanfordnlp/stanza/commit/cfa7e496ca7c7662478e03c5565e1b2b2c026fad

- Add a ton of language codes, including 3 letter codes for languages we generally treat as 2 letters
https://github.com/stanfordnlp/stanza/commit/5a5e9187f81bd76fcd84ad713b51215b64234986
https://github.com/stanfordnlp/stanza/commit/b32a98e477e9972737ad64deea0bda8d6cebb4ec and others

1.4.0

Overview

As part of the new Stanza release, we integrate transformer inputs to the NER and conparse modules. In addition, we now support several additional languages for NER and conparse.

Pipeline interface improvements

- Download resources.json and models into temp dirs first to avoid race conditions between multiple processors
https://github.com/stanfordnlp/stanza/issues/213
https://github.com/stanfordnlp/stanza/pull/1001

- Download models for Pipelines automatically, without needing to call `stanza.download(...)`
https://github.com/stanfordnlp/stanza/issues/486
https://github.com/stanfordnlp/stanza/pull/943

- Add ability to turn off downloads
https://github.com/stanfordnlp/stanza/commit/68455d895986357a2c1f496e52c4e59ee0feb165

- Add a new interface where both processors and package can be set
https://github.com/stanfordnlp/stanza/issues/917
https://github.com/stanfordnlp/stanza/commit/f37042924b7665bbaf006b02dcbf8904d71931a1

- When using pretokenized tokens, get character offsets from text if available
https://github.com/stanfordnlp/stanza/issues/967
https://github.com/stanfordnlp/stanza/pull/975

- If Bert or other transformers are used, cache the models rather than loading multiple times
https://github.com/stanfordnlp/stanza/pull/980

- Allow for disabling processors on individual runs of a pipeline
https://github.com/stanfordnlp/stanza/issues/945
https://github.com/stanfordnlp/stanza/pull/947
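
The first item above, downloading into a temp dir before moving files into place, is a common way to keep concurrent processes from seeing half-written files. Below is a minimal sketch of the pattern using only the standard library. `save_atomically` is a hypothetical helper, not Stanza's download code, and it assumes the temp dir lives on the same filesystem as the destination so that `os.replace` is atomic.

```python
import os
import tempfile


def save_atomically(data, destination):
    """Write bytes to a temp dir first, then move the file into place."""
    directory = os.path.dirname(os.path.abspath(destination))
    os.makedirs(directory, exist_ok=True)
    # Create the temp dir next to the destination so both share a filesystem.
    with tempfile.TemporaryDirectory(dir=directory) as temp_dir:
        temp_path = os.path.join(temp_dir, os.path.basename(destination))
        with open(temp_path, "wb") as handle:
            handle.write(data)
        # Readers see either the old file or the complete new one, never a partial write.
        os.replace(temp_path, destination)
    return destination
```

If two processes run this at the same time, each writes to its own temp dir and the last `os.replace` wins, so the destination is always a complete file.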

Other general improvements

- Add text and sent_id to conll output
https://github.com/stanfordnlp/stanza/discussions/918
https://github.com/stanfordnlp/stanza/pull/983
https://github.com/stanfordnlp/stanza/pull/995

- Add ner to the token conll output
https://github.com/stanfordnlp/stanza/discussions/993
https://github.com/stanfordnlp/stanza/pull/996

- Fix missing Slovak MWT model
https://github.com/stanfordnlp/stanza/issues/971
https://github.com/stanfordnlp/stanza/commit/5aa19ec2e6bc610576bc12d226d6f247a21dbd75

- Upgrades to EN, IT, and Indonesian models
https://github.com/stanfordnlp/stanza/issues/1003
https://github.com/stanfordnlp/stanza/pull/1008
IT improvements with the help of attardi and msimi

- Fix improper tokenization of Chinese text with leading whitespace
https://github.com/stanfordnlp/stanza/issues/920
https://github.com/stanfordnlp/stanza/pull/924

- Check if a CoreNLP model exists before downloading it (thank you interNULL)
https://github.com/stanfordnlp/stanza/pull/965

- Convert the run_charlm script to python
https://github.com/stanfordnlp/stanza/pull/942

- Typing and lint fixes (thank you asears)
https://github.com/stanfordnlp/stanza/pull/833
https://github.com/stanfordnlp/stanza/pull/856

- stanza-train examples now compatible with the python training scripts
https://github.com/stanfordnlp/stanza/issues/896

NER features

- Bert integration (not by default, thank you vythaihn)
https://github.com/stanfordnlp/stanza/pull/976

- Swedish model (thank you EmilStenstrom)
https://github.com/stanfordnlp/stanza/issues/912
https://github.com/stanfordnlp/stanza/pull/857

- Persian model
https://github.com/stanfordnlp/stanza/issues/797

- Danish model
https://github.com/stanfordnlp/stanza/pull/910/commits/3783cc494ee8c6b6d062c4d652a428a04a4ee839

- Norwegian model (both NB and NN)
https://github.com/stanfordnlp/stanza/pull/910/commits/31fa23e5239b10edca8ecea46e2114f9cc7b031d

- Use updated Ukrainian data (thank you gawy)
https://github.com/stanfordnlp/stanza/pull/873

- Myanmar model (thank you UCSY)
https://github.com/stanfordnlp/stanza/pull/845

- Training improvements for finetuning models
https://github.com/stanfordnlp/stanza/issues/788
https://github.com/stanfordnlp/stanza/pull/791

- Fix inconsistencies in B/S/I/E tags
https://github.com/stanfordnlp/stanza/issues/928#issuecomment-1027987531
https://github.com/stanfordnlp/stanza/pull/961

- Add an option for multiple NER models at the same time, merging the results together
https://github.com/stanfordnlp/stanza/issues/928
https://github.com/stanfordnlp/stanza/pull/955
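
As background for the B/S/I/E tag item above, this sketch converts a BIO tag sequence into the BIOES scheme (B = begin, I = inside, E = end, S = single-token entity). It illustrates the tagging scheme in general; `bio_to_bioes` is a hypothetical helper and does not reproduce the specific inconsistency that was fixed.

```python
def bio_to_bioes(tags):
    """Convert well-formed BIO tags such as B-PER, I-PER, O to BIOES tags."""
    converted = []
    for index, tag in enumerate(tags):
        if tag == "O":
            converted.append(tag)
            continue
        prefix, label = tag.split("-", 1)
        next_tag = tags[index + 1] if index + 1 < len(tags) else "O"
        # The entity continues only if the next tag is I- with the same label.
        continues = next_tag == "I-" + label
        if prefix == "B":
            converted.append(("B-" if continues else "S-") + label)
        else:
            # prefix is "I": either the middle or the end of an entity.
            converted.append(("I-" if continues else "E-") + label)
    return converted


example = bio_to_bioes(["B-PER", "I-PER", "O", "B-LOC"])
```

The function assumes the input is already valid BIO, so an `I-` tag never appears without a preceding `B-` of the same label.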

Constituency parser

- Dynamic oracle (improves accuracy a bit)
https://github.com/stanfordnlp/stanza/pull/866

- Missing tags now okay in the parser
https://github.com/stanfordnlp/stanza/issues/862
https://github.com/stanfordnlp/stanza/commit/04dbf4f65e417a2ceb19897ab62c4cf293187c0b

- Bugfix for () not being escaped when output in a tree
https://github.com/stanfordnlp/stanza/commit/eaf134ca699aca158dc6e706878037a20bc8cbd4

- charlm integration by default
https://github.com/stanfordnlp/stanza/pull/799

- Bert integration (not the default model) (thank you vythaihn and hungbui0411)
https://github.com/stanfordnlp/stanza/commit/05a0b04ee6dd701ca1c7c60197be62d4c13b17b6
https://github.com/stanfordnlp/stanza/commit/0bbe8d10f895560a2bf16f542d2e3586d5d45b7e

- Preemptive bugfix for incompatible devices from zhaochaocs
https://github.com/stanfordnlp/stanza/issues/989
https://github.com/stanfordnlp/stanza/pull/1002

- New models:
  - DA, based on [Arboretum](http://catalog.elra.info/en-us/repository/browse/ELRA-W0084/)
  - IT, based on the [Turin treebank](http://www.di.unito.it/~tutreeb/treebanks.html)
  - JA, based on [ALT](https://www2.nict.go.jp/astrec-att/member/mutiyama/ALT/)
  - PT, based on [Cintil](https://catalogue.elra.info/en-us/repository/browse/ELRA-W0055/)
  - TR, based on [Starlang](https://www.researchgate.net/publication/344829282_Creating_A_Syntactically_Felicitous_Constituency_Treebank_For_Turkish)
  - ZH, based on CTB7
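
The parenthesis-escaping bugfix in the list above matters because literal `(` and `)` in a leaf would be confused with the brackets of the tree itself. A minimal sketch of the idea follows. It assumes the Penn Treebank convention of `-LRB-` and `-RRB-`, and the tuple-based tree representation and function names are hypothetical, not Stanza's tree class.

```python
# Assumed Penn Treebank style escapes for literal parentheses.
ESCAPES = {"(": "-LRB-", ")": "-RRB-"}


def escape_leaf(word):
    """Replace literal parentheses so they cannot be read as tree brackets."""
    return "".join(ESCAPES.get(char, char) for char in word)


def format_tree(node):
    """Format a tree given as a leaf string or a (label, children) tuple."""
    if isinstance(node, str):
        return escape_leaf(node)
    label, children = node
    return "({} {})".format(label, " ".join(format_tree(child) for child in children))


tree = ("NP", [("-LRB-", ["("]), ("NN", ["x"]), ("-RRB-", [")"])])
bracketed = format_tree(tree)
```

After escaping, every `(` and `)` in the output belongs to the tree structure, so the string can be parsed back without ambiguity.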

1.3.0

Overview

Stanza 1.3.0 introduces a language id model, a constituency parser, a dictionary in the tokenizer, and some additional features and bugfixes.

New features

- **Langid model and multilingual pipeline**
Based on "A reproduction of Apple's bi-directional LSTM models for language identification in short strings." by Toftrup et al 2021
(https://github.com/stanfordnlp/stanza/commit/154b0e8e59d3276744ae0c8ea56dc226f777fba8)

- **Constituency parser**
Based on "In-Order Transition-based Constituent Parsing" by Jiangming Liu and Yue Zhang. Currently an `en_wsj` model available, with more to come.
(https://github.com/stanfordnlp/stanza/commit/90318023432d584c62986123ef414a1fa93683ca)

- **Evalb interface to CoreNLP**
Useful for evaluating the parser - requires CoreNLP 4.3.0 or later

- **Dictionary tokenizer feature**
Noticeably improved performance for ZH, VI, TH
(https://github.com/stanfordnlp/stanza/pull/776)

Bugfixes / Reliability

- **HuggingFace integration**
No more git issues complaining about unavailable models! (Hopefully)
(https://github.com/stanfordnlp/stanza/commit/f7af5049568f81a716106fee5403d339ca246f38)

- **Sentiment processor crashes on certain inputs**
(issue https://github.com/stanfordnlp/stanza/issues/804, fixed by https://github.com/stanfordnlp/stanza/commit/e232f67f3850a32a1b4f3a99e9eb4f5c5580c019)

1.2.3

Overview

In anticipation of a larger release with some new features, we make a small update to fix some existing bugs and add two more NER models.

Bugfixes

- **Sentiment models would crash on no text** (issue https://github.com/stanfordnlp/stanza/issues/769, fixed by https://github.com/stanfordnlp/stanza/pull/781/commits/47889e3043c27f9c5abd9913016929f1857de7bf)

- **Java processes as a context were not properly closed** (https://github.com/stanfordnlp/stanza/pull/781/commits/a39d2ff6801a23aa73add1f710d809a9c0a793b1)

Interface improvements

- **Downloading tokenize now downloads mwt for languages which require it** (issue https://github.com/stanfordnlp/stanza/issues/774, fixed by https://github.com/stanfordnlp/stanza/pull/777, from davidrft)

- **NER model can finetune and save to/from different filenames** (https://github.com/stanfordnlp/stanza/pull/781/commits/0714a0134f0af6ef486b49ce934f894536e31d43)

- **NER model now displays a confusion matrix at the end of training** (https://github.com/stanfordnlp/stanza/pull/781/commits/9bbd3f712f97cb2702a0852e1c353d4d54b4b33b)

NER models

- **Afrikaans, trained on NCHLT** (https://github.com/stanfordnlp/stanza/pull/781/commits/6f1f04b6d674691cf9932d780da436063ebd3381)

- **Italian, trained on a model from FBK** (https://github.com/stanfordnlp/stanza/pull/781/commits/d9a361fd7f13105b68569fddeab650ea9bd04b7f)
