Ginza

Latest version: v5.2.0

3.0.0

- 2020-01-15
- Important changes
- Distribute `ginza` and `ja_ginza` from PyPI
- Simple installation: run `pip install ginza`, then run `ginza`
- The model package, `ja_ginza`, is also available from PyPI.
- Model improvements
- Change the NER training dataset to GSK2014-A (2019) BCCWJ edition
- Improved accuracy of NER
- `token.ent_type_` values now follow [Sekine's Extended Named Entity Hierarchy](http://liat-aip.sakura.ne.jp/ene/ene8/definition_jp/html/enedetail.html)
- Add `ENE7` attribute to the last field of the output of `ginza`
- Move the [OntoNotes5](https://catalog.ldc.upenn.edu/docs/LDC2013T19/OntoNotes-Release-5.0.pdf)-based label to `token._.ne`
- We extended the OntoNotes5 named entity labels with `PHONE`, `EMAIL`, `URL`, and `PET_NAME`
- Overall accuracy is improved by executing `spacy pretrain` over 100 epochs
- Multi-task learning in `spacy train` works effectively on UD Japanese BCCWJ
- The newest `SudachiDict_core-20191224`
- `ginzame`
- Execute `sudachipy` via `multiprocessing.Pool` and output results in a `mecab`-like format
- The `sudachipy` command now requires installing an additional SudachiDict package
- Breaking API Changes
- commands
- `ginza` (`ginza.command_line.main_ginza`)
- change option `mode` to `sudachipy_mode`
- drop options: `disable_pipes` and `recreate_corrector`
- add options: `hash_comment`, `parallel`, `files`
- add `mecab` to the choices for the argument of `-f` option
- add `parallel NUM_PROCESS` option (EXPERIMENTAL)
- add `ENE7` attribute to conllu miscellaneous field
- `ginza.ent_type_mapping.ENE_NE_MAPPING` is used to convert `ENE7` label to `NE`
- add `ginzame` (`ginza.command_line.main_ginzame`)
- a multi-process tokenizer providing a `mecab`-like output format
- spaCy field extensions
- add `token._.ne` for the NER label
- `ginza/sudachipy_tokenizer.py`
- change `SudachiTokenizer` to `SudachipyTokenizer`
- use `SUDACHI_DEFAULT_SPLIT_MODE` instead of `SUDACHI_DEFAULT_SPLITMODE` or `SUDACHI_DEFAULT_MODE`
- Dependencies
- upgrade `spacy` to v2.2.3
- upgrade `sudachipy` to v0.4.2
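
The `ginzame` notes above describe running a tokenizer through `multiprocessing.Pool` and printing `mecab`-like output. The following is only a minimal sketch of that pattern, not GiNZA's implementation: `toy_tokenize`, `to_mecab_like`, and `analyze_parallel` are hypothetical names, and the whitespace tokenizer is a stand-in for SudachiPy so that the example runs with the standard library alone.

```python
from multiprocessing import Pool


def toy_tokenize(line):
    """Stand-in tokenizer (assumption): split on whitespace.

    A real setup would call a morphological analyzer such as SudachiPy here
    and return its part-of-speech features instead of the "*" placeholder.
    """
    return [(surface, ["*"]) for surface in line.split()]


def to_mecab_like(line):
    """Render one input line as mecab-style rows followed by an EOS marker.

    Each row is the surface form, a tab, then comma-separated features.
    """
    rows = [
        f"{surface}\t{','.join(features)}"
        for surface, features in toy_tokenize(line)
    ]
    rows.append("EOS")
    return "\n".join(rows)


def analyze_parallel(lines, processes=2):
    """Analyze lines in worker processes; Pool.map preserves input order."""
    with Pool(processes) as pool:
        return pool.map(to_mecab_like, lines)
```

To try it, call `analyze_parallel(["銀座 で ランチ", "今日 は 晴れ"])` from a script and print each returned block. On platforms that start workers with `spawn`, that call should sit under an `if __name__ == "__main__":` guard.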

2.2.1

- 2019-10-28
- Improvements
- JapaneseCorrector can merge the `as_*` type dependencies completely
- Bug fixes
- The command-line tool failed in certain situations

2.2.0

- 2019-10-04, Ametrine
- Important changes
- `split_mode` had been set incorrectly for `sudachipy.tokenizer` since v2.0.0 (43)
- This bug caused a `split_mode` incompatibility between the training phase and the `ginza` command.
- `split_mode` was set to 'B' for the training phase and the Python APIs, but to 'C' for the `ginza` command.
- We fixed this bug by setting the default `split_mode` to 'C' everywhere.
- This fix may cause word segmentation incompatibilities when upgrading GiNZA from v2.0.0 to v2.2.0.
- New features
- Add `-f` and `--output-format` options to the `ginza` command:
- `-f 0` or `-f conllu` : [CoNLL-U Syntactic Annotation](https://universaldependencies.org/format.html#syntactic-annotation) format
- `-f 1` or `-f cabocha`: [cabocha](https://taku910.github.io/cabocha/) -f1 compatible format
- Add custom token fields:
- `bunsetu_index` : bunsetu index starting from 0
- `reading`: reading of token (not a pronunciation)
- `sudachi`: SudachiPy's morpheme instance (or a list of them when the tokens are merged by JapaneseCorrector)
- Performance improvements
- Tokenizer
- Use latest SudachiDict (SudachiDict_core-20190927.tar.gz)
- Use Cythonized SudachiPy (v0.4.0)
- Dependency parser
- Apply `spacy pretrain` command to capture the language model from UD-Japanese BCCWJ, UD_Japanese-PUD and KWDLC.
- Apply multitask objectives by using `-pt 'tag,dep'` option of `spacy train`
- New model file
- ja_ginza-2.2.0.tar.gz
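
For readers consuming the `-f conllu` output, a token line in CoNLL-U consists of ten tab-separated columns, the last of which (MISC) holds `|`-separated attributes. The sketch below parses one such line using only the standard library. The function names are hypothetical, and the sample line, including its MISC keys, is an illustrative assumption rather than verbatim `ginza` output.

```python
# Column names defined by the CoNLL-U format, in order.
CONLLU_FIELDS = (
    "id", "form", "lemma", "upos", "xpos",
    "feats", "head", "deprel", "deps", "misc",
)

# Hypothetical sample line; real `ginza` output may use different MISC keys.
SAMPLE = "1\t銀座\t銀座\tPROPN\t名詞-固有名詞-地名-一般\t_\t0\troot\t_\tBunsetuBILabel=B|Reading=ギンザ"


def parse_misc(misc):
    """Split the MISC column into a dict; "_" means the column is empty."""
    if misc == "_":
        return {}
    result = {}
    for item in misc.split("|"):
        key, sep, value = item.partition("=")
        # Items without "=" are kept as boolean flags.
        result[key] = value if sep else True
    return result


def parse_conllu_token(line):
    """Parse one CoNLL-U token line into a dict keyed by column name."""
    columns = line.rstrip("\n").split("\t")
    if len(columns) != len(CONLLU_FIELDS):
        raise ValueError(f"expected 10 columns, got {len(columns)}")
    token = dict(zip(CONLLU_FIELDS, columns))
    token["misc"] = parse_misc(token["misc"])
    return token
```

Comment lines (starting with `#`) and blank sentence separators are not token lines and should be filtered out before calling `parse_conllu_token`.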

2.0.0

- Add `ginza` command
- run `ginza` from the console
- Change package structure
- module package as `ginza`
- language model package as `ja_ginza`
- `spacy.lang.ja` is overridden by `ginza`
- Remove `sudachipy` related directories
- SudachiPy and its dictionary are installed via `pip` during `ginza` installation
- User dictionary available
- See [Customized dictionary - SudachiPy](https://github.com/WorksApplications/SudachiPy#customized-dictionary)
- Token extension fields
- Added
- `token._.bunsetu_bi_label`, `token._.bunsetu_position_type`
- Retained
- `token._.inf`
- Removed
- `pos_detail` (same value is set to `token.tag_`)
