We release konoha v5.0.0. This version includes several major interface changes.
🚨 Breaking changes
Remove the `with_postag` option
Before
python
tokenizer_with_postag = WordTokenizer(tokenizer="mecab", with_postag=True)
tokenizer_without_postag = WordTokenizer(tokenizer="mecab", with_postag=False)
After
The `with_postag` option was simply removed.
Note that the option was also removed from the API.
python
tokenizer = WordTokenizer(tokenizer="mecab")
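Code that builds its `WordTokenizer` arguments from a stored configuration may still carry a `with_postag` key. A minimal sketch of stripping it before upgrading is shown here; `drop_with_postag` is a hypothetical helper for illustration and is not part of konoha.

```python
def drop_with_postag(legacy_kwargs):
    """Return a copy of legacy WordTokenizer kwargs without `with_postag`.

    Hypothetical migration helper, not part of konoha.
    """
    return {
        key: value
        for key, value in legacy_kwargs.items()
        if key != "with_postag"
    }


# Arguments as they might have been stored for konoha 4.x.x.
legacy_kwargs = {"tokenizer": "mecab", "with_postag": True}
migrated_kwargs = drop_with_postag(legacy_kwargs)

# With konoha 5.0.0 installed, the result could then be passed on as
# WordTokenizer(**migrated_kwargs).
```

The helper returns a new dictionary and leaves the original untouched, so the legacy configuration can be kept for reference during the migration.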
Add `/api/v1/batch_tokenize` and prohibit passing `texts` to `/api/v1/tokenize`
Konoha 4.x.x allowed users to pass `texts` to `/api/v1/tokenize`.
From 5.0.0, `/api/v1/tokenize` no longer accepts `texts`; use the new endpoint `/api/v1/batch_tokenize` for batch tokenization.
Before
bash
curl localhost:8000/api/v1/tokenize \
-X POST \
-H "Content-Type: application/json" \
-d '{"tokenizer": "mecab", "texts": ["自然言語処理"]}'
After
bash
curl localhost:8000/api/v1/batch_tokenize \
-X POST \
-H "Content-Type: application/json" \
-d '{"tokenizer": "mecab", "texts": ["自然言語処理"]}'
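For clients written in Python, the same request can be sketched with the standard library alone. The function name `build_batch_tokenize_request` and the `base_url` default are assumptions made for illustration; only the endpoint path and the `tokenizer` and `texts` fields come from the example above.

```python
import json
import urllib.request


def build_batch_tokenize_request(
    texts, tokenizer="mecab", base_url="http://localhost:8000"
):
    """Build (but do not send) a POST request for /api/v1/batch_tokenize.

    Hypothetical client helper; the payload mirrors the curl example.
    """
    payload = {"tokenizer": tokenizer, "texts": list(texts)}
    return urllib.request.Request(
        url=f"{base_url}/api/v1/batch_tokenize",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_batch_tokenize_request(["自然言語処理"])

# To send it against a running konoha API server:
# with urllib.request.urlopen(request) as response:
#     result = json.loads(response.read())
```

Building the request separately from sending it keeps the payload easy to inspect while migrating callers from `/api/v1/tokenize`.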
---
core feature
- Remove postag information from `__repr__` (144)
- Remove `with_postag` from WordTokenizer (141)
- Remove konoha.konoha_token (140)
- Extract batch tokenization from WordTokenizer.tokenize (137)
other
- Introduce rich (143)
- Import libraries in initializers of tokenizer classes (142)
- Update tests (136)
api
- Change the way to receive endpoint (139)
- Add endpoint `/api/v1/batch_tokenize` to konoha API (138)
- Support all options available for WordTokenizer in API server (135)