Konoha

Latest version: v5.5.6


3.0.1

- #41 Add whitespace tokenizer

```python
In [1]: from tiny_tokenizer import WordTokenizer
In [2]: tk = WordTokenizer("whitespace")
In [3]: tk.tokenize("私 は 猫")
Out[3]: [私, は, 猫]
```

3.0.0

- #29, #37 Support detailed `Token` attributes (thanks @chie8842, @jun-harashima, and @ysak-y); see the sketch after this list
- #35 Add `extras_require` (thanks @chie8842)
- #39 Support Python 3.5
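
A minimal sketch of reading these attributes, assuming the MeCab backend is available and that `Token` exposes `surface` and `postag` (the exact attribute set is an assumption):

```python
from tiny_tokenizer import WordTokenizer

# Sketch: with a POS-aware backend such as MeCab, each Token carries
# detailed attributes beyond the surface form. The attribute names
# `surface` and `postag` are assumptions for illustration.
tokenizer = WordTokenizer("MeCab")
for token in tokenizer.tokenize("私は猫"):
    print(token.surface, token.postag)
```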

2.1.0

- Support Sudachi tokenizer: https://github.com/himkt/tiny_tokenizer/pull/20


```python
from tiny_tokenizer import SentenceTokenizer
from tiny_tokenizer import WordTokenizer


if __name__ == "__main__":
    sentence_tokenizer = SentenceTokenizer()
    # You can choose the Sudachi splitting mode via `mode`
    # (https://github.com/WorksApplications/SudachiPy#as-a-python-package).
    tokenizer = WordTokenizer(tokenizer="Sudachi", mode="A")

    sentence = "我輩は猫である."
    print("input: ", sentence)

    result = tokenizer.tokenize(sentence)
    print(result)
```
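
Sudachi's split mode controls segmentation granularity: mode "A" yields the shortest units, while "B" and "C" produce progressively longer ones (see the SudachiPy page linked in the comment above).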

2.0.0

***This release breaks compatibility.***

Introduce `Token` class.

- https://github.com/himkt/tiny_tokenizer/pull/17
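
A minimal sketch of the migration, assuming `tokenize` now returns `Token` objects exposing a `surface` attribute:

```python
from tiny_tokenizer import WordTokenizer

# Sketch of the breaking change: tokenize returns Token objects rather
# than plain strings, so code that expects strings should read `.surface`.
tokenizer = WordTokenizer("MeCab")
tokens = tokenizer.tokenize("私は猫")
print([token.surface for token in tokens])
```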

1.3.1

- https://github.com/himkt/tiny_tokenizer/pull/15

`tiny_tokenizer` can now be installed without any word tokenizers.

1.3.0

- Change the return type of `WordTokenizer.tokenize` from `str` to `list[str]`, as sketched below (https://github.com/himkt/tiny_tokenizer/pull/13)
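
A minimal sketch of the new contract (the MeCab backend here is illustrative):

```python
from tiny_tokenizer import WordTokenizer

# Sketch: tokenize now returns a list of token strings instead of a
# single space-joined string.
tokenizer = WordTokenizer("MeCab")
tokens = tokenizer.tokenize("私は猫")
print(tokens)  # e.g. ["私", "は", "猫"], not "私 は 猫"
```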
