HojiChar

Latest version: v0.11.4

0.7.0

Changes
- The HojiChar CLI now runs in parallel across all CPU cores by default 🎉. Use the `--jobs`/`-j` option to specify the number of parallel jobs.
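To illustrate the default described above, here is a minimal sketch in plain Python. It assumes each input line can be processed independently by worker processes. `normalize`, `resolve_jobs`, and `run_parallel` are hypothetical names, and this is not HojiChar's implementation.

```python
import os
from concurrent.futures import ProcessPoolExecutor
from typing import List, Optional


def normalize(line: str) -> str:
    # Hypothetical per-line work standing in for a filter pipeline.
    return line.strip().lower()


def resolve_jobs(jobs: Optional[int] = None) -> int:
    """Return the worker count: an explicit value, else one per CPU core."""
    if jobs is not None and jobs < 1:
        raise ValueError("jobs must be at least 1")
    # os.cpu_count() may return None, so fall back to a single worker.
    return jobs if jobs is not None else (os.cpu_count() or 1)


def run_parallel(lines: List[str], jobs: Optional[int] = None) -> List[str]:
    """Apply normalize() to every line, in parallel when more than one job is used."""
    workers = resolve_jobs(jobs)
    if workers == 1:
        # A single job needs no process pool; process the lines serially.
        return [normalize(line) for line in lines]
    with ProcessPoolExecutor(max_workers=workers) as pool:
        # Executor.map() yields results in input order.
        return list(pool.map(normalize, lines, chunksize=64))
```

For example, `run_parallel(lines, jobs=4)` would use four worker processes, and omitting `jobs` would use one per core. The sketch uses processes rather than threads because, under CPython's GIL, CPU-bound Python code does not run in parallel across threads. On platforms that start workers with `spawn`, call it from under `if __name__ == "__main__":`.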

0.6.1

Made the dumped statistics JSON easier to read:
- Characters are displayed as UTF-8 rather than as ASCII escapes.
- The output is indented.
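In Python's standard `json` module, these two changes correspond to the `ensure_ascii` and `indent` arguments of `json.dumps`. This sketch assumes the statistics are a plain dict. `dump_stats` is a hypothetical helper, and the indent width of 2 is a guess rather than HojiChar's documented setting.

```python
import json
from typing import Any, Dict


def dump_stats(stats: Dict[str, Any]) -> str:
    """Serialize statistics as human-readable JSON."""
    # ensure_ascii=False keeps non-ASCII characters (e.g. Japanese) as-is
    # instead of escaping them as \uXXXX; indent=2 pretty-prints nested objects.
    return json.dumps(stats, ensure_ascii=False, indent=2)
```

When writing the result to a file, open it with `encoding="utf-8"` so the non-ASCII characters are stored correctly.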

0.6.0

Changes
These changes make it easier to analyze discarded text.
- The `ignore_filtered` flag of `hojichar.Compose` has been removed.
- A `skip_rejected` flag has been added to `hojichar.Filter`.
- An `--all` option has been added to the CLI.
- A `dump_reason` flag has been added to `hojichar.document_filters.JSONDumper`.
- When a filter discards a document, the filter and its primitive member variables are now logged.
- The text of discarded documents is no longer replaced with blank characters.
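The logging change above can be sketched as follows. When a filter rejects a document, record the filter's class name together with those attributes that hold primitive values. This is a minimal illustration, not HojiChar's code. `HypotheticalNgWordFilter` and `rejection_reason` are invented names, and treating `bool`/`int`/`float`/`str` as "primitive" is an assumption. The result has the same general shape as the `reason` object in the example output below.

```python
from typing import Any, Dict

# Assumption: "primitive" means these built-in scalar types.
PRIMITIVES = (bool, int, float, str)


class HypotheticalNgWordFilter:
    """Illustrative filter; not a HojiChar class."""

    def __init__(self, p: float = 1.0) -> None:
        self.p = p
        self.ng_words = {"badword"}  # a set is not primitive, so it is not logged
        self.matched_text = ""

    def apply(self, text: str) -> bool:
        """Return True if the text is rejected, recording what matched."""
        for word in self.ng_words:
            if word in text:
                self.matched_text = word
                return True
        return False


def rejection_reason(filt: Any) -> Dict[str, Any]:
    """Collect the filter name and its primitive-valued attributes."""
    reason: Dict[str, Any] = {"name": type(filt).__name__}
    for key, value in vars(filt).items():
        # Keeping only primitives also keeps the reason JSON-serializable.
        if isinstance(value, PRIMITIVES):
            reason[key] = value
    return reason
```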

With these changes, we can analyze output such as the following.
Given the profile `myprofile.py`:
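```python
from hojichar import Compose
from hojichar.filters import document_filters as dflt

FILTER = Compose(
    [
        # Parse each input line as JSON and read its "text" field.
        dflt.JSONLoader(key="text", ignore=True),
        # Reject documents containing adult-content words (applied with probability p).
        dflt.DiscardAdultContentJa(p=0.9, ignore_confused=True),
        # Dump rejected documents too, together with the reason for rejection.
        dflt.JSONDumper(dump_reason=True, skip_rejected=False),
    ],
)
```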

and the command

```
cat dirty_texts.jsonl | hojichar -p myprofile.py --all | jq
```

we get lines such as:
```json
{
  "text": "劇訳表示。 : 経済産業省「国民の皆さん、トイレットペーパーは余分に備えを」【防災の日】\n< 【防衛省】15年度予算、概算要求が過去最高額へ←「GO!日本」\n「初キッスの年齢を教えろ」【海外反応】 >\n経済産業省「国民の皆さん、トイレットペーパーは余分に備えを」【防災の日】\n<防災の日>経産省トイレットペーパー備蓄PR「1か月分」",
  "is_rejected": true,
  "reason": {
    "name": "DiscardAdultContentJa",
    "p": 0.9,
    "matched_text": "えろ",
    "matched_text_neighbor": "へ←「GO!日本」\n「初キッスの年齢を教えろ」【海外反応】 >\n経済産業省「国民の皆"
  }
}
```
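Output like this can be aggregated to see which filters discard the most documents. The sketch below counts rejected records per filter name. It assumes each output line is one JSON object with the `is_rejected` and `reason.name` keys shown in the example above. `count_rejections` is a hypothetical helper, not part of HojiChar.

```python
import json
from collections import Counter
from typing import Iterable


def count_rejections(lines: Iterable[str]) -> Counter:
    """Count rejected documents per filter name in JSONDumper output lines."""
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        if record.get("is_rejected"):
            # Assumes "reason" carries the filter name, as in the example above.
            reason = record.get("reason") or {}
            counts[reason.get("name", "<unknown>")] += 1
    return counts
```

For example, after saving the CLI output with `cat dirty_texts.jsonl | hojichar -p myprofile.py --all > out.jsonl`, `count_rejections(open("out.jsonl", encoding="utf-8")).most_common()` lists filters by the number of documents they rejected.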

0.5.3

- Added a PEP 561-compliant `py.typed` marker file to the package.
- `mypy` can now use HojiChar's type annotations.

0.5.2

Fixed the behavior of the `--dump-stats` option in the HojiChar CLI.

0.5.1

What's new
- Major revision of the README: we improved the English text and tried to make it easier for newcomers to understand. The README is still incomplete, and we will keep improving it.
- Cleaned up the `doc/` directory.
