BudouX

Latest version: v0.7.0

0.5.2

What's Changed
* Use overflow-wrap: anywhere; instead of overflow-wrap: break-word; by tamanyan in https://github.com/google/budoux/pull/144
* Add a script to finetune models. by tushuhei in https://github.com/google/budoux/pull/145
* Add quality regression test by tushuhei in https://github.com/google/budoux/pull/146
* Release finetuned model by tushuhei in https://github.com/google/budoux/pull/147 https://github.com/google/budoux/pull/154 https://github.com/google/budoux/pull/161
* Add validation data arg to train.py by tushuhei in https://github.com/google/budoux/pull/148
* Remove direct dependency to NumPy by tushuhei in https://github.com/google/budoux/pull/149
* Add a README for BudouX Scripts by tushuhei in https://github.com/google/budoux/pull/155
* Add score scale arg to build_model.py by tushuhei in https://github.com/google/budoux/pull/156
* Separate HTML processing as a mixin by tushuhei in https://github.com/google/budoux/pull/159

New Contributors
* step-security-bot made their first contribution in https://github.com/google/budoux/pull/163

**Full Changelog**: https://github.com/google/budoux/compare/v0.5.1...v0.5.2

0.5.1

What's Changed
* Add Java module by tushuhei in https://github.com/google/budoux/pull/124
* Separate HTML processing as html_processor.py by tushuhei in https://github.com/google/budoux/pull/126
* Fix bug with nodes to skip by tushuhei in https://github.com/google/budoux/pull/127
* Rename test_utils.py to utils.py by tushuhei in https://github.com/google/budoux/pull/129
* Remove test utils by tushuhei in https://github.com/google/budoux/pull/130
* Universal unit testing by tushuhei in https://github.com/google/budoux/pull/125
* Replace textarea with another skip node by tushuhei in https://github.com/google/budoux/pull/131
* Java style fix by tushuhei in https://github.com/google/budoux/pull/132
* Java code improvement by tushuhei in https://github.com/google/budoux/pull/133
* Java style fix by tushuhei in https://github.com/google/budoux/pull/134
* Fix mypy issue by tushuhei in https://github.com/google/budoux/pull/135
* [Java] Inherit from sonatype oss parent by tushuhei in https://github.com/google/budoux/pull/136
* Improve KNBC HTML Parser by tushuhei in https://github.com/google/budoux/pull/137


**Full Changelog**: https://github.com/google/budoux/compare/v0.5.0...v0.5.1

0.5.0

Highlights
- No major change in using default parsers.
- If you're using a custom model, you need to update it. See the "Updating Models" section below.
- The `defineClassAs` method in `javascript/src/html_processor.ts` is removed.

Updating Models
As described in #112, the model file structure has been updated to improve performance and reduce file size. The change is simple: it adds one level of nesting by grouping features, as the following example shows.

Before:
```json
{"UW1:a": 123, "UW3:b": 271}
```

After:
```json
{"UW1": {"a": 123}, "UW3": {"b": 271}}
```


You can update your custom model to the latest by running [scripts/translate_model.py](https://github.com/google/budoux/blob/2dfdc7699f430f5e48e78b7e98d7573997d19754/scripts/translate_model.py).


```
$ python translate_model.py --format=json old-model.json > new-model.json
```
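
For readers who want to see what the format change amounts to, the regrouping can be sketched in a few lines of Python. This is only an illustration of the Before/After example above, not the code in `scripts/translate_model.py`; the function name `nest_model` is hypothetical, and the sketch assumes every key has the form `GROUP:feature`.

```python
import json
from collections import defaultdict


def nest_model(flat_model):
    """Group "GROUP:feature" keys into {"GROUP": {"feature": score}}.

    Hypothetical helper for illustration; assumes every key contains a colon.
    """
    nested = defaultdict(dict)
    for key, score in flat_model.items():
        # Split on the first colon only, so a feature that is itself ":"
        # (or contains one) is kept intact.
        group, feature = key.split(":", 1)
        nested[group][feature] = score
    return dict(nested)


old_model = json.loads('{"UW1:a": 123, "UW3:b": 271}')
new_model = nest_model(old_model)
print(json.dumps(new_model))
```

For real models, prefer the official script, which is the supported way to migrate.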


What's Changed
* Nit fix on some test descriptions by tushuhei in https://github.com/google/budoux/pull/109
* Delete unused tsconfig by tushuhei in https://github.com/google/budoux/pull/110
* Add unit test for Web Components by tushuhei in https://github.com/google/budoux/pull/111
* Update the model structure for faster processing by tushuhei in https://github.com/google/budoux/pull/112
* Refactor feature_extractor by tushuhei in https://github.com/google/budoux/pull/113
* Use tempfile for unit test by tushuhei in https://github.com/google/budoux/pull/114
* Add model translator for ICU by tushuhei in https://github.com/google/budoux/pull/115
* Add a model format updater by tushuhei in https://github.com/google/budoux/pull/117
* Remove defineClassAs function by tushuhei in https://github.com/google/budoux/pull/119
* Remove unnecessary assertion by tushuhei in https://github.com/google/budoux/pull/118
* Remove skip nodes data from JS by tushuhei in https://github.com/google/budoux/pull/120
* Update the Prepare KNBC script to break chunks by specified sequences by tushuhei in https://github.com/google/budoux/pull/121
* Update JA model by tushuhei in https://github.com/google/budoux/pull/122
* Version Bump to 0.5.0 by tushuhei in https://github.com/google/budoux/pull/123


**Full Changelog**: https://github.com/google/budoux/compare/v0.4.1...v0.5.0

0.4.1

⚠️ Breaking Change ⚠️
We made a significant change to the model training script `scripts/train.py`.
* The `--chunk-size` option has been removed because the overhaul shifted the memory consumption bottleneck.
* The script no longer shuffles the input data. If needed, shuffle the data yourself with a tool such as [`shuf`](https://en.wikipedia.org/wiki/Shuf).
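
If `shuf` is not available, a line-level shuffle can also be done in Python before running the training script. The following is a minimal sketch, assuming the training data is a text file with one example per line and that it fits in memory; the function names and file paths are hypothetical and are not part of BudouX.

```python
import random


def shuffle_lines(lines, seed=None):
    """Return a shuffled copy of `lines`, leaving the input list untouched."""
    rng = random.Random(seed)  # a fixed seed makes the order reproducible
    shuffled = list(lines)
    rng.shuffle(shuffled)
    return shuffled


def shuffle_file(src_path, dst_path, seed=None):
    """Write the lines of `src_path` to `dst_path` in random order."""
    with open(src_path, encoding="utf-8") as src:
        lines = src.readlines()
    # Make sure the last line ends with a newline so that shuffled
    # lines are never glued together in the output.
    lines = [line if line.endswith("\n") else line + "\n" for line in lines]
    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.writelines(shuffle_lines(lines, seed))


# Example call with hypothetical file names:
# shuffle_file("encoded_data.txt", "encoded_data_shuffled.txt", seed=42)
```

For data sets too large for memory, an external tool such as `shuf` remains the simpler choice.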

What's Changed
* Faster training with sparse matrix by tushuhei in https://github.com/google/budoux/pull/103
* Add `lang` option to JS CLI by tushuhei in https://github.com/google/budoux/pull/102
* Bump json5 from 2.2.1 to 2.2.3 in /javascript by dependabot in https://github.com/google/budoux/pull/104
* Cleanup the training script by tushuhei in https://github.com/google/budoux/pull/105
* More accurate Japanese model by tushuhei in https://github.com/google/budoux/pull/106


**Full Changelog**: https://github.com/google/budoux/compare/v0.4.0...v0.4.1

0.4.0

What's Changed
* Traditional Chinese support by tushuhei in https://github.com/google/budoux/pull/101


**Full Changelog**: https://github.com/google/budoux/compare/v0.3.0...v0.4.0

0.3.0

What's Changed
Faster model training
We made model training faster by applying JAX's JIT compilation, pooling file writes, etc.
* Faster training data encoding by tushuhei in https://github.com/google/budoux/pull/89
* Add out_span option for better GPU utilization by tushuhei in https://github.com/google/budoux/pull/90
* Apply JAX JIT compiling for faster training by tushuhei in https://github.com/google/budoux/pull/95
* Check in updated Simplified Chinese model by tushuhei in https://github.com/google/budoux/pull/99

Smaller models
We made models smaller by removing less important features, disabling ASCII encoding, etc.
* Remove Unicode Block features by tushuhei in https://github.com/google/budoux/pull/86
* Disable ASCII encoding when building the model file by tushuhei in https://github.com/google/budoux/pull/98
* Output compact model by tushuhei in https://github.com/google/budoux/pull/100

Misc
* encode_data: write without break line join by tushuhei in https://github.com/google/budoux/pull/91
* Update unit tests for the encoding script by tushuhei in https://github.com/google/budoux/pull/92
* Add more granularity in weight outputs by tushuhei in https://github.com/google/budoux/pull/93
* Remove tar module dependency by tushuhei in https://github.com/google/budoux/pull/96

**Full Changelog**: https://github.com/google/budoux/compare/v0.2.1...v0.3.0
