BudouX

Latest version: v0.6.2


0.6.2

Thai is now supported! 🎉

What's Changed
* Add the scale argument to encode_data.py by tushuhei in https://github.com/google/budoux/pull/408
* Nit fix for an ignored test by tushuhei in https://github.com/google/budoux/pull/407
* Ja model improvement by tushuhei in https://github.com/google/budoux/pull/410
* Add granularity option to prepare_knbc.py by tushuhei in https://github.com/google/budoux/pull/417
* Add Thai language support by tushuhei in https://github.com/google/budoux/pull/421
* Improve typing by amitmarkel in https://github.com/google/budoux/pull/426
* Update README for Thai support by tushuhei in https://github.com/google/budoux/pull/429
* Rename returns to return by tushuhei in https://github.com/google/budoux/pull/415

New Contributors
* amitmarkel made their first contribution in https://github.com/google/budoux/pull/426

**Full Changelog**: https://github.com/google/budoux/compare/v0.6.1...v0.6.2

0.6.1

What's Changed
* Bump @typescript-eslint/eslint-plugin from 6.9.1 to 6.10.0 in /javascript by dependabot in https://github.com/google/budoux/pull/353
* Bump org.apache.maven.plugins:maven-surefire-plugin from 3.2.1 to 3.2.2 in /java by dependabot in https://github.com/google/budoux/pull/354
* Bump actions/dependency-review-action from 3.1.1 to 3.1.2 by dependabot in https://github.com/google/budoux/pull/357
* Bump @types/node from 20.8.3 to 20.9.0 in /javascript by dependabot in https://github.com/google/budoux/pull/356
* Support weighted samples by tushuhei in https://github.com/google/budoux/pull/358
* Fix unpaired close tags and self-closing tags by kojiishi in https://github.com/google/budoux/pull/360
* [Java] Stop emitting close tags if self-closing by kojiishi in https://github.com/google/budoux/pull/362
* Update Google Java Format action by tushuhei in https://github.com/google/budoux/pull/363
* Bump actions/dependency-review-action from 3.1.2 to 3.1.3 by dependabot in https://github.com/google/budoux/pull/364
* Bump @typescript-eslint/eslint-plugin from 6.10.0 to 6.11.0 in /javascript by dependabot in https://github.com/google/budoux/pull/365
* [java] Fix errors by collapsed white spaces and `<br>` by kojiishi in https://github.com/google/budoux/pull/367
* Bump github/codeql-action from 2.22.5 to 2.22.6 by dependabot in https://github.com/google/budoux/pull/368
* [java] Replace `wholeText()` with `NodeVisitor` by kojiishi in https://github.com/google/budoux/pull/369
* Implement tail for node visitor by tushuhei in https://github.com/google/budoux/pull/370
* Update jsoup to 1.16.2 by tushuhei in https://github.com/google/budoux/pull/371
* Version up to 0.6.1 by tushuhei in https://github.com/google/budoux/pull/372


**Full Changelog**: https://github.com/google/budoux/compare/v0.6.0...v0.6.1

0.6.0

Noteworthy changes
- BudouX Web Components no longer use Shadow DOM. Segmentation results are reflected in their Light DOM, where global styles can apply. (#291)
- Phrases are separated by ZWSP (U+200B) instead of `<wbr>` for a better screen reader experience. (#346)
- You can use non-breaking markup (`<nobr>` and `white-space: nowrap`) when you have a phrase you don't want broken. (#240)
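As a rough illustration of the ZWSP change, the sketch below joins already-segmented phrases with U+200B. The `phrases` list is a hypothetical stand-in for parser output (in the Python package, `Parser.parse` returns a list of strings); this is not BudouX's own HTML processing code.

```python
from __future__ import annotations

ZWSP = "\u200b"  # U+200B ZERO WIDTH SPACE

def join_with_zwsp(phrases: list[str]) -> str:
    """Join phrases with ZWSP, marking each phrase boundary as a break opportunity."""
    return ZWSP.join(phrases)

# Hypothetical parser output; a real parser may segment differently.
phrases = ["今日は", "とても", "天気です。"]
joined = join_with_zwsp(phrases)

# Make the invisible separators visible for inspection.
print(joined.replace(ZWSP, "|"))  # 今日は|とても|天気です。
```

Because the separator is an ordinary character in the text, stripping U+200B recovers the original string.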

What's Changed
* Remove dependency to gts by tushuhei in https://github.com/google/budoux/pull/187
* Add `Parser.parseBoundaries` for JavaScript by kojiishi in https://github.com/google/budoux/pull/234
* Replace `slice` with `substring` by kojiishi in https://github.com/google/budoux/pull/241
* Support non-breaking content (`<nobr>` and `white-space: nowrap`) by kojiishi in https://github.com/google/budoux/pull/240
* Make scripts run without install by tushuhei in https://github.com/google/budoux/pull/239
* Add permissions to style check action by tushuhei in https://github.com/google/budoux/pull/246
* Specify maxsplit to handle colon symbols properly by tushuhei in https://github.com/google/budoux/pull/247
* Support non-breaking content in java by kojiishi in https://github.com/google/budoux/pull/248
* Support non-breaking content in Python by kojiishi in https://github.com/google/budoux/pull/251
* Nit: use get_nowait instead of get by tushuhei in https://github.com/google/budoux/pull/253
* Remove utils from JavaScript module by tushuhei in https://github.com/google/budoux/pull/262
* Move hasChildTextNode to HTML Processor by tushuhei in https://github.com/google/budoux/pull/274
* Fix mypy issues by tushuhei in https://github.com/google/budoux/pull/308
* Fix Python dependency issues by tushuhei in https://github.com/google/budoux/pull/316
* Avoid inserting separators to where the source has one by kojiishi in https://github.com/google/budoux/pull/342
* [Web Components] Use Light DOM instead of Shadow DOM by tushuhei in https://github.com/google/budoux/pull/291
* Use ZWSP instead of WBR by tushuhei in https://github.com/google/budoux/pull/346
* [Java] Use ArrayDeque instead of Stack by tushuhei in https://github.com/google/budoux/pull/349
* Rename applyElement to applyToElement by tushuhei in https://github.com/google/budoux/pull/348
* Update README to use ZWSP by tushuhei in https://github.com/google/budoux/pull/347
* Version up to 0.6.0 by tushuhei in https://github.com/google/budoux/pull/343
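Both `Parser.parseBoundaries` (#234) and the fix that avoids inserting separators where the source already has one (#342) deal with break positions. A minimal Python sketch of the idea, assuming a boundary is the character offset at which one phrase ends and the next begins. The helpers `chunks_to_boundaries` and `insert_separator` are hypothetical and do not mirror the actual JavaScript or Python APIs.

```python
from __future__ import annotations

ZWSP = "\u200b"  # U+200B ZERO WIDTH SPACE

def chunks_to_boundaries(chunks: list[str]) -> list[int]:
    """Return offsets into ''.join(chunks) where one chunk ends and the next begins."""
    boundaries: list[int] = []
    offset = 0
    for chunk in chunks[:-1]:  # no boundary after the final chunk
        offset += len(chunk)
        boundaries.append(offset)
    return boundaries

def insert_separator(text: str, boundaries: list[int], sep: str = ZWSP) -> str:
    """Insert sep at each boundary unless sep is already adjacent to it."""
    out: list[str] = []
    prev = 0
    for pos in boundaries:
        out.append(text[prev:pos])
        already_there = text[pos - 1:pos] == sep or text[pos:pos + 1] == sep
        if not already_there:
            out.append(sep)
        prev = pos
    out.append(text[prev:])
    return "".join(out)

chunks = ["今日は", "天気です。"]
text = "".join(chunks)
print(insert_separator(text, chunks_to_boundaries(chunks)).replace(ZWSP, "|"))  # 今日は|天気です。
```

Returning plain offsets keeps segmentation separate from how a break is rendered, so the same boundaries could drive a ZWSP, a `<wbr>`, or no markup at all.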

**Full Changelog**: https://github.com/google/budoux/compare/v0.5.2...v0.6.0

0.5.2

What's Changed
* Use `overflow-wrap: anywhere;` instead of `overflow-wrap: break-word;` by tamanyan in https://github.com/google/budoux/pull/144
* Add a script to finetune models. by tushuhei in https://github.com/google/budoux/pull/145
* Add quality regression test by tushuhei in https://github.com/google/budoux/pull/146
* Release finetuned model by tushuhei in https://github.com/google/budoux/pull/147 https://github.com/google/budoux/pull/154 https://github.com/google/budoux/pull/161
* Add validation data arg to train.py by tushuhei in https://github.com/google/budoux/pull/148
* Remove direct dependency to NumPy by tushuhei in https://github.com/google/budoux/pull/149
* Add a README for BudouX Scripts by tushuhei in https://github.com/google/budoux/pull/155
* Add score scale arg to build_model.py by tushuhei in https://github.com/google/budoux/pull/156
* Separate HTML processing as a mixin by tushuhei in https://github.com/google/budoux/pull/159

New Contributors
* step-security-bot made their first contribution in https://github.com/google/budoux/pull/163

**Full Changelog**: https://github.com/google/budoux/compare/v0.5.1...v0.5.2

0.5.1

What's Changed
* Add Java module by tushuhei in https://github.com/google/budoux/pull/124
* Separate HTML processing as html_processor.py by tushuhei in https://github.com/google/budoux/pull/126
* Fix bug with nodes to skip by tushuhei in https://github.com/google/budoux/pull/127
* Rename test_utils.py to utils.py by tushuhei in https://github.com/google/budoux/pull/129
* Remove test utils by tushuhei in https://github.com/google/budoux/pull/130
* Universal unit testing by tushuhei in https://github.com/google/budoux/pull/125
* Replace textarea with another skip node by tushuhei in https://github.com/google/budoux/pull/131
* Java style fix by tushuhei in https://github.com/google/budoux/pull/132
* Java code improvement by tushuhei in https://github.com/google/budoux/pull/133
* Java style fix by tushuhei in https://github.com/google/budoux/pull/134
* Fix mypy issue by tushuhei in https://github.com/google/budoux/pull/135
* [Java] Inherit from sonatype oss parent by tushuhei in https://github.com/google/budoux/pull/136
* Improve KNBC HTML Parser by tushuhei in https://github.com/google/budoux/pull/137


**Full Changelog**: https://github.com/google/budoux/compare/v0.5.0...v0.5.1

0.5.0

Highlights
- No major changes if you use the default parsers.
- If you use a custom model, you need to update it. See the "Updating Models" section below.
- The `defineClassAs` method in `javascript/src/html_processor.ts` has been removed.

Updating Models
As described in #112, the model file structure has been updated to improve performance and reduce file size. The change is simple: it adds one level of nesting by grouping features, as the following example shows.

Before:
```json
{"UW1:a": 123, "UW3:b": 271}
```

After:
```json
{"UW1": {"a": 123}, "UW3": {"b": 271}}
```


You can update your custom model to the latest format by running [scripts/translate_model.py](https://github.com/google/budoux/blob/2dfdc7699f430f5e48e78b7e98d7573997d19754/scripts/translate_model.py).


$ python translate_model.py --format=json old-model.json > new-model.json

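The core transformation can be approximated in a few lines of Python. This is a sketch, not `translate_model.py` itself: the helper `nest_model` is hypothetical, and it assumes every key in the old model has the form `FEATURE:value`. It splits on the first colon only, so a value that is itself a colon (for example the key `UW1::`) stays intact.

```python
from __future__ import annotations

import json

def nest_model(flat: dict[str, int]) -> dict[str, dict[str, int]]:
    """Group flat "FEATURE:value" keys into {FEATURE: {value: score}}."""
    nested: dict[str, dict[str, int]] = {}
    for key, score in flat.items():
        feature, value = key.split(":", 1)  # split once; values may contain ':'
        nested.setdefault(feature, {})[value] = score
    return nested

old_model = {"UW1:a": 123, "UW3:b": 271}
print(json.dumps(nest_model(old_model)))
# {"UW1": {"a": 123}, "UW3": {"b": 271}}
```

With features grouped this way, each feature prefix is stored once per group rather than once per entry, and a score lookup becomes `model[feature].get(value, 0)`.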

What's Changed
* Nit fix on some test descriptions by tushuhei in https://github.com/google/budoux/pull/109
* Delete unused tsconfig by tushuhei in https://github.com/google/budoux/pull/110
* Add unit test for Web Components by tushuhei in https://github.com/google/budoux/pull/111
* Update the model structure for faster processing by tushuhei in https://github.com/google/budoux/pull/112
* Refactor feature_extractor by tushuhei in https://github.com/google/budoux/pull/113
* Use tempfile for unit test by tushuhei in https://github.com/google/budoux/pull/114
* Add model translator for ICU by tushuhei in https://github.com/google/budoux/pull/115
* Add a model format updater by tushuhei in https://github.com/google/budoux/pull/117
* Remove defineClassAs function by tushuhei in https://github.com/google/budoux/pull/119
* Remove unnecessary assertion by tushuhei in https://github.com/google/budoux/pull/118
* Remove skip nodes data from JS by tushuhei in https://github.com/google/budoux/pull/120
* Update the Prepare KNBC script to break chunks by specified sequences by tushuhei in https://github.com/google/budoux/pull/121
* Update JA model by tushuhei in https://github.com/google/budoux/pull/122
* Version Bump to 0.5.0 by tushuhei in https://github.com/google/budoux/pull/123


**Full Changelog**: https://github.com/google/budoux/compare/v0.4.1...v0.5.0
