WeTextProcessing

Latest version: v1.0.4.1


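# TN input; reconstructed on the assumption that it mirrors en_tn_text (digits 1.0, 666, 9, 10)
zh_tn_text = "你好 WeTextProcessing 1.0,船新版本儿,船新体验儿,简直666,9和10"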

zh_itn_text = "你好 WeTextProcessing 一点零,船新版本儿,船新体验儿,简直六六六,九和六"
en_tn_text = "Hello WeTextProcessing 1.0, life is short, just use wetext, 666, 9 and 10"

# First pass: overwrite_cache=True rebuilds the FSTs online with the given options.
zh_tn_model = ZhNormalizer(remove_erhua=True, overwrite_cache=True)
zh_itn_model = InverseNormalizer(enable_0_to_9=False, overwrite_cache=True)
en_tn_model = EnNormalizer(overwrite_cache=True)
print("Chinese TN (erhua removed, FST rebuilt online):\n\t{} => {}".format(zh_tn_text, zh_tn_model.normalize(zh_tn_text)))
print("Chinese ITN (standalone digits below 10 not converted, FST rebuilt online):\n\t{} => {}".format(zh_itn_text, zh_itn_model.normalize(zh_itn_text)))
print("English TN (no configurable options yet, more to come...):\n\t{} => {}\n".format(en_tn_text, en_tn_model.normalize(en_tn_text)))

# Second pass: overwrite_cache=False reuses the FSTs compiled above.
zh_tn_model = ZhNormalizer(overwrite_cache=False)
zh_itn_model = InverseNormalizer(overwrite_cache=False)
en_tn_model = EnNormalizer(overwrite_cache=False)
print("Chinese TN (reusing the previously compiled FST):\n\t{} => {}".format(zh_tn_text, zh_tn_model.normalize(zh_tn_text)))
print("Chinese ITN (reusing the previously compiled FST):\n\t{} => {}".format(zh_itn_text, zh_itn_model.normalize(zh_itn_text)))
print("English TN (reusing the previously compiled FST):\n\t{} => {}\n".format(en_tn_text, en_tn_model.normalize(en_tn_text)))

# Third pass: rebuild with the opposite options.
zh_tn_model = ZhNormalizer(remove_erhua=False, overwrite_cache=True)
zh_itn_model = InverseNormalizer(enable_0_to_9=True, overwrite_cache=True)
print("Chinese TN (erhua kept, FST rebuilt online):\n\t{} => {}".format(zh_tn_text, zh_tn_model.normalize(zh_tn_text)))
print("Chinese ITN (standalone digits below 10 also converted, FST rebuilt online):\n\t{} => {}\n".format(zh_itn_text, zh_itn_model.normalize(zh_itn_text)))
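The `overwrite_cache` flag in the calls above decides whether a graph is rebuilt or a previously compiled one is reused. The general build-or-load pattern can be sketched in plain Python as follows. This is only an illustration under stated assumptions, not WeTextProcessing's implementation: `load_or_build`, `build_fn`, and the pickle-based cache file are hypothetical stand-ins for the real FST compilation and storage.

```python
import os
import pickle
import tempfile


def load_or_build(cache_path, build_fn, overwrite_cache=False):
    """Return a cached object, rebuilding it on request or when no cache exists.

    Hypothetical helper: build_fn stands in for an expensive step such as
    compiling an FST. The real library's caching may work differently.
    """
    if not overwrite_cache and os.path.exists(cache_path):
        with open(cache_path, "rb") as f:
            return pickle.load(f)
    result = build_fn()
    with open(cache_path, "wb") as f:
        pickle.dump(result, f)
    return result


build_calls = []


def expensive_build():
    """Toy builder that records how often it actually runs."""
    build_calls.append(1)
    return {"rules": ["cardinal", "date"]}


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.cache")
    load_or_build(path, expensive_build, overwrite_cache=True)   # builds
    load_or_build(path, expensive_build, overwrite_cache=False)  # reuses the cache
    load_or_build(path, expensive_build, overwrite_cache=True)   # builds again
    print("builds performed:", len(build_calls))
```

Defaulting the flag to `False` keeps repeated start-ups cheap, while an explicit `True` covers the case where options such as `remove_erhua` changed and a stale cache would give the wrong behaviour.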


![image](https://github.com/wenet-e2e/WeTextProcessing/assets/13466943/3ff49959-5fbe-4ff7-b0d5-a1d0298f9a9a)

Minor changes
* [refactor] support building fst online by xingchensong in https://github.com/wenet-e2e/WeTextProcessing/pull/230
* [fix] remove redundant mapping in whitelist by xingchensong in https://github.com/wenet-e2e/WeTextProcessing/pull/231
* [tn] english tn, support range by xingchensong in https://github.com/wenet-e2e/WeTextProcessing/pull/233
* [fix] fix itn 三四十万 一万六七 by xingchensong in https://github.com/wenet-e2e/WeTextProcessing/pull/234
* [fix] fix itn 洞>0,拐>7 by xingchensong in https://github.com/wenet-e2e/WeTextProcessing/pull/235
* [fix] fix tn, remove useless mapping in english tn by xingchensong in https://github.com/wenet-e2e/WeTextProcessing/pull/236

**Full Changelog**: https://github.com/wenet-e2e/WeTextProcessing/compare/0.2.1...1.0.0

1.0.4

What's Changed
* [itn] whitelist 7x24小时 by weimeng23 in https://github.com/wenet-e2e/WeTextProcessing/pull/266
* [itn] fix 十三五 by xingchensong in https://github.com/wenet-e2e/WeTextProcessing/pull/267
* [install] upgrade pynini to 2.1.6 in setup.py by lingji-yidong in https://github.com/wenet-e2e/WeTextProcessing/pull/269
* [itn] fix 四s店 by xingchensong in https://github.com/wenet-e2e/WeTextProcessing/pull/270

New Contributors
* lingji-yidong made their first contribution in https://github.com/wenet-e2e/WeTextProcessing/pull/269

**Full Changelog**: https://github.com/wenet-e2e/WeTextProcessing/compare/1.0.3...1.0.4

1.0.3

What's Changed
* [tn] english, fix crash on "" by xingchensong in https://github.com/wenet-e2e/WeTextProcessing/pull/249
* [tn] english, fix <p> by xingchensong in https://github.com/wenet-e2e/WeTextProcessing/pull/251
* [install] upgrade pynini to 2.1.6 by xingchensong in https://github.com/wenet-e2e/WeTextProcessing/pull/252
* [tn] add whitelist for you're by xingchensong in https://github.com/wenet-e2e/WeTextProcessing/pull/253
* [doc] Update README.md by xingchensong in https://github.com/wenet-e2e/WeTextProcessing/pull/254
* [itn] fix issue237, digit + union("百", "千", "万") + digit + unit by weimeng23 in https://github.com/wenet-e2e/WeTextProcessing/pull/255
* [tn] delete prefix space by xingchensong in https://github.com/wenet-e2e/WeTextProcessing/pull/262
* [itn] add whitelist by xingchensong in https://github.com/wenet-e2e/WeTextProcessing/pull/263


**Full Changelog**: https://github.com/wenet-e2e/WeTextProcessing/compare/1.0.2...1.0.3

1.0.2

What's Changed
* [fix] tn chinese, add punc by xingchensong in https://github.com/wenet-e2e/WeTextProcessing/pull/242
* [tn] chinese, append traditional_to_simple by xingchensong in https://github.com/wenet-e2e/WeTextProcessing/pull/243
* [itn] fix 八百一千=>800 1000 二十一千=>20 1000, 零千 零万 by weimeng23 in https://github.com/wenet-e2e/WeTextProcessing/pull/246


**Full Changelog**: https://github.com/wenet-e2e/WeTextProcessing/compare/1.0.1...1.0.2

1.0.1

What's Changed
* [fix] fix tn, week range by xingchensong in https://github.com/wenet-e2e/WeTextProcessing/pull/238
* [fix] fix tn, punct with space by xingchensong in https://github.com/wenet-e2e/WeTextProcessing/pull/239
* [fix] fix tn, remove useless mapping in whitelist by xingchensong in https://github.com/wenet-e2e/WeTextProcessing/pull/240
* [wheel] disable global logging config by xingchensong in https://github.com/wenet-e2e/WeTextProcessing/pull/241 (removes the global logging configuration so that it no longer overrides the log level of other programs)
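
The last item concerns a common pitfall: a library that configures the root logger at import time changes log levels for the whole host program. A minimal sketch of the usual alternative follows. It shows the general pattern with Python's standard `logging` module and is not taken from the WeTextProcessing source; the logger name `wetext_demo` and the function `normalize_demo` are hypothetical.

```python
import logging

# Library side: use a named logger with a NullHandler instead of calling
# logging.basicConfig(), which would configure the root logger globally.
logger = logging.getLogger("wetext_demo")  # hypothetical logger name
logger.addHandler(logging.NullHandler())


def normalize_demo(text):
    """Toy stand-in for a library function that logs while it works."""
    logger.debug("normalizing %r", text)
    return text.strip()


# Application side: the host program stays in control of the log level.
logging.getLogger().setLevel(logging.WARNING)
result = normalize_demo("  hello  ")
```

With this layout the application decides whether the library's debug messages are shown, and importing the library has no side effects on other loggers.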


**Full Changelog**: https://github.com/wenet-e2e/WeTextProcessing/compare/1.0.0...1.0.1

1.0.0

Breaking Changes
1. Support English TN, see https://github.com/wenet-e2e/WeTextProcessing/issues/202. Most of the English rules were copied from NeMo, but we simplified them significantly, which results in:
* FST size comparison: 76M (NeMo) vs. 7M (Ours)
* Building time comparison (when you want to develop new rules): 777s (NeMo) vs. 41s (Ours)

![nemo](https://github.com/wenet-e2e/WeTextProcessing/assets/13466943/56ff15c8-b389-445b-ad6d-512c0bf8cfb3) | ![wetext](https://github.com/wenet-e2e/WeTextProcessing/assets/13466943/edcaa7e0-b543-429c-92c5-a9c13df0249a)
---|---
NeMo | WeText
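
For a sense of scale, the ratios implied by the size and build-time figures quoted above can be worked out directly. The snippet uses only those four numbers; reading "M" as megabytes is an assumption.

```python
# Figures quoted in the release notes ("M" assumed to mean megabytes).
nemo_size_m, wetext_size_m = 76, 7
nemo_build_s, wetext_build_s = 777, 41

size_ratio = nemo_size_m / wetext_size_m      # how many times smaller the FST is
build_ratio = nemo_build_s / wetext_build_s   # how many times faster it builds

print(f"FST size: {size_ratio:.1f}x smaller")
print(f"Build time: {build_ratio:.1f}x faster")
```

That is roughly an 11-fold reduction in FST size and a 19-fold reduction in build time.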

2. Support building FSTs online, so you can enjoy wetext without pain: https://github.com/wenet-e2e/WeTextProcessing/pull/230

sh
pip install wetextprocessing


py
from itn.chinese.inverse_normalizer import InverseNormalizer
from tn.chinese.normalizer import Normalizer as ZhNormalizer
from tn.english.normalizer import Normalizer as EnNormalizer
