Major Features and Improvements
* Added UnicodeCharacterTokenizer
* Tokenizers are now tf.Modules and can be saved from within Keras layers.
Bug Fixes and Other Changes
* Allow wordpiece_tokenizer to output int32 tokens natively.
* Tracks the Sentencepiece model resource via a TrackableResource.
* oss-segmenter:
* fix end-offset error in split_merge_tokenizer_kernel.
* TensorFlow text python ops wordshape:
* More comprehensive emoji handling
* Other:
* Unref lookup_table in wordpiece_kernel fixing a possible memory leak.
* Add missing LICENSE file for third_party/tensorflow_text/core/kernels.
* add normalize kernals test
* Fix Sentencepiece tests.
* Add some metric logs to tokenizers.
* Fix documentation formatting for SplitMergeTokenizer
* Bug fix: make sure tokenize() method does not ignore itself.
* Improve logging efficiency.
* Update tf.text's regression test model for model server. Without the asserts, errors are erroneously swallowed by tensorflow. I also added tf.unicode_script test just to ensure that ICU is working correctly from within model server.
* Add the ability to define a user-defined destination directory to make testing easier.
* Fix typo in documentation of BertTokenizer
* Clarify docstring of UnicodeScriptTokenizer about splitting on space
* Add executable flag to the run_build.sh script.
* Clarify docstring of WordpieceTokenizer on unknown_token:
* Update protobuf library and point HEAD to build on tf 2.3.0-rc0
Thanks to our Contributors