Tensorflow-text

Latest version: v2.16.1

Safety actively analyzes 638388 Python packages for vulnerabilities to keep your Python projects secure.

Scan your dependencies

Page 7 of 10

2.4.0b0

Please note that this is a pre-release and meant to run with TF v2.3.x. We wanted to give access to some of the features we were adding to 2.4.x, but did not want to wait for the TF release.

Major Features and Improvements

* Released our first TF Hub module for Chinese segmentation! Please visit the hub module page [here](https://tfhub.dev/google/zh_segmentation/1) for more info including instructions on how to use the model.
* Added `Spliter` / `SplitterWithOffsets` abstract base classes. These are meant to replace the current `Tokenizer` / `TokenizerWithOffsets` base classes. The `Tokenizer` base classes will continue to work and will implement these new `Splitter` base classes. The reasoning behind the change is to prevent confusion when future splitting operations that also use this interface do not tokenize into words (sentences, subwords, etc).
* With this cleanup of terminology, we've also updated the documentation and internal variable names for token offsets to use "end" instead of "limit". This is purely a documentation change and doesn't affect any current APIs, but we feel it more clearly expresses that `offset_end` is a positional value rather than a length.
* Added new `HubModuleSplitter` that helps handle ragged tensor input and outputs for hub modules which implement the Splitter class.
* Added new `SplitMergeFromLogitsTokenizer` which is a narrowly focused tokenizer that splits text based on logits from a model. This is used with the newly released Chinese segmentation model.

Bug Fixes and Other Changes

* Test cleanup - use assertAllEqual(expected, actual), instead of (actual, expected), for better error messages.
* Add dep on tensorflow_hub in pip_package/setup.py
* Add filegroup BUILD target for test_data segmentation Hub module.
* Extend documentation for class HubModuleSplitter.
* Read SP model file in bytes mode in tests.

Thanks to our Contributors

2.3.0

Major Features and Improvements

* Added UnicodeCharacterTokenizer
* Tokenizers are now tf.Modules and can be saved from within Keras layers.

Bug Fixes and Other Changes

* Allow wordpiece_tokenizer to output int32 tokens natively.
* Tracks the Sentencepiece model resource via a TrackableResource.
* oss-segmenter:
* fix end-offset error in split_merge_tokenizer_kernel.
* TensorFlow text python ops wordshape:
* More comprehensive emoji handling
* Other:
* Unref lookup_table in wordpiece_kernel fixing a possible memory leak.
* Add missing LICENSE file for third_party/tensorflow_text/core/kernels.
* add normalize kernals test
* Fix Sentencepiece tests.
* Add some metric logs to tokenizers.
* Fix documentation formatting for SplitMergeTokenizer
* Bug fix: make sure tokenize() method does not ignore itself.
* Improve logging efficiency.
* Update tf.text's regression test model for model server. Without the asserts, errors are erroneously swallowed by tensorflow. I also added tf.unicode_script test just to ensure that ICU is working correctly from within model server.
* Add the ability to define a user-defined destination directory to make testing easier.
* Fix typo in documentation of BertTokenizer
* Clarify docstring of UnicodeScriptTokenizer about splitting on space
* Add executable flag to the run_build.sh script.
* Clarify docstring of WordpieceTokenizer on unknown_token:
* Update protobuf library and point HEAD to build on tf 2.3.0-rc0

Thanks to our Contributors

2.3.0rc1

Major Features and Improvements

* Added UnicodeCharacterTokenizer

Bug Fixes and Other Changes

* oss-segmenter:
* fix end-offset error in split_merge_tokenizer_kernel.
* TensorFlow text python ops wordshape:
* More comprehensive emoji handling
* Other:
* Unref lookup_table in wordpiece_kernel fixing a possible memory leak.
* Add missing LICENSE file for third_party/tensorflow_text/core/kernels.
* add normalize kernals test
* Add some metric logs to tokenizers.
* Fix documentation formatting for SplitMergeTokenizer
* Bug fix: make sure tokenize() method does not ignore itself.
* Improve logging efficiency.
* Update tf.text's regression test model for model server. Without the asserts, errors are erroneously swallowed by tensorflow. I also added tf.unicode_script test just to ensure that ICU is working correctly from within model server.
* Add the ability to define a user-defined destination directory to make testing easier.
* Fix typo in documentation of BertTokenizer
* Clarify docstring of UnicodeScriptTokenizer about splitting on space
* Add executable flag to the run_build.sh script.
* Clarify docstring of WordpieceTokenizer on unknown_token:
* Update protobuf library and point HEAD to build on tf 2.3.0-rc0

Thanks to our Contributors

2.2.1

2.2

Major Features and Improvements

Breaking Changes

Bug Fixes and Other Changes

* Update version

Thanks to our Contributors

2.2.0

Page 7 of 10

© 2024 Safety CLI Cybersecurity Inc. All Rights Reserved.