Version 1.3.0 fixes a critical character-encoding bug, adds new CLI options and commands, shows progress bars during long-running operations, and introduces a test suite.
Key Updates:
- **Progress Bars**: Progress bars, implemented with the `rich` library, give real-time feedback during long-running operations.
- **Quiet Mode**: A new CLI option that suppresses non-essential output.
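
How a progress bar and a quiet switch can fit together is sketched below. This is a minimal illustration, not PyTokenCounter's actual implementation: the function name `with_progress`, its `quiet` parameter, and the file names are hypothetical. The only library call used is `rich.progress.track`, and the sketch falls back to plain iteration if `rich` is not installed.

```python
from typing import Iterable, Iterator

try:
    from rich.progress import track  # rich helper that wraps an iterable in a progress bar
except ImportError:  # assumption: degrade to plain iteration when rich is unavailable
    track = None


def with_progress(items: Iterable[str], quiet: bool = False) -> Iterator[str]:
    """Yield items, showing a progress bar unless quiet is set (hypothetical helper)."""
    items = list(items)  # materialize so the bar knows the total
    if quiet or track is None:
        yield from items
    else:
        yield from track(items, description="Processing files...")


# Hypothetical workload: character counts stand in for token counts.
lengths = [len(name) for name in with_progress(["a.txt", "notes.md"], quiet=True)]
```

With `quiet=True` the loop produces the same results and prints nothing, which is the behavior a quiet mode is meant to guarantee.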
Bug Fix:
A critical issue with character handling has been resolved. Previously, files were read in their original encoding, while `tiktoken` expects UTF-8 input. This mismatch broke special characters such as `é` and caused straight and typographic apostrophes (`'` vs. `’`) to be handled inconsistently, leading to errors and inconsistent replacements. Input handling now ensures consistent UTF-8 processing, which eliminates these issues.
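
The kind of mismatch described above can be reproduced with the standard library alone. The sketch below is not the project's actual fix: the helper `read_as_utf8` and the choice of `cp1252` as the source encoding are assumptions made for illustration. It shows that bytes written in a legacy encoding are not valid UTF-8 until they are decoded and re-encoded.

```python
import tempfile
from pathlib import Path


def read_as_utf8(path: Path, source_encoding: str) -> bytes:
    """Decode a file from its source encoding and re-encode it as UTF-8 (hypothetical helper)."""
    text = path.read_bytes().decode(source_encoding)
    return text.encode("utf-8")


# "caf\u00e9" contains e-acute; "\u2019" is the typographic apostrophe.
original = "caf\u00e9 \u2019quoted\u2019"

with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.txt"
    sample.write_bytes(original.encode("cp1252"))  # assumed legacy encoding

    raw = sample.read_bytes()
    normalized = read_as_utf8(sample, "cp1252")

# The raw cp1252 bytes cannot be interpreted as UTF-8 directly.
try:
    raw.decode("utf-8")
    raw_is_valid_utf8 = True
except UnicodeDecodeError:
    raw_is_valid_utf8 = False
```

After normalization, `normalized.decode("utf-8")` round-trips to the original text, and the typographic apostrophe remains distinct from the straight one.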
Updated Methods:
Several methods have been introduced or updated to improve flexibility when working with encodings and models:
- `GetEncodingNameForModel`: Returns the encoding name as a string for a specified model.
- `GetEncodingForModel`: Returns the `tiktoken.Encoding` object for a given model.
- `GetModelForEncodingName`: Maps encoding names to their corresponding model names.
- `GetModelForEncoding`: Maps a `tiktoken.Encoding` object to its model name.
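
The forward and reverse lookups these methods describe can be pictured as two views of one table. The sketch below is an assumption-laden illustration, not PyTokenCounter's code: the function names, the error handling, and the four-entry table are hypothetical, although the model-to-encoding pairs shown match those published by `tiktoken`. Because several models can share an encoding, the reverse lookup here returns a list.

```python
# Illustrative subset only; not PyTokenCounter's actual table.
MODEL_TO_ENCODING = {
    "gpt-4o": "o200k_base",
    "gpt-4": "cl100k_base",
    "gpt-3.5-turbo": "cl100k_base",
    "text-davinci-003": "p50k_base",
}


def encoding_name_for_model(model: str) -> str:
    """Forward lookup: model name to encoding name (hypothetical helper)."""
    try:
        return MODEL_TO_ENCODING[model]
    except KeyError:
        raise ValueError(f"Unknown model: {model!r}") from None


def models_for_encoding_name(encoding_name: str) -> list[str]:
    """Reverse lookup: encoding name to every model that uses it, sorted."""
    models = sorted(m for m, e in MODEL_TO_ENCODING.items() if e == encoding_name)
    if not models:
        raise ValueError(f"Unknown encoding: {encoding_name!r}")
    return models


forward = encoding_name_for_model("gpt-4o")
reverse = models_for_encoding_name("cl100k_base")
```

Keeping a single table and deriving the reverse view from it avoids the two directions drifting out of sync.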
New CLI Commands:
- `get-model`: Retrieve the model name from a given encoding.
- `get-encoding`: Retrieve the encoding name from a given model.
Testing Framework:
A comprehensive test suite has been added to improve reliability across all features.
For detailed information, refer to the [documentation](https://github.com/kgruiz/PyTokenCounter#readme).