- Add `decode_batch` and `decode_bytes_batch`
- Improve error messages and handling
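
As a rough illustration of what batch decoding involves, the sketch below decodes several token sequences concurrently with a thread pool. It does not use tiktoken itself: `TOY_VOCAB`, the function bodies, and the `num_threads` parameter are assumptions made for this example, and the library's actual implementation may differ.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy vocabulary (token id -> bytes); not a real tiktoken encoding.
TOY_VOCAB = {
    0: b"hello",
    1: b" ",
    2: b"world",
    3: b"\xf0\x9f",  # first half of the UTF-8 bytes for U+1F600
    4: b"\x98\x80",  # second half of the UTF-8 bytes for U+1F600
}


def decode_bytes(tokens):
    """Concatenate the raw bytes of each token."""
    return b"".join(TOY_VOCAB[t] for t in tokens)


def decode(tokens, errors="replace"):
    """Decode tokens to text; invalid UTF-8 is handled according to `errors`."""
    return decode_bytes(tokens).decode("utf-8", errors=errors)


def decode_bytes_batch(batch, num_threads=8):
    """Decode each token sequence in `batch` to bytes, preserving order."""
    with ThreadPoolExecutor(num_threads) as pool:
        return list(pool.map(decode_bytes, batch))


def decode_batch(batch, errors="replace", num_threads=8):
    """Decode each token sequence in `batch` to text, preserving order."""
    with ThreadPoolExecutor(num_threads) as pool:
        return list(pool.map(lambda tokens: decode(tokens, errors), batch))


texts = decode_batch([[0, 1, 2], [3, 4]])
raw = decode_bytes_batch([[3], [0]])
```

The bytes variant is useful when a single token sequence may end in the middle of a multi-byte character, since the text variant would have to substitute or reject the incomplete bytes.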
0.3.3
- `tiktoken` now makes a best-effort attempt to replace surrogate pairs with the corresponding Unicode character, and replaces lone surrogates with the Unicode replacement character (U+FFFD).
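
One way to get this behavior in pure Python is to round-trip the string through UTF-16, letting surrogates pass on encode and replacing invalid sequences on decode. The helper name `repair_surrogates` is hypothetical, and treating this as the approach tiktoken takes is an assumption; it is only a minimal sketch of the described behavior.

```python
def repair_surrogates(text: str) -> str:
    """Return `text` with surrogate pairs combined and lone surrogates replaced.

    Hypothetical helper for illustration; not part of the tiktoken API.
    """
    try:
        # Strings without surrogates encode cleanly and are returned unchanged.
        text.encode("utf-8")
        return text
    except UnicodeEncodeError:
        # "surrogatepass" writes surrogates out as raw UTF-16 code units.
        # Decoding then joins valid pairs into one character and, with
        # "replace", turns unpaired surrogates into U+FFFD.
        return text.encode("utf-16", "surrogatepass").decode("utf-16", "replace")


paired = repair_surrogates("\ud83d\ude00")  # high + low surrogate
lone = repair_surrogates("a\ud83db")        # unpaired high surrogate
```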
0.3.2
- Add encoding for GPT-4
0.3.1
- Build aarch64 wheels
- Make `blobfile` an optional dependency

Thank you to messense for the environment variable that makes cargo not OOM under emulation!
0.3.0
- Improve performance by 5-20%; thank you to nistath!
- Add `gpt-3.5-turbo` models to `encoding_for_model`
- Add prefix matching to `encoding_for_model` to better support future model versions
- Fix a bug in the README instructions on extending tiktoken
- Update the set of available encodings
- Add packaging metadata
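
The prefix matching mentioned above can be pictured as a two-step lookup: try the exact model name first, then fall back to known prefixes so that dated or otherwise suffixed model names still resolve. The tables and the function name `lookup_encoding_name` below are illustrative assumptions, not the library's actual data or API.

```python
# Illustrative tables only; the real mappings live inside tiktoken and may differ.
MODEL_TO_ENCODING = {
    "gpt-4": "cl100k_base",
    "gpt-3.5-turbo": "cl100k_base",
}
MODEL_PREFIX_TO_ENCODING = {
    "gpt-4-": "cl100k_base",
    "gpt-3.5-turbo-": "cl100k_base",
}


def lookup_encoding_name(model_name: str) -> str:
    """Resolve a model name to an encoding name (hypothetical helper)."""
    # An exact match takes priority.
    if model_name in MODEL_TO_ENCODING:
        return MODEL_TO_ENCODING[model_name]
    # Otherwise accept any name that starts with a known prefix.
    for prefix, encoding_name in MODEL_PREFIX_TO_ENCODING.items():
        if model_name.startswith(prefix):
            return encoding_name
    raise KeyError(f"Could not map {model_name!r} to an encoding")


exact = lookup_encoding_name("gpt-4")
versioned = lookup_encoding_name("gpt-3.5-turbo-0301")
```

Keeping the trailing hyphen in each prefix avoids matching unrelated names that merely begin with the same characters.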
0.2.0
- Add `tiktoken.encoding_for_model` to get the encoding for a specific model
- Improve portability of caching logic

Thank you to fritzo, arvid220u, khanhvu207, and henriktorget for various small corrections.