Miditok

Latest version: v3.0.4

Page 6 of 11

1.2.8

Changes
* 82b2a1b16283f5191a40d42d81339f0a00d01ff4 Fix in `MuMIDI` `token_types_errors()`
* 0869c23a2abf2c4bc622d49acf79a5aa104f1621 Fix, BPE tokenizers now update the vocabulary `_token_types_indexes` attribute after being modified
* b3642c1bf02b435e9dda990a0fae5e792dcab7a7 `EOS` key added to `token_types_graph`, prevents crash just in case
* 7d873ca196344581803fd228112e058f7e1e471b MIDI objects converted from tokens now have `max_tick` attribute calculated
* 770d8b837e8ff0e781d621e5d503efc1ccd4db02 and 0869c23a2abf2c4bc622d49acf79a5aa104f1621 Small fixes and typo corrections
* Fixes in tests and GitHub Action integration
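
The `max_tick` change above can be pictured with a minimal sketch. It assumes `max_tick` is the latest end time over all notes; a real MIDI object may also take tempo or time signature events into account. The `Note` class and `compute_max_tick` helper are hypothetical and are not part of the library.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Note:
    """Hypothetical stand-in for a MIDI note, with times in ticks."""
    start: int
    end: int
    pitch: int


def compute_max_tick(tracks: List[List[Note]]) -> int:
    """Return the latest note end over all tracks, or 0 if there are no notes."""
    return max((note.end for track in tracks for note in track), default=0)


tracks = [
    [Note(0, 480, 60), Note(480, 960, 62)],
    [Note(240, 1200, 40)],
]
max_tick = compute_max_tick(tracks)
```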

Compatibility
* All good !

1.2.7

Changes
* 22fee1dba39630f80d0bc4341ce03bd9328b9692 TimeSignature parameter automatically set to False for incompatible tokenizers, also fixing a bug when it was not provided by the user
* 2e958f1dd761c5eb8ab115c3f2ebe627b010be5a TimeSignature of MIDI set to 4/4 if the original MIDI had none (rare but can happen)
* a46fd561d264cfe152088a616cbb94d9024592e9 unused import removed
* f416ff527198b378d5f5032eb0725ce0a61fe6ce BPE calculation in the `apply_bpe` method sped up by precomputing token successions in a class attribute
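
The 4/4 fallback described above can be sketched as follows. This is an illustration only: the `TimeSignature` dataclass and the `ensure_time_signature` helper are hypothetical names, not the library's own code.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TimeSignature:
    """Hypothetical stand-in for a time signature change event."""
    numerator: int
    denominator: int
    time: int  # position in ticks


def ensure_time_signature(changes: List[TimeSignature]) -> List[TimeSignature]:
    """Return the changes unchanged, or a single 4/4 at tick 0 if there are none."""
    if not changes:
        return [TimeSignature(4, 4, 0)]
    return list(changes)


filled = ensure_time_signature([])
kept = ensure_time_signature([TimeSignature(3, 4, 0)])
```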

Compatibility
* All good !

1.2.6

Changes
* 168c8c32230e1a3b714a6ab844b2c6e1825ad0c9 Bugfix in Octuple vocabulary creation, now only creates the selected programs
* bfe987e967a3704a3a5f50538a9134fe95181392 Fix in the **MuMIDI** and **Octuple** `token_types_errors` methods, which could crash when analyzing special tokens (Pad, Mask ...)
* 956738765147d7935088eb9e3e55bc8a4ab37271 Bugfix in **CPWord** decoding (crash with special tokens), and **Octuple** now saves the `_sos_eos` and `_mask` attributes in `save_params`

Compatibility
* All good !

1.2.5

Changes
* 67c2926542528913ce820698a874d7324517d890 Introducing **TSD** tokenization (Time Shift Duration). It is similar to **MIDI-Like** but uses `Duration` tokens instead of `Note-Off`, and its main difference from **REMI** is the way it represents time.
* 8af6a6b074a5c38cb5fd68598a21732df1a805a7 `_add_pad_type_to_graph` method has been renamed `_add_special_tokens_to_types_graph`, and now also adds `SOS`, `EOS`, and `MASK` tokens to the graph.
* f755c70036af68c2faa62b408061bcd7fee94f06 and 4b069a2fe4887aa105f969e9e39d9cf66cb5092b `add_bpe_to_tokens_type_graph` method for byte pair encoding, fixing a bug when loading a tokenizer from config file.
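
To illustrate the difference between **MIDI-Like** and **TSD** described above, here is a simplified sketch that serializes the same notes both ways. It does not use the library: the token names are illustrative, velocity is omitted, and the functions `to_midi_like` and `to_tsd` are hypothetical.

```python
from typing import List, Tuple

Note = Tuple[int, int, int]  # (onset, pitch, duration) in arbitrary time steps


def to_midi_like(notes: List[Note]) -> List[str]:
    """Note-on / note-off events separated by time shifts."""
    events = []
    for onset, pitch, duration in notes:
        events.append((onset, 1, f"NoteOn_{pitch}"))
        events.append((onset + duration, 0, f"NoteOff_{pitch}"))
    events.sort()  # at equal times, note-offs (0) come before note-ons (1)
    tokens, now = [], 0
    for time, _, name in events:
        if time > now:
            tokens.append(f"TimeShift_{time - now}")
            now = time
        tokens.append(name)
    return tokens


def to_tsd(notes: List[Note]) -> List[str]:
    """Each note carries an explicit duration, so no note-off is needed."""
    tokens, now = [], 0
    for onset, pitch, duration in sorted(notes):
        if onset > now:
            tokens.append(f"TimeShift_{onset - now}")
            now = onset
        tokens.append(f"Pitch_{pitch}")
        tokens.append(f"Duration_{duration}")
    return tokens


notes = [(0, 60, 4), (2, 64, 2)]
midi_like_tokens = to_midi_like(notes)
tsd_tokens = to_tsd(notes)
```

In this sketch both outputs advance time with time-shift tokens; only the way a note's end is encoded differs.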

Compatibility
* `_add_pad_type_to_graph` is still supported but will be removed in a future update. Replace it with `_add_special_tokens_to_types_graph` in your code to stay up to date.
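
A common way to keep an old method name working during such a transition is a thin alias that warns and delegates. The sketch below shows that pattern only; the class `TypesGraphSketch`, its `tokens_types_graph` dictionary, and the way special tokens are added are assumptions for illustration, not the library's implementation.

```python
import warnings

SPECIAL_TOKENS = ("PAD", "SOS", "EOS", "MASK")


class TypesGraphSketch:
    """Hypothetical tokenizer fragment holding a token types graph."""

    def __init__(self):
        # Maps a token type to the types allowed to follow it (toy content).
        self.tokens_types_graph = {"Pitch": ["Duration"], "Duration": ["Pitch"]}

    def _add_special_tokens_to_types_graph(self):
        """New name: register every special token type in the graph."""
        for special in SPECIAL_TOKENS:
            self.tokens_types_graph.setdefault(special, [])

    def _add_pad_type_to_graph(self):
        """Old name: still works, but warns and delegates to the new method."""
        warnings.warn(
            "_add_pad_type_to_graph is deprecated, "
            "use _add_special_tokens_to_types_graph instead",
            DeprecationWarning,
            stacklevel=2,
        )
        self._add_special_tokens_to_types_graph()
```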

1.2.4

Changes
* **[Byte Pair Encoding](https://en.wikipedia.org/wiki/Byte_pair_encoding)** is now available! It works with any tokenizer (except multi-embedding ones like CP Word or Octuple) as a wrapper, used as `bpe(tokenizer_class, params)` (see the example in the README)
* 72a0f323c9234d3886f9c4867c44fa270d09db84 The **Vocabulary** class now has an `update_token_types_indexes` method to create its `_token_types_indexes` attribute, which can be called after loading a tokenizer with its vocabulary saved (as with BPE)
* d232f4abb4ca6ceb388a8e04062e1fae98873f98 **Structured** now takes `additional_tokens` as a constructor argument, to align with all other tokenizers
* 4b0dc9ff38021a375213149c005d4f2971a6ea0a Bugfix in **MIDITokenizer** base class for rest and beat range attributes when loading class from params
* eb3612fd194e1eae960756f78a6eae80f2e44e67 save_tokens now saves tokens as a dictionary with *tokens* and *programs* keys so that the distinction is clear
* tqdm is now used (and required) in tokenize_dataset and bpe methods
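
For readers unfamiliar with byte pair encoding, the general idea applied to integer token sequences can be sketched as below. This is a toy version of the textbook algorithm, not the library's `bpe` wrapper; the function names `learn_bpe` and `merge_pair` and the stopping rule (stop when no pair occurs at least twice) are assumptions made for the example.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Pair = Tuple[int, int]


def merge_pair(seq: Sequence[int], pair: Pair, new_id: int) -> List[int]:
    """Replace every non-overlapping occurrence of `pair` in `seq` with `new_id`."""
    out, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out


def learn_bpe(
    sequences: Sequence[Sequence[int]], num_merges: int, vocab_size: int
) -> Tuple[Dict[Pair, int], List[List[int]]]:
    """Repeatedly merge the most frequent adjacent pair into a new token id."""
    seqs = [list(s) for s in sequences]
    merges: Dict[Pair, int] = {}
    next_id = vocab_size
    for _ in range(num_merges):
        pairs = Counter()
        for seq in seqs:
            pairs.update(zip(seq, seq[1:]))
        if not pairs:
            break
        best, count = pairs.most_common(1)[0]
        if count < 2:  # nothing left worth merging
            break
        merges[best] = next_id
        seqs = [merge_pair(seq, best, next_id) for seq in seqs]
        next_id += 1
    return merges, seqs


merges, encoded = learn_bpe([[1, 2, 3, 1, 2], [1, 2, 4]], num_merges=3, vocab_size=5)
```

Each merge shortens the sequences while growing the vocabulary by one id.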

Compatibility
* **Structured** now takes `additional_tokens` as a constructor argument, to align with all other tokenizers
* As of v1.2.4, tokens saved with the `save_tokens` method are stored as a dictionary, so that tracks and programs can no longer be confused (as they could be before). You can still load tokens saved with versions earlier than v1.2.4 using `load_tokens`, since you decide how to index the loaded data.
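
One way to handle both layouts in your own code is to normalize the loaded data right after reading it. The sketch below assumes JSON files and assumes the older files held a bare list of tokens; both are assumptions for illustration, and `save_tokens_sketch` / `load_tokens_sketch` are hypothetical helpers, not the library's `save_tokens` / `load_tokens`.

```python
import json
from typing import Any, List, Optional, Tuple


def save_tokens_sketch(tokens: List[Any], path: str, programs: Optional[List[Any]] = None) -> None:
    """Write tokens in the dictionary layout with explicit keys."""
    with open(path, "w") as f:
        json.dump({"tokens": tokens, "programs": programs or []}, f)


def load_tokens_sketch(path: str) -> Tuple[List[Any], List[Any]]:
    """Return (tokens, programs) for either layout."""
    with open(path) as f:
        data = json.load(f)
    if isinstance(data, dict):
        # Dictionary layout: keys make the distinction explicit.
        return data["tokens"], data.get("programs", [])
    # Assumed legacy layout: a bare list, with no program information.
    return data, []
```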

1.2.3

Changes
* 87db4802bb70e056741154bdb58c993c88527868 fix in merge_tracks_per_class, some tracks were omitted when filtering pitch / tessitura
