MidiTok

1.4.0

This fairly big update brings data augmentation, bug fixes, and optimizations that allow you to write more elegant code.

Changes
* 8f201e0299b503ce2b6976cbfa1d39660b4c3efe 308fb278dd1df2dee79541b8c85773a619ef5b02 [**Data augmentation**](https://github.com/Natooz/MidiTok/tree/main/miditok/data_augmentation) methods! 🙌 They can be applied to both MIDIs and tokens, augmenting data by shifting pitch, velocity and duration values.
* 1d8e9034c9354a60249de05dc938eb8e880f366e You can perform data augmentation while tokenizing a dataset (`tokenize_midi_dataset` method) with the `data_augment_offsets` argument. This is done at the token level, as it is faster than augmenting MIDI objects (see the sketch after this list).
* 0634adee1f050fb51eed1d73ef39f982573c5d7d **BPE** is now implemented in the main tokenizer class! This means all tokenizers can benefit from it in a much cleaner way!
* 0634adee1f050fb51eed1d73ef39f982573c5d7d **`bpe` method renamed to `learn_bpe`**, and it now returns metrics (also shown in the progress bar during learning) on the number of token combinations and the sequence length reduction
* 7b8c9777cb0866a179b64e50c26c6c7cccec5cee Backward compatibility when loading tokenizer config files with BPE from older versions
* 3cea9aa11c238486a71dff82d244a0f16a8a52e9 nturusin Example notebook of the GPT-2 Hugging Face music transformer: fixes in training
* 65afa6b1aaa35e0df396276f9811f12d10f67ea6 The `tokens_to_midi` and `save_tokens` methods **can now receive tokens as tensors and numpy arrays. PyTorch, TensorFlow and Jax (numpy) tensors are supported**. The `convert_tokens_tensors_to_list` decorator will convert them to lists; you can use it on your custom methods.
* aab64aa4159ee27022b359597ece3154dc224513 The `__call__` magic method now automatically routes to `midi_to_tokens` or `tokens_to_midi` depending on what you give it. You can now use tokenizers more elegantly, as `tokenizer(midi_obj)` or `tokenizer(generated_tokens)`.
* e90b20a86283aa1dab2db071f5c6a49b161caa42 Bugfix in `Structured` causing a possible infinite while loop with illegal token types successions
* 947af8cfa0c72212a8835e10fe1f804356d13e8a Big refactor of MuMIDI, which now has a fixed vocabulary / type indices. It is easier to handle and use. (thanks gonzaloarca)
* 947af8cfa0c72212a8835e10fe1f804356d13e8a CPWord "Ignore" tokens are all renamed `Ignore_None` by convention, making operations easier in data augmentation and other methods.
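
A minimal sketch of this new 1.4.0 workflow, assuming a `REMI` tokenizer and `miditoolkit` MIDI objects; the file paths and the exact `data_augment_offsets` format are assumptions for illustration:

```python
from pathlib import Path

import torch
from miditok import REMI
from miditoolkit import MidiFile

tokenizer = REMI()

# __call__ now routes to midi_to_tokens / tokens_to_midi depending on the input
midi = MidiFile("song.mid")
tokens = tokenizer(midi)     # same as tokenizer.midi_to_tokens(midi)
decoded = tokenizer(tokens)  # same as tokenizer.tokens_to_midi(tokens)

# PyTorch / TensorFlow / Jax tensors are also accepted and converted to lists
generated = torch.randint(len(tokenizer), (1, 32))  # dummy "generated" tokens
midi_from_tensor = tokenizer.tokens_to_midi(generated)

# Token-level data augmentation while tokenizing a whole dataset;
# the offsets (pitch, velocity, duration) below are placeholder values
midi_paths = list(Path("dataset").glob("**/*.mid"))
tokenizer.tokenize_midi_dataset(
    midi_paths,
    Path("dataset_tokens"),
    data_augment_offsets=[2, 2, 1],
)
```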

Compatibility
* Code using BPE will have to be updated: remove the `bpe(tokenizer)` wrapper and declare tokenizers normally, and rename the `bpe` method to `learn_bpe` (see the sketch below)
* MuMIDI tokens and tokenizers from previous versions will be incompatible with v1.4.0
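
For the first point, a hedged migration sketch; the `learn_bpe` argument names below are assumptions, so check the documentation for the exact signature:

```python
from miditok import REMI

# before v1.4.0 (roughly): wrap the class with bpe(...), then call .bpe(...)
# from v1.4.0, BPE is built into every tokenizer:
tokenizer = REMI()
tokenizer.learn_bpe(
    "dataset_tokens",  # path to a tokenized dataset (argument name assumed)
    vocab_size=500,    # target vocabulary size (argument name assumed)
)
```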

1.3.3

Changes
* 4f4e49ef4ebd00d98f0b0c02b3689cfc9bba122b Bugfix of the `len` magic method with multi-vocab tokenizers; `len` is now also a property (see the sketch after this list)
* 925c7ae2eb60b8c64066bf41c974292db56cbac8 & 5b4f4102691e14aa07b02d2f8c9eb34a484b5221 Bugfix of token types initialization when loading a tokenizer from a params file
* c873456ad0e3017239a55770daed7b00f39d11cb Removed hyphens from token type names, for better readability. By convention, token types are all written in CamelCase.
* 5e51e843126af06f2baa5e9ad64fb4c634f787cb New `multi_voc` property
* b3b0cc7c6f1f8cca453d5d50f76c459c49d5c910 `tokenize_dataset`: the progress bar now shows the saving directory name
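
A quick sketch of the `len` and `multi_voc` additions, assuming `Octuple` as an example of a multi-vocabulary tokenizer:

```python
from miditok import REMI, Octuple

remi, octuple = REMI(), Octuple()
assert len(remi) == remi.len  # `len` is also exposed as a property
assert octuple.multi_voc      # True for multi-vocabulary tokenizers
assert not remi.multi_voc
```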

Compatibility
* All good 🙌

1.3.2

Changes
* Fansesi - f92f4aa98c4e407de9ca1d925e47b44833050aba Fixes a bug when using `tokenize_dataset` with `out_dir` as a non-`Path` object (issue 18)
* 27240627e2f65225e4fc398edb5b212cba8f18de Bugfix when using `files_lim` with `bpe`

Compatibility
* All good 🙌

1.3.1

Highlights
This version uniformly cleans up how `save_params` is called, and brings related minor fixes and new features.

Changes
* 3c4adf808c244fcb95f0da476227a588fccf01c6 Tokenizers now take a `unique_track` argument at creation. This parameter specifies whether the tokenizer represents and handles music as a single track / stream of tokens. This is the case for Octuple and MuMIDI, and probably most representations that natively support multitrack music. **If True, the tokens will be saved in json files as a single track. This parameter can then help when loading tokenized datasets.**
* 3c4adf808c244fcb95f0da476227a588fccf01c6 `save_params` method: `out_dir` argument renamed to `out_path`
* 3c4adf808c244fcb95f0da476227a588fccf01c6 `save_params` method: `out_path` can now specify the full path and name of the saved config file (see the sketch after this list)
* 3c4adf808c244fcb95f0da476227a588fccf01c6 Fixes in the `save_params` method for MuMIDI
* 3c4adf808c244fcb95f0da476227a588fccf01c6 The current version number is fixed (it was 1.2.9 instead of 1.3.0 for v1.3.0)
* 4be897bbdf8b84e74c5230449d28ed5dd7f1b8d5 `bpe` method (learning BPE vocabulary) now has a `print_seq_len_variation` argument, to optionally print the mean sequence length before and after BPE, and the variation in %. (default: True)
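
A hedged sketch of the updated `save_params` call; the file name is a placeholder, and `unique_track` is taken by the base tokenizer constructor:

```python
from pathlib import Path
from miditok import REMI

tokenizer = REMI()

# `out_path` (formerly `out_dir`) may now be a full file path, not just a directory
tokenizer.save_params(out_path=Path("runs") / "tokenizer_config.json")
```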

Compatibility
* You might need to update your code when:
  * creating a tokenizer, to handle the new `unique_track` argument,
  * saving a tokenizer's config, to handle the `out_dir` argument renamed to `out_path`.
* Datasets tokenized with BPE will need the `token_to_event` key changed to `vocab` in the associated tokenizer configuration file (see the sketch below).
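
For the last point, a small sketch of patching an existing configuration file; the file name is a placeholder:

```python
import json
from pathlib import Path

path = Path("tokenizer_config.json")
config = json.loads(path.read_text())
if "token_to_event" in config:
    config["vocab"] = config.pop("token_to_event")  # rename the key
    path.write_text(json.dumps(config))
```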

1.3.0

Highlight

Version 1.3.0 changes the way the vocabulary, and by extension tokenizers, handle special tokens: `PAD`, `SOS`, `EOS` and `MASK`. It brings a cleaner way to instantiate these classes.
It might bring incompatibilities with data and models used with previous MidiTok versions.

Changes
* b9218bf7cba9695f37f74424325037bb4dda1cb7 The `Vocabulary` class now takes a `pad` argument specifying whether to include a special padding token. This option is set to True by default, as it is common to train networks on batches of sequences of unequal lengths.
* b9218bf7cba9695f37f74424325037bb4dda1cb7 `Vocabulary` class: the `event_to_token` argument of the constructor is renamed `events` and has to be given as a list of events.
* b9218bf7cba9695f37f74424325037bb4dda1cb7 `Vocabulary` class: when adding a token to the vocabulary, the index is automatically set. The index argument is removed as it could cause issues / confusion when mapping indexes with models.
* b9218bf7cba9695f37f74424325037bb4dda1cb7 The `Event` class now takes the `value` argument second (the argument order changed)
* b9218bf7cba9695f37f74424325037bb4dda1cb7 Fix when learning BPE if `files_lim` was higher than the number of files itself
* f9cb1098df1334fb618550ea5c946657aea05881 For all tokenizers, a new constructor argument `pad` specifies whether to use a padding token, and the `sos_eos_tokens` argument is renamed to `sos_eos` (see the sketch after this list)
* f9cb1098df1334fb618550ea5c946657aea05881 When creating a `Vocabulary`, the *SOS* and *EOS* tokens are now registered before the *MASK* token. This change was made so that the order matches that of the special token arguments in tokenizer constructors, and because the *SOS* and *EOS* tokens are more commonly used in symbolic music applications.
* 84db19d16efb3db566d468e6c71a912b1193ae05 The dummy `StructuredEncoding`, `MuMIDIEncoding`, `OctupleEncoding` and `OctupleMonoEncoding` classes are removed from `__init__.py`. These classes from early versions had no record of being used. Other dummy classes (REMI, MIDILike and CPWord) remain.
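
A minimal sketch of the updated constructors; the import locations and the exact `Event` signature are assumptions based on the notes above:

```python
from miditok import REMI, Vocabulary, Event

# `events` (formerly `event_to_token`) is now a list; `pad` defaults to True
voc = Vocabulary(events=["Pitch_60", "Velocity_100"], pad=True)

# `value` is now the second argument of Event
event = Event("Pitch", 60)

# tokenizers take the new `pad` and the renamed `sos_eos` arguments
tokenizer = REMI(pad=True, sos_eos=True)
```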

Compatibility
* You might need to update your code when creating your tokenizer to handle the new `pad` argument.
* Data tokenized with **REMI**, and models trained with, will be incompatible with v1.3.0 if you used special tokens. The *BAR* token was previously at index 1, and is now added after special tokens.
* If you created a custom tokenizer inheriting from `MIDITokenizer`, make sure to update the calls to `super().__init__` with the new `pad` arg and the renamed `sos_eos` arg (example for MIDILike: [f9cb109](https://github.com/Natooz/MidiTok/commit/f9cb1098df1334fb618550ea5c946657aea05881#diff-ee81c42cbf9a0e5150c860773b29d8d75edbe6774b95837bbcb09272d6408883))
* **If you used both *SOS/EOS* and *MASK* special tokens**, their order (indexes) is now swapped, as *SOS/EOS* are now registered before *MASK*. As these tokens are not used during tokenization, **your previously tokenized datasets remain compatible**, unless you intentionally inserted *SOS*/*EOS*/*MASK* tokens. **Trained models will however be incompatible**, as the indices are swapped. If you want to use v1.3.0 with a previously trained model, you can manually invert the predictions of these tokens (see the sketch after this list).
* No incompatibilities outside of these cases
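
For that last point, a hedged sketch of such an inversion with PyTorch; the actual indices depend on your vocabulary, so look them up rather than using the placeholders below:

```python
import torch

def swap_token_ids(pred: torch.Tensor, id_a: int, id_b: int) -> torch.Tensor:
    """Return a copy of `pred` with the two token ids swapped."""
    out = pred.clone()
    out[pred == id_a] = id_b
    out[pred == id_b] = id_a
    return out

# e.g. exchange the old and new MASK indices (placeholder values):
# predictions = swap_token_ids(predictions, 1, 3)
```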

**Please reach out if you have any issue / question!** 🙌

1.2.9

Changes
* 212a9436bd223873cf535495bd545f21ad431fe5 **BPE**: Speed boost in `apply_bpe` method, about 1.5 times faster 🚀
* 4b8ccb9b8be25c5be46af65dbfb24763353791ed **BPE**: the `tokens_to_events` method is no longer in-place
* be3e244ee5647b71f97055362ba597f45319260d `save_tokens` method now takes `**kwargs` arguments to save additional information in json files (see the sketch after this list)
* b690cabfed30733723e0ebc844750c129199b5d7 Fix when computing the `max_tick` attribute of a MIDI when it has tracks with no notes
* f1855b6870c3e28fec687b7acd621a2f9be2aec6 The MidiTok package version is now saved with the tokenizer parameters, allowing you to keep track of the version used.
* Lint and coverage improvements ✨
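
A short sketch of the `**kwargs` addition to `save_tokens`; the extra keys shown are arbitrary examples:

```python
from miditok import REMI
from miditoolkit import MidiFile

tokenizer = REMI()
tokens = tokenizer.midi_to_tokens(MidiFile("song.mid"))

# extra keyword arguments are stored as additional keys in the JSON file
tokenizer.save_tokens(tokens, "song_tokens.json", source_file="song.mid", bpm=120)
```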

Compatibility
* If you explicitly used `tokens_to_events`, you might need to adapt your code, as it is no longer in-place (see the sketch below).
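
The adaptation amounts to using the return value instead of relying on mutation; in this two-line sketch, `tokenizer` and `bpe_tokens` stand for your existing BPE tokenizer and token ids:

```python
# before v1.2.9, tokens_to_events modified its input in place:
# tokenizer.tokens_to_events(bpe_tokens)
# from v1.2.9, assign the result instead:
events = tokenizer.tokens_to_events(bpe_tokens)
```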
