Miditok

Latest version: v3.0.5.post1



3.0.5.post1

What's Changed

* Fixing build target + using hatch in github actions by Natooz in https://github.com/Natooz/MidiTok/pull/223

3.0.5

What's Changed

* fix import HfHubHTTPError with latest hf hub package update by Natooz in https://github.com/Natooz/MidiTok/pull/199
* MDTK_200: implemented add_trailing_bars by Mintas in https://github.com/Natooz/MidiTok/pull/204
* Remove refs to split_midis_for_training in doc by Zaka in https://github.com/Natooz/MidiTok/pull/205
* Catching exception when decoding velocity values in MIDILike by Natooz in https://github.com/Natooz/MidiTok/pull/210
* Update example notebook reference by emmanuel-ferdman in https://github.com/Natooz/MidiTok/pull/216
* bugfix training initial alphabet by Natooz in https://github.com/Natooz/MidiTok/pull/220
* Add a parameter augment_copy to the augment_score function by pstrepetov in https://github.com/Natooz/MidiTok/pull/221

New Contributors

* Mintas made their first contribution in https://github.com/Natooz/MidiTok/pull/204
* Zaka made their first contribution in https://github.com/Natooz/MidiTok/pull/205
* emmanuel-ferdman made their first contribution in https://github.com/Natooz/MidiTok/pull/216
* pstrepetov made their first contribution in https://github.com/Natooz/MidiTok/pull/221

**Full Changelog**: https://github.com/Natooz/MidiTok/compare/v3.0.4...v3.0.5

3.0.4

This release introduces the `PerTok` tokenizer by Lemonaide AI, attribute control tokens, and minor fixes.

Highlights

PerTok: Performance Tokenizer

(associated paper to be released)

PerTok was developed by Julian Lenz (JLenzy) at [Lemonaide AI](https://www.lemonaide.ai) to capture expressive timing in symbolic scores while keeping sequence lengths competitively low. It achieves this by dividing time differences into Macro and Micro categories and introducing a new MicroTime token type. Subtle deviations from the quantized beat are represented with these Timeshift tokens.
Furthermore, PerTok can encode an unlimited number of note subdivisions by allowing multiple, overlapping values within the `beat_res` parameter of the `TokenizerConfig`.
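
The macro/micro idea can be illustrated with a small, self-contained sketch. It is not PerTok's implementation: the function name `split_macro_micro`, the 480 ticks-per-beat resolution and the four grid steps per beat are assumptions chosen for the example. It splits a time difference into a coarse number of grid steps plus a signed residual in ticks.

```python
def split_macro_micro(
    delta_ticks: int, ticks_per_beat: int = 480, steps_per_beat: int = 4
) -> tuple[int, int]:
    """Split a time difference into grid steps (macro) and a signed residual (micro).

    Hypothetical helper for illustration only; PerTok's real logic may differ.
    """
    step = ticks_per_beat // steps_per_beat  # ticks per grid step
    # Round to the nearest grid step (halves round up) using integer arithmetic.
    macro_steps = (delta_ticks + step // 2) // step
    # Signed deviation from the quantized position, in ticks.
    micro_ticks = delta_ticks - macro_steps * step
    return macro_steps, micro_ticks


if __name__ == "__main__":
    for delta in (250, 230, 480):
        macro, micro = split_macro_micro(delta)
        print(f"{delta} ticks -> {macro} grid steps, {micro:+d} ticks")
```

The original delta can always be recovered as `macro_steps * step + micro_ticks`, which is why the coarse grid can stay small without losing the expressive deviation.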

The micro timing tokens will be extended to all tokenizers in a future update.

Attribute Control tokens

Attribute controls are additional tokens that describe specific features of the music. Training a model with them makes it possible to control the model during inference, by conditioning it to predict music with those features.
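
As a rough illustration of the principle, the sketch below prefixes each bar of a toy token sequence with a control token describing how many notes the bar contains. The token name `AC_NoteDensity_*`, the helper `add_density_controls` and the REMI-like token strings are hypothetical and are not taken from MidiTok's vocabulary or API.

```python
def add_density_controls(bars: list[list[str]], max_density: int = 8) -> list[str]:
    """Prefix each bar with a (hypothetical) control token giving its note count."""
    out: list[str] = []
    for bar in bars:
        # Count the notes of the bar; here a note is any token starting with "Pitch_".
        n_notes = sum(tok.startswith("Pitch_") for tok in bar)
        # Clip the value so the number of distinct control tokens stays bounded.
        out.append(f"AC_NoteDensity_{min(n_notes, max_density)}")
        out.extend(bar)
    return out


if __name__ == "__main__":
    bars = [["Bar_None", "Pitch_60", "Pitch_64"], ["Bar_None"]]
    print(add_density_controls(bars))
```

A model trained on sequences like these sees the control token before the content it describes, so at inference time the same token can be supplied as a prompt to steer generation.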

What's Changed
* updates to Example_HuggingFace_Mistral_Transformer.ipynb by briane412 in https://github.com/Natooz/MidiTok/pull/164
* `_model_name` is now a protected property by Natooz in https://github.com/Natooz/MidiTok/pull/165
* Fixing docs for tokenizer training by Natooz in https://github.com/Natooz/MidiTok/pull/167
* Default `continuing_subword_prefix` when splitting token sequences by Natooz in https://github.com/Natooz/MidiTok/pull/168
* small bug fix in MIDI pretokenization by shenranwang in https://github.com/Natooz/MidiTok/pull/170
* adding `no_preprocess_score` argument when tokenizing by Natooz in https://github.com/Natooz/MidiTok/pull/172
* `TokSequence` summable, `concatenate_track_sequences` arg for MMM by Natooz in https://github.com/Natooz/MidiTok/pull/173
* Docs update by Natooz in https://github.com/Natooz/MidiTok/pull/175
* Fixing split methods for empty files (no tracks and/or no notes) by Natooz in https://github.com/Natooz/MidiTok/pull/177
* Logo now with white outer stroke by Natooz in https://github.com/Natooz/MidiTok/pull/180
* Attribute controls feature by helloWorld199 in https://github.com/Natooz/MidiTok/pull/181
* better distinction between `one_token_stream` and `config.one_token_stream_for_programs` by Natooz in https://github.com/Natooz/MidiTok/pull/182
* making sure MMM token sequences are not concatenated when splitting them per bar/beat in tokenizer_training_iterator.py by Natooz in https://github.com/Natooz/MidiTok/pull/183
* rST Documentation fixes by scottclowe in https://github.com/Natooz/MidiTok/pull/184
* Bump actions/stale from 5.1.1 to 9.0.0 by dependabot in https://github.com/Natooz/MidiTok/pull/185
* Bump actions/download-artifact from 3 to 4 by dependabot in https://github.com/Natooz/MidiTok/pull/186
* Bump codecov/codecov-action from 3.1.0 to 4.5.0 by dependabot in https://github.com/Natooz/MidiTok/pull/187
* Bump actions/upload-artifact from 3 to 4 by dependabot in https://github.com/Natooz/MidiTok/pull/188
* Fixing bugs caused by changes from symusic v0.5.0 by Natooz in https://github.com/Natooz/MidiTok/pull/192
* `use_velocities` and `use_duration` configuration parameters by Natooz in https://github.com/Natooz/MidiTok/pull/193
* collator now handles decoder input ids (seq2seq models) by Natooz in https://github.com/Natooz/MidiTok/pull/194
* PerTok Tokenizer by JLenzy in https://github.com/Natooz/MidiTok/pull/191

New Contributors
* briane412 made their first contribution in https://github.com/Natooz/MidiTok/pull/164
* helloWorld199 made their first contribution in https://github.com/Natooz/MidiTok/pull/181
* scottclowe made their first contribution in https://github.com/Natooz/MidiTok/pull/184
* dependabot made their first contribution in https://github.com/Natooz/MidiTok/pull/185

**Full Changelog**: https://github.com/Natooz/MidiTok/compare/v3.0.3...v3.0.4

3.0.3

Highlights

* Support for abc files, which can be loaded and dumped with symusic similarly to MIDI files;
* The tokenizers can now also be trained with the **WordPiece** and **Unigram** algorithms!
* Tokenizer training and token ids encoding can now be performed "bar-wise" or "beat-wise", meaning the tokenizer can learn new tokens from successions of base tokens strictly within bars or beats. This is set by the `encode_ids_split` attribute of the tokenizer config;
* [symusic](https://github.com/Yikai-Liao/symusic) v0.4.3 or higher is now required to comply with the usage of the `clip` method;
* Better handling of file loading errors in `DatasetMIDI` and `DataCollator`;
* Introducing a new `filter_dataset` to clean a dataset of MIDI/abc files before using it;
* `MMM` tokenizer has been cleaned up, and is now fully modular: it now works on top of other tokenizations (`REMI`, `TSD` and `MIDILike`) to allow more flexibility and interoperability;
* `TokSequence` objects can now be sliced and concatenated (e.g. `seq3 = seq1[:50] + seq2[50:]`);
* `TokSequence` objects tokenized from a tokenizer can now be split into per-bar or per-beat subsequences;
* Minor fixes, code improvements and cleaning.
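
To make the "bar-wise" splitting concrete, here is a minimal sketch on plain lists of token strings. It does not use MidiTok itself: the helper `split_per_bar` and the REMI-like token names (`Bar_None`, `Pitch_60`) are assumptions for the example. When training operates on such chunks, a learned token can never span a bar boundary.

```python
def split_per_bar(tokens: list[str], bar_token: str = "Bar_None") -> list[list[str]]:
    """Split a flat token list into one sub-list per bar.

    Each chunk starts at a bar token; tokens found before the first bar token
    form their own leading chunk.
    """
    chunks: list[list[str]] = []
    for tok in tokens:
        # Open a new chunk at every bar token, or for the very first token.
        if tok == bar_token or not chunks:
            chunks.append([])
        chunks[-1].append(tok)
    return chunks


if __name__ == "__main__":
    seq = ["Bar_None", "Pitch_60", "Bar_None", "Pitch_62", "Pitch_64"]
    for chunk in split_per_bar(seq):
        print(chunk)
```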

Methods renaming

A few methods and properties were previously named after "bpe" and "midi". To align with the more general usages of these methods (support for several file formats and training algorithms), they have been renamed with more idiomatic and accurate names.

<details>
<summary>Methods renamed with deprecation warning:</summary>

* `midi_to_tokens` --> `encode`;
* `tokens_to_midi` --> `decode`;
* `learn_bpe` --> `train`;
* `apply_bpe` --> `encode_token_ids`;
* `decode_bpe` --> `decode_token_ids`;
* `ids_bpe_encoded` --> `are_ids_encoded`;
* `vocab_bpe` --> `vocab_model`;
* `tokenize_midi_dataset` --> `tokenize_dataset`.
</details>

<details>
<summary>Methods renamed without deprecation warning (fewer usages, reduces code messiness):</summary>

* `MIDITokenizer` --> `MusicTokenizer`;
* `augment_midi` --> `augment_score`;
* `augment_midi_dataset` --> `augment_dataset`;
* `augment_midi_multiple_offsets` --> `augment_score_multiple_offsets`;
* `split_midis_for_training` --> `split_files_for_training`;
* `split_midi_per_note_density` --> `split_score_per_note_density`;
* `get_midi_programs` --> `get_score_programs`;
* `merge_midis` --> `merge_scores`;
* `get_midi_ticks_per_beat` --> `get_score_ticks_per_beat`;
* `split_midi_per_ticks` --> `split_score_per_ticks`;
* `split_midi_per_beats` --> `split_score_per_beats`;
* `split_midi_per_tracks` --> `split_score_per_tracks`;
* `concat_midis` --> `concat_scores`;
</details>

<details>
<summary>Protected internal methods (no deprecation warning, advanced usages):</summary>

* `MIDITokenizer._tokens_to_midi` --> `MusicTokenizer._tokens_to_score`;
* `MIDITokenizer._midi_to_tokens` --> `MusicTokenizer._score_to_tokens`;
* `MIDITokenizer._create_midi_events` --> `MusicTokenizer._create_global_events`
</details>

There are no other compatibility issues besides these renamings.
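
For codebases that still use the old names, the renames that ship with a deprecation warning can be applied mechanically. The following is a minimal sketch, not a tool provided by MidiTok: `RENAMES` and `migrate_source` are hypothetical names, and the mapping only covers the eight renames listed in the first group above. It performs a purely textual, whole-identifier replacement, so the resulting diff should still be reviewed by hand.

```python
import re

# Old name -> new name, for the renames listed with a deprecation warning.
RENAMES = {
    "midi_to_tokens": "encode",
    "tokens_to_midi": "decode",
    "learn_bpe": "train",
    "apply_bpe": "encode_token_ids",
    "decode_bpe": "decode_token_ids",
    "ids_bpe_encoded": "are_ids_encoded",
    "vocab_bpe": "vocab_model",
    "tokenize_midi_dataset": "tokenize_dataset",
}

# \b ensures only whole identifiers match, so a name such as `_midi_to_tokens`
# (which starts with an underscore, a word character) is left untouched.
_PATTERN = re.compile(r"\b(" + "|".join(map(re.escape, RENAMES)) + r")\b")


def migrate_source(source: str) -> str:
    """Return `source` with the old identifiers replaced by their new names."""
    return _PATTERN.sub(lambda m: RENAMES[m.group(1)], source)


if __name__ == "__main__":
    print(migrate_source("tokens = tokenizer.midi_to_tokens(midi)"))
```

Because the replacement is textual, an unrelated variable that happens to share one of the old names would be renamed too.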

**Full Changelog**: https://github.com/Natooz/MidiTok/compare/v3.0.2...v3.0.3

3.0.2

TL;DR

This new version introduces a new `DatasetMIDI` class to use when training PyTorch models. It relies on the previously named `DatasetTok` class, with a pre-tokenizing option and better handling of BOS and EOS tokens.
A new `miditok.pytorch_data.split_midis_for_training` method dynamically chunks MIDIs into smaller parts that approximately match the desired token sequence length, based on the note densities of their bars. These chunks can be used to train a model while maximizing the overall amount of data used.
A few new utility methods have been created for these features, e.g. to split, concatenate or merge `symusic.Score` objects.
Thanks Kinyugo for the discussions and tests that guided the development of the features! (147)
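
The density-based chunking can be pictured with the simplified sketch below. It is not the algorithm used by `split_midis_for_training`: the helper `chunk_bars`, the greedy strategy and the constant `tokens_per_note` estimate are assumptions made for the example. Given the number of notes in each bar, it groups consecutive bars so that the estimated token count of each chunk stays close to a target length.

```python
def chunk_bars(
    notes_per_bar: list[int], max_tokens: int, tokens_per_note: float = 3.0
) -> list[tuple[int, int]]:
    """Greedily group consecutive bars into (start, end) ranges, end exclusive.

    A bar whose own estimated cost exceeds `max_tokens` still gets a chunk of
    its own, so no bar is ever dropped.
    """
    chunks: list[tuple[int, int]] = []
    start, budget = 0, 0.0
    for i, n_notes in enumerate(notes_per_bar):
        cost = n_notes * tokens_per_note  # estimated tokens for this bar
        # Close the current chunk if adding this bar would exceed the target.
        if i > start and budget + cost > max_tokens:
            chunks.append((start, i))
            start, budget = i, 0.0
        budget += cost
    # Emit the trailing chunk, if any bars remain.
    if start < len(notes_per_bar):
        chunks.append((start, len(notes_per_bar)))
    return chunks


if __name__ == "__main__":
    print(chunk_bars([10, 10, 10, 30, 5], max_tokens=60))
```

Dense bars end up in short chunks and sparse bars in long ones, which keeps the resulting token sequences at a comparable length without discarding any part of the file.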

The update also brings a few minor fixes, and the [docs](https://miditok.readthedocs.io/) have a new theme!

What's Changed

* Fix token_paths to files_paths, and config to model_config by sunsetsobserver in https://github.com/Natooz/MidiTok/pull/145
* Fix issues in Octuple with multiple different-beat time signatures by ilya16 in https://github.com/Natooz/MidiTok/pull/146
* Pitch interval decoding: discarding notes outside the tokenizer pitch range by Natooz in https://github.com/Natooz/MidiTok/pull/149
* Fixing `save_pretrained` to comply with huggingface_hub v0.21 by Natooz in https://github.com/Natooz/MidiTok/pull/150
* ability to overwrite `_create_durations_tuples` in init by JLenzy in https://github.com/Natooz/MidiTok/pull/153
* Refactor of PyTorch data loading classes and methods by Natooz and Kinyugo in https://github.com/Natooz/MidiTok/pull/148
* The docs have a new theme! Using the [furo](https://github.com/pradyunsg/furo) theme.

New Contributors
* sunsetsobserver made their first contribution in https://github.com/Natooz/MidiTok/pull/145
* JLenzy made their first contribution in https://github.com/Natooz/MidiTok/pull/153

**Full Changelog**: https://github.com/Natooz/MidiTok/compare/v3.0.1...v3.0.2

3.0.1

What's Changed
* `use_pitchdrum_tokens` option to use dedicated `PitchDrum` tokens for drums tracks
* Fixing time signature preprocessing (time division mismatch) in https://github.com/Natooz/MidiTok/pull/132 (#131 EterDelta)
* Fixing data augmentation example and considering all midi extensions in https://github.com/Natooz/MidiTok/pull/136 (#135 oiabtt)
* decoding: automatically making sure to decode BPE then completing `tokens` in https://github.com/Natooz/MidiTok/pull/138 (#137 oiabtt)
* `load_tokens` now returning `TokSequence` in https://github.com/Natooz/MidiTok/pull/139 (#137 oiabtt)
* convert chord maps back to tuples from list when loading tokenizer from a saved configuration by shenranwang in https://github.com/Natooz/MidiTok/pull/141
* can now use `MIDITokenizer.from_pretrained` similarly to the `AutoTokenizer` in the Hugging Face transformers library in https://github.com/Natooz/MidiTok/pull/142 (discussed in #127 oiabtt)

New Contributors
* shenranwang made their first contribution in https://github.com/Natooz/MidiTok/pull/141

**Full Changelog**: https://github.com/Natooz/MidiTok/compare/v3.0.0...v3.0.1
