New feature updates around data handling and preprocessing:
- Enable loading of Parquet and Arrow Dataset files.
- Dataset mixing via sampling probabilities in data config.
- New `additional_data_handlers` argument in the `train` function; handlers passed through it are registered with the data preprocessor.
- Support multiple files, directories, pattern-based paths, HF Dataset IDs, and their combinations via `data_config`.
- New support for both multi-turn and single-turn chat interactions.
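
To illustrate what mixing datasets by sampling probability means, here is a minimal conceptual sketch in plain Python. It is not the fms-hf-tuning implementation and does not use the project's data config schema: the function `mix_datasets`, its parameters, and the in-memory sample data are all hypothetical names chosen for this example.

```python
import random
from typing import Dict, Iterator, List


def mix_datasets(
    datasets: Dict[str, List[dict]],
    probabilities: Dict[str, float],
    num_samples: int,
    seed: int = 0,
) -> Iterator[dict]:
    """Yield num_samples records, picking the source dataset of each
    record according to its sampling probability (hypothetical helper)."""
    if set(datasets) != set(probabilities):
        raise ValueError("each dataset needs exactly one sampling probability")
    if abs(sum(probabilities.values()) - 1.0) > 1e-9:
        raise ValueError("sampling probabilities must sum to 1.0")

    rng = random.Random(seed)
    names = list(datasets)
    weights = [probabilities[name] for name in names]
    cursors = {name: 0 for name in names}

    for _ in range(num_samples):
        # Draw the source dataset for this record.
        name = rng.choices(names, weights=weights, k=1)[0]
        records = datasets[name]
        # Cycle through a dataset if it is drawn more often than it has rows.
        record = records[cursors[name] % len(records)]
        cursors[name] += 1
        yield {"source": name, **record}


# Illustrative data only: two tiny in-memory "datasets".
mixed = list(
    mix_datasets(
        {
            "chat": [{"text": "hi"}, {"text": "hello"}],
            "code": [{"text": "print(1)"}],
        },
        {"chat": 0.75, "code": 0.25},
        num_samples=1000,
    )
)
```

With these weights, roughly three quarters of the mixed records come from the `chat` source. In the release itself the probabilities are set in the data config rather than in code; see the documentation added in pull/423 for the actual format.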
New tracker:
- New MLflow tracker.
Additional Changes
- Refactor test artifacts into `tests/artifacts`, adding new data types, datasets, and predefined data configs for new unit tests.
- Resolve issues with deprecated training arguments by moving deprecated positional arguments from `SFTTrainer` to `SFTConfig`.
Full list of Changes
* feat: Add support to handle Parquet Dataset files via data config by Abhishek-TAMU in https://github.com/foundation-model-stack/fms-hf-tuning/pull/401
* test: add arrow datasets and arrow unit tests by willmj in https://github.com/foundation-model-stack/fms-hf-tuning/pull/403
* feat: Perform dataset mixing via sampling probabilities in data config by dushyantbehl in https://github.com/foundation-model-stack/fms-hf-tuning/pull/408
* feat: Expose additional data handlers as an argument in train by dushyantbehl in https://github.com/foundation-model-stack/fms-hf-tuning/pull/409
* fix: Move deprecated positional arguments from SFTTrainer to SFTConfig by Luka-D in https://github.com/foundation-model-stack/fms-hf-tuning/pull/399
* fix: update dataclass objects directly instead of creating new variables by kmehant in https://github.com/foundation-model-stack/fms-hf-tuning/pull/418
* test: Add unit tests to test multiple files in single dataset by Abhishek-TAMU in https://github.com/foundation-model-stack/fms-hf-tuning/pull/412
* feat: Add multi and single turn chat support by dushyantbehl in https://github.com/foundation-model-stack/fms-hf-tuning/pull/415
* feat: Integrate MLflow tracker by dushyantbehl in https://github.com/foundation-model-stack/fms-hf-tuning/pull/425
* feat: Handle passing of multiple files, multiple folders, path with patterns, HF Dataset and combination by Abhishek-TAMU in https://github.com/foundation-model-stack/fms-hf-tuning/pull/424
* docs: Add documentation for data preprocessor release by dushyantbehl in https://github.com/foundation-model-stack/fms-hf-tuning/pull/423
New Contributors
* Luka-D made their first contribution in https://github.com/foundation-model-stack/fms-hf-tuning/pull/399
**Full Changelog**: https://github.com/foundation-model-stack/fms-hf-tuning/compare/v2.2.0...v2.3.1