Streaming `v0.7.5` is released! Install via `pip`:
```bash
pip install --upgrade mosaicml-streaming==0.7.5
```
💎 New Features
1. Tensor/Sequence Parallelism Support
The new `replication` argument lets you share data samples across multiple ranks, enabling sequence or tensor parallelism.
* Replicating samples across devices (SP / TP enablement) by knighton in https://github.com/mosaicml/streaming/pull/597
* Expanded replication testing + documentation by snarayan21 in https://github.com/mosaicml/streaming/pull/607
* Make streaming use the correct number of unique samples with SP/TP by snarayan21 in https://github.com/mosaicml/streaming/pull/619
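To make the replication idea concrete, here is a minimal, stdlib-only sketch of how ranks could be grouped when each group of `replication` consecutive devices must see the same samples. The helpers `replica_groups` and `samples_for_rank` are hypothetical, and the plain strided split is an assumption standing in for Streaming's own partitioning and shuffling logic, which is more involved.

```python
def replica_groups(world_size: int, replication: int) -> list[list[int]]:
    """Group consecutive ranks that receive identical samples (hypothetical helper)."""
    if replication < 1 or world_size % replication != 0:
        raise ValueError("world_size must be a positive multiple of replication")
    return [list(range(start, start + replication))
            for start in range(0, world_size, replication)]


def samples_for_rank(num_samples: int, world_size: int,
                     replication: int, rank: int) -> list[int]:
    """Return the sample indices one rank would read.

    Assumption: a simple strided split over data-parallel groups; the real
    library uses its own partitioning and shuffling algorithms.
    """
    num_groups = world_size // replication  # number of distinct sample streams
    group = rank // replication             # ranks in the same group share a stream
    return list(range(group, num_samples, num_groups))


# 8 GPUs, each sample shared by 2 consecutive ranks (e.g. 2-way tensor parallelism)
print(replica_groups(8, 2))           # [[0, 1], [2, 3], [4, 5], [6, 7]]
print(samples_for_rank(16, 8, 2, 0))  # [0, 4, 8, 12]
print(samples_for_rank(16, 8, 2, 1))  # [0, 4, 8, 12]
```

In this sketch, ranks within a group are interchangeable as far as data is concerned, so only `world_size // replication` distinct sample streams are read per step.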
2. Overhauled Streaming Documentation
New and improved Streaming documentation is available [here](https://docs.mosaicml.com/projects/streaming/en/stable/#). Please submit issues with any feedback.
* Major overhaul of Streaming documentation by snarayan21 in https://github.com/mosaicml/streaming/pull/636
3. `batch_size` is now required for `StreamingDataset`
Because users who did not set the `batch_size` argument to `StreamingDataset` ran into multiple errors and performance degradations, `batch_size` must now be set in order to iterate over the dataset.
* You must set batch size. There is no other way. by snarayan21 in https://github.com/mosaicml/streaming/pull/624
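`batch_size` is meant to be the per-device batch size, i.e. the same value given to the PyTorch `DataLoader`. The sketch below uses a hypothetical helper, `per_device_batch_size`, to derive that value from a global batch of unique samples, treating ranks that share samples via `replication` as one data-parallel group. The commented lines show how the result would then be passed to both objects; the paths there are placeholders.

```python
def per_device_batch_size(global_batch_size: int, world_size: int,
                          replication: int = 1) -> int:
    """Hypothetical helper: split a global batch of unique samples evenly
    across data-parallel groups (ranks that share samples count once)."""
    if replication < 1 or world_size % replication:
        raise ValueError("world_size must be a positive multiple of replication")
    num_groups = world_size // replication
    if global_batch_size % num_groups:
        raise ValueError("global batch must divide evenly across data-parallel groups")
    return global_batch_size // num_groups


bs = per_device_batch_size(global_batch_size=256, world_size=8)
print(bs)  # 32

# With the real library, the same value would go to both objects, e.g.
# (placeholder paths):
#   from streaming import StreamingDataset
#   from torch.utils.data import DataLoader
#   dataset = StreamingDataset(remote='s3://bucket/path', local='/tmp/cache',
#                              batch_size=bs)
#   loader = DataLoader(dataset, batch_size=bs)
```

Keeping a single source for the value avoids the dataset and the `DataLoader` disagreeing about how many samples each device consumes per step.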
4. Support for Python 3.11 and deprecation of Python 3.8
* Add support for Python 3.11 and deprecate Python 3.8 by karan6181 in https://github.com/mosaicml/streaming/pull/586
🐛 Bug Fixes
* [easy typo fix] fix f-string by bigning in https://github.com/mosaicml/streaming/pull/596
* Change comparison in partitions to include equals by JAEarly in https://github.com/mosaicml/streaming/pull/587
* Use type int when initializing SharedMemory size by bchiang2 in https://github.com/mosaicml/streaming/pull/604
* COCO Dataset fix -- avoids `allow_unsafe_types=True` by snarayan21 in https://github.com/mosaicml/streaming/pull/647
🔧 Improvements
* Allow writers to overwrite existing data by JAEarly in https://github.com/mosaicml/streaming/pull/594
* Update careers link by milocress in https://github.com/mosaicml/streaming/pull/611
* Update license by b-chu in https://github.com/mosaicml/streaming/pull/568
* Updated documentation for S3-compatible object stores by AIproj in https://github.com/mosaicml/streaming/pull/592
* Make yamllint consistent with Composer by b-chu in https://github.com/mosaicml/streaming/pull/583
* Switch linting workflows to ci-testing repo by b-chu in https://github.com/mosaicml/streaming/pull/616
What's Changed
* Bump uvicorn from 0.26.0 to 0.27.1 by dependabot in https://github.com/mosaicml/streaming/pull/599
* Bump pytest-split from 0.8.1 to 0.8.2 by dependabot in https://github.com/mosaicml/streaming/pull/581
* Update ruff to 0.2.2 by Skylion007 in https://github.com/mosaicml/streaming/pull/608
* Bump fastapi from 0.109.0 to 0.110.0 by dependabot in https://github.com/mosaicml/streaming/pull/610
* Bump yamllint from 1.33.0 to 1.35.1 by dependabot in https://github.com/mosaicml/streaming/pull/601
* Bump uvicorn from 0.27.1 to 0.28.0 by dependabot in https://github.com/mosaicml/streaming/pull/626
* Update moto requirement from <5,>=4.0 to >=4.0,<6 by dependabot in https://github.com/mosaicml/streaming/pull/580
* Bump furo from 2023.7.26 to 2024.1.29 by dependabot in https://github.com/mosaicml/streaming/pull/631
* Bump pypandoc from 1.12 to 1.13 by dependabot in https://github.com/mosaicml/streaming/pull/630
* Bump databricks-sdk from 0.14.0 to 0.22.0 by dependabot in https://github.com/mosaicml/streaming/pull/629
* Add batch_size to 1 if not provided for regression testing by karan6181 in https://github.com/mosaicml/streaming/pull/635
* Fixed docstring note for getting sequential sample ordering by snarayan21 in https://github.com/mosaicml/streaming/pull/632
* Bump pytest and fix failing test by snarayan21 in https://github.com/mosaicml/streaming/pull/642
* Update pytest-cov requirement from <5,>=4 to >=4,<6 by dependabot in https://github.com/mosaicml/streaming/pull/638
* Bump pydantic from 2.5.3 to 2.6.4 by dependabot in https://github.com/mosaicml/streaming/pull/639
* Bump uvicorn from 0.28.0 to 0.29.0 by dependabot in https://github.com/mosaicml/streaming/pull/640
* Bump databricks-sdk from 0.22.0 to 0.23.0 by dependabot in https://github.com/mosaicml/streaming/pull/644
* Version bump to 0.7.5 by snarayan21 in https://github.com/mosaicml/streaming/pull/650
New Contributors
* bigning made their first contribution in https://github.com/mosaicml/streaming/pull/596
* JAEarly made their first contribution in https://github.com/mosaicml/streaming/pull/587
* AIproj made their first contribution in https://github.com/mosaicml/streaming/pull/592
* milocress made their first contribution in https://github.com/mosaicml/streaming/pull/611
* bchiang2 made their first contribution in https://github.com/mosaicml/streaming/pull/604
**Full Changelog**: https://github.com/mosaicml/streaming/compare/v0.7.4...v0.7.5