Mosaicml-streaming

Latest version: v0.11.0

0.11.0

What's new
1. Introducing registry for customizable components (https://github.com/mosaicml/streaming/pull/858)
`StreamingDataset` can now be used with custom `Stream` implementations via a registry. See [the documentation page](https://docs.mosaicml.com/projects/streaming/en/stable/dataset_configuration/mixing_data_sources.html) for example usage.
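
The registry idea can be illustrated with a plain-Python sketch of the pattern. It does not use streaming's actual registry API: `Stream` here is a stand-in class, and `stream_registry`, `register_stream`, `create_stream`, and `LoggingStream` are hypothetical names. See the linked documentation for the real usage.

```python
from typing import Callable, Dict, Type


class Stream:
    """Stand-in for a stream base class (hypothetical, not streaming.Stream)."""

    def __init__(self, remote: str, local: str) -> None:
        self.remote = remote
        self.local = local


# Hypothetical registry: maps a short name to a Stream subclass.
stream_registry: Dict[str, Type[Stream]] = {}


def register_stream(name: str) -> Callable[[Type[Stream]], Type[Stream]]:
    """Decorator that records a Stream subclass under ``name``."""
    def decorator(cls: Type[Stream]) -> Type[Stream]:
        if name in stream_registry:
            raise ValueError(f'Stream {name!r} is already registered')
        stream_registry[name] = cls
        return cls
    return decorator


# Register the default implementation under a plain name.
register_stream('stream')(Stream)


@register_stream('logging_stream')
class LoggingStream(Stream):
    """Example custom Stream that records the paths it was built with."""

    def __init__(self, remote: str, local: str) -> None:
        super().__init__(remote, local)
        self.log = [f'created for {remote} -> {local}']


def create_stream(name: str, **kwargs) -> Stream:
    """Look up ``name`` in the registry and instantiate it with ``kwargs``."""
    try:
        cls = stream_registry[name]
    except KeyError:
        raise KeyError(f'Unknown stream {name!r}; registered: {sorted(stream_registry)}') from None
    return cls(**kwargs)
```

The point of the pattern is that the code constructing streams only needs a name, so a custom implementation can be swapped in without changing that code.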

🐛 Bug fixes
* Fix `simulation` module import paths (srstevenson)
* Fix `S3Downloader` serialization issues (wouterzwerink)
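
Why a cached client should be kept out of serialization: a dataset and its downloader can be copied or pickled (for example, when handed to dataloader worker processes), and live SDK clients generally cannot be. A minimal sketch of the general `__getstate__` technique, assuming a lazily created client stored in `_s3_client`; `S3DownloaderSketch` and `FakeS3Client` are hypothetical, and this is not the library's code.

```python
from typing import Any, Dict, Optional


class FakeS3Client:
    """Stand-in for a boto3 client; real clients hold sockets and locks."""

    def __reduce__(self):
        # Simulate an object that cannot be pickled or copied.
        raise TypeError('client objects cannot be serialized')


class S3DownloaderSketch:
    """Hypothetical downloader that caches a client but excludes it from its state."""

    def __init__(self) -> None:
        self._s3_client: Optional[FakeS3Client] = None

    @property
    def client(self) -> FakeS3Client:
        # Create the client lazily on first use, then reuse it.
        if self._s3_client is None:
            self._s3_client = FakeS3Client()
        return self._s3_client

    def __getstate__(self) -> Dict[str, Any]:
        state = self.__dict__.copy()
        # Drop the live client; a copy re-creates its own on first use.
        state['_s3_client'] = None
        return state
```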

What's Changed
* Bound numpy version below 2.2.0 by snarayan21 in https://github.com/mosaicml/streaming/pull/849
* Fix import paths in `simulation` module by srstevenson in https://github.com/mosaicml/streaming/pull/838
* Prevent _s3_client from being serialized by wouterzwerink in https://github.com/mosaicml/streaming/pull/847
* Fix a few typos by srstevenson in https://github.com/mosaicml/streaming/pull/843
* Change broken user guide link to quick start by srstevenson in https://github.com/mosaicml/streaming/pull/841
* Remove unused import from quick start example by srstevenson in https://github.com/mosaicml/streaming/pull/842
* Change simulator UI help text to refer to directory by srstevenson in https://github.com/mosaicml/streaming/pull/839
* Bump fastapi from 0.115.5 to 0.115.6 by dependabot in https://github.com/mosaicml/streaming/pull/845
* Bump pydantic from 2.10.2 to 2.10.3 by dependabot in https://github.com/mosaicml/streaming/pull/846
* Update mosaicml-cli requirement from <0.7,>=0.5.25 to >=0.5.25,<0.8 by dependabot in https://github.com/mosaicml/streaming/pull/850
* Bump uvicorn from 0.32.1 to 0.34.0 by dependabot in https://github.com/mosaicml/streaming/pull/855
* Bump pydantic from 2.10.3 to 2.10.4 by dependabot in https://github.com/mosaicml/streaming/pull/856
* Update huggingface-hub requirement from <0.27,>=0.23.4 to >=0.23.4,<0.28 by dependabot in https://github.com/mosaicml/streaming/pull/859
* Set `epoch_seed_change` attribute on `SimulationDataset` by srstevenson in https://github.com/mosaicml/streaming/pull/840
* Use registry when creating Stream in StreamingDataset by es94129 in https://github.com/mosaicml/streaming/pull/858
* Bump pydantic from 2.10.4 to 2.10.5 by dependabot in https://github.com/mosaicml/streaming/pull/861

New Contributors
* srstevenson made their first contribution in https://github.com/mosaicml/streaming/pull/838
* wouterzwerink made their first contribution in https://github.com/mosaicml/streaming/pull/847
* es94129 made their first contribution in https://github.com/mosaicml/streaming/pull/858

**Full Changelog**: https://github.com/mosaicml/streaming/compare/v0.10.0...v0.11.0

0.10.0

Improvements

1. Reusable cloud download clients (https://github.com/mosaicml/streaming/pull/817)
* Streaming now reuses cloud download clients when downloading shard files instead of creating a new client for each download.
* This avoids run failures that sometimes occur with too many open sockets or excessive cloud authentication requests.
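
The idea of reusing clients can be sketched in plain Python: one client per storage scheme is created lazily and then shared by every shard download, instead of one client per file. `ReusingDownloader` and `FakeClient` are hypothetical stand-ins, not streaming's download module.

```python
from urllib.parse import urlparse


class FakeClient:
    """Stand-in for a cloud SDK client; counts how many were created."""

    created = 0

    def __init__(self, scheme: str) -> None:
        FakeClient.created += 1
        self.scheme = scheme

    def download(self, remote: str, local: str) -> str:
        return f'{self.scheme}: {remote} -> {local}'


class ReusingDownloader:
    """Hypothetical downloader that keeps one client per URL scheme."""

    def __init__(self) -> None:
        self._clients: dict[str, FakeClient] = {}

    def _client_for(self, scheme: str) -> FakeClient:
        # Create a client only the first time a scheme is seen, then reuse it.
        if scheme not in self._clients:
            self._clients[scheme] = FakeClient(scheme)
        return self._clients[scheme]

    def download(self, remote: str, local: str) -> str:
        scheme = urlparse(remote).scheme
        return self._client_for(scheme).download(remote, local)
```

Downloading many shards from the same bucket then costs a single client setup (and a single authentication) rather than one per shard.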

2. `py1b` shuffle algorithm deprecation (https://github.com/mosaicml/streaming/pull/837)
* The `py1b` shuffle algorithm has now been deprecated. Please use the improved `py1e` (default) or the `py1br` shuffle algorithms instead.
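
If your own training configs still set `py1b`, a small guard can surface the deprecation before the dataset is built, with the resulting string then passed to the dataset's `shuffle_algo` argument. This is a hypothetical helper in user code (`resolve_shuffle_algo` is not part of streaming), and falling back to `py1e` simply follows the recommendation above.

```python
import warnings

# Deprecated algorithm -> suggested replacement, per these release notes.
DEPRECATED_SHUFFLE_ALGOS = {'py1b': 'py1e'}


def resolve_shuffle_algo(algo: str) -> str:
    """Hypothetical helper: warn about a deprecated algorithm and return a replacement."""
    replacement = DEPRECATED_SHUFFLE_ALGOS.get(algo)
    if replacement is None:
        return algo  # Not deprecated; pass through unchanged.
    warnings.warn(
        f"shuffle algorithm {algo!r} is deprecated; using {replacement!r} instead "
        f"('py1br' is the other suggested option)",
        DeprecationWarning,
        stacklevel=2,
    )
    return replacement
```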

What's Changed
* Update FAQs to indicate wrapping not supported by milocress in https://github.com/mosaicml/streaming/pull/822
* refactored the download module to have reusable clients by ethantang-db in https://github.com/mosaicml/streaming/pull/817
* Update pytest-cov requirement from <6,>=4 to >=4,<7 by dependabot in https://github.com/mosaicml/streaming/pull/821
* Consistent errors for unused streams in batching methods by snarayan21 in https://github.com/mosaicml/streaming/pull/826
* Update setuptools requirement from <68.0.0 to <76.0.0 by dependabot in https://github.com/mosaicml/streaming/pull/825
* fix f string by XiaohanZhangCMU in https://github.com/mosaicml/streaming/pull/829
* Bump fastapi from 0.115.4 to 0.115.5 by dependabot in https://github.com/mosaicml/streaming/pull/830
* Bump uvicorn from 0.32.0 to 0.32.1 by dependabot in https://github.com/mosaicml/streaming/pull/834
* Bump pydantic from 2.9.2 to 2.10.1 by dependabot in https://github.com/mosaicml/streaming/pull/833
* Bump pytest from 8.3.3 to 8.3.4 by dependabot in https://github.com/mosaicml/streaming/pull/836
* Bump pydantic from 2.10.1 to 2.10.2 by dependabot in https://github.com/mosaicml/streaming/pull/835
* Version bump to 0.11.0.dev0, including deprecations by snarayan21 in https://github.com/mosaicml/streaming/pull/837

New Contributors
* ethantang-db made their first contribution in https://github.com/mosaicml/streaming/pull/817

**Full Changelog**: https://github.com/mosaicml/streaming/compare/v0.9.1...v0.10.0

0.9.1

What's New
1. Streaming is added to Gurubase (https://github.com/mosaicml/streaming/pull/805)
* Streaming now has an AI assistant to help answer your questions! Try out Streaming Guru, which uses an LLM to answer questions based on this repo and the [docs](https://docs.mosaicml.com/projects/streaming/en/stable/).

Improvements
1. Permission Issue Resolution (https://github.com/mosaicml/streaming/pull/813)
* Resolved read permission issues occurring when shared memory files are created in shared computing environments. We added retry conditions to allow the creation of new shared memory files upon encountering permission errors.
* Prefix Integrity for Shared Memory Files: When creating shared memory files, both LOCALS and FILELOCKS are now validated to ensure no overlap with existing files, and they are matched with consistent prefix identifiers.
* Handling Abnormal Program Exits: Improved cleanup for cases where an abnormal program exit left some shared memory files behind. All files in SHM_TO_CLEAN are now checked to prevent duplicates.
These changes improve shared memory management and reliability in shared environments.
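
The prefix check and retry described above can be sketched with an in-memory simulation: a prefix is accepted only if none of its LOCALS or FILELOCKS names already exist, and a `PermissionError` during creation moves on to the next prefix rather than failing the run. `find_free_prefix`, the `NNNNNN_kind` naming scheme, and the `create`/`remove` callbacks are hypothetical; this is not streaming's shared-memory code.

```python
from typing import Callable, Iterable


def find_free_prefix(
    existing: set[str],
    create: Callable[[str], None],
    remove: Callable[[str], None],
    kinds: Iterable[str] = ('locals', 'filelocks'),
    max_prefix: int = 1000,
) -> int:
    """Return the first prefix whose names are all unused and creatable."""
    kinds = tuple(kinds)
    for prefix in range(max_prefix):
        names = [f'{prefix:06}_{kind}' for kind in kinds]
        # Check every kind of file for this prefix, not just one of them.
        if any(name in existing for name in names):
            continue
        created = []
        try:
            for name in names:
                create(name)
                created.append(name)
        except PermissionError:
            # E.g. a leftover file owned by another user: undo and retry.
            for name in created:
                remove(name)
            continue
        return prefix
    raise RuntimeError(f'No free prefix below {max_prefix}')
```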

2. Fix Shard Eviction Hanging (https://github.com/mosaicml/streaming/pull/795)
* Changed the search for the coldest shard to avoid looping over remote shards by considering only local shards as candidates for eviction.
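
A minimal sketch of the new candidate selection, assuming per-shard last-access timestamps and a set of shard IDs currently on local disk (both hypothetical data structures here; `coldest_local_shard` is not a streaming function).

```python
from typing import Mapping, Optional, Set


def coldest_local_shard(last_access: Mapping[int, float], local: Set[int]) -> Optional[int]:
    """Return the least recently used shard among those present locally.

    Only shards in ``local`` that have an access time are candidates, so
    remote-only shards are never scanned. Returns None if nothing can be evicted.
    """
    candidates = [shard for shard in local if shard in last_access]
    if not candidates:
        return None
    return min(candidates, key=lambda shard: last_access[shard])
```

Shard 3 below is the coldest overall, but it is not local, so it is never chosen.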

What's Changed
* Bump pydantic from 2.9.1 to 2.9.2 by dependabot in https://github.com/mosaicml/streaming/pull/785
* Bump fastapi from 0.114.2 to 0.115.0 by dependabot in https://github.com/mosaicml/streaming/pull/786
* Bump uvicorn from 0.30.6 to 0.31.0 by dependabot in https://github.com/mosaicml/streaming/pull/793
* Fixed broken links in README.md by LukaszSztukiewicz in https://github.com/mosaicml/streaming/pull/794
* Shard evict fix by snarayan21 in https://github.com/mosaicml/streaming/pull/795
* Update huggingface-hub requirement from <0.25,>=0.23.4 to >=0.23.4,<0.26 by dependabot in https://github.com/mosaicml/streaming/pull/787
* Fix dataset.size() typo in docs by snarayan21 in https://github.com/mosaicml/streaming/pull/798
* Warning -> info about defaults from v0.7.0 by snarayan21 in https://github.com/mosaicml/streaming/pull/799
* Bump uvicorn from 0.31.0 to 0.31.1 by dependabot in https://github.com/mosaicml/streaming/pull/803
* Bump fastapi from 0.115.0 to 0.115.2 by dependabot in https://github.com/mosaicml/streaming/pull/804
* Introducing Streaming Guru on Gurubase.io by kursataktas in https://github.com/mosaicml/streaming/pull/805
* Add better error message for shared prefix by XiaohanZhangCMU in https://github.com/mosaicml/streaming/pull/806
* Bump uvicorn from 0.31.1 to 0.32.0 by dependabot in https://github.com/mosaicml/streaming/pull/809
* Bump pytest-split from 0.9.0 to 0.10.0 by dependabot in https://github.com/mosaicml/streaming/pull/810
* Fix logo png by XiaohanZhangCMU in https://github.com/mosaicml/streaming/pull/808
* Update huggingface-hub requirement from <0.26,>=0.23.4 to >=0.23.4,<0.27 by dependabot in https://github.com/mosaicml/streaming/pull/814
* Bump fastapi from 0.115.2 to 0.115.4 by dependabot in https://github.com/mosaicml/streaming/pull/815
* Fix shared memory permission issue in a shared pod environment by XiaohanZhangCMU in https://github.com/mosaicml/streaming/pull/813

New Contributors
* LukaszSztukiewicz made their first contribution in https://github.com/mosaicml/streaming/pull/794
* kursataktas made their first contribution in https://github.com/mosaicml/streaming/pull/805

**Full Changelog**: https://github.com/mosaicml/streaming/compare/v0.9.0...v0.9.1

0.9.0

What's new
1. Improved compatibility for ndarray and json types (https://github.com/mosaicml/streaming/pull/776, https://github.com/mosaicml/streaming/pull/777)
Columns containing a map type can now be converted to JSON in an MDS file when the column's type is specified as 'json', and the JSON encoder can now handle ndarray types.
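
The general technique behind the ndarray change is a `json.JSONEncoder` subclass whose `default` method converts values the standard encoder rejects. A sketch, not streaming's encoder: `NDArrayAwareEncoder` is hypothetical and assumes NumPy is installed.

```python
import json
from collections.abc import Mapping
from typing import Any

import numpy as np


class NDArrayAwareEncoder(json.JSONEncoder):
    """Hypothetical encoder: converts ndarrays, NumPy scalars, and non-dict mappings."""

    def default(self, o: Any) -> Any:
        # Called only for objects the standard encoder cannot handle.
        if isinstance(o, np.ndarray):
            return o.tolist()
        if isinstance(o, np.generic):
            return o.item()
        if isinstance(o, Mapping):
            return dict(o)
        return super().default(o)  # Raises TypeError for anything else.
```

Usage: `json.dumps(row, cls=NDArrayAwareEncoder)`.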

What's Changed
* Bump fastapi from 0.112.1 to 0.112.2 by dependabot in https://github.com/mosaicml/streaming/pull/768
* Bump ci testing by snarayan21 in https://github.com/mosaicml/streaming/pull/770
* Bump jupyter from 1.0.0 to 1.1.1 by dependabot in https://github.com/mosaicml/streaming/pull/772
* Bump fastapi from 0.112.2 to 0.114.0 by dependabot in https://github.com/mosaicml/streaming/pull/779
* Bump pydantic from 2.8.2 to 2.9.1 by dependabot in https://github.com/mosaicml/streaming/pull/778
* Allow JSON encoder to handle ndarray by srowen in https://github.com/mosaicml/streaming/pull/777
* Add MapType as JSON-compatible by srowen in https://github.com/mosaicml/streaming/pull/776
* Bump fastapi from 0.114.0 to 0.114.2 by dependabot in https://github.com/mosaicml/streaming/pull/783
* Update datasets requirement from <3,>=2.4.0 to >=2.4.0,<4 by dependabot in https://github.com/mosaicml/streaming/pull/784
* Bump pytest from 8.3.2 to 8.3.3 by dependabot in https://github.com/mosaicml/streaming/pull/782
* Bump main branch to 0.10.0.dev0 by dakinggg in https://github.com/mosaicml/streaming/pull/790

**Full Changelog**: https://github.com/mosaicml/streaming/compare/v0.8.1...v0.9.0

0.8.1

🔧 Improvements
**Dataloader hanging between epochs has now been resolved!** We've seen training time improvements of up to 40% for some many-epoch training jobs. If this was impacting your runs and has now been fixed, please let us know!
* Fix dataloader hang at the end of an epoch by XiaohanZhangCMU in https://github.com/mosaicml/streaming/pull/741
* Add default compression, and warning about local paths to dataframe_to_mds by srowen in https://github.com/mosaicml/streaming/pull/748
* Throw exception when event.is_set() after write()s by srowen in https://github.com/mosaicml/streaming/pull/754

🐛 Bug Fixes
* Ensure deterministic sample order between epochs when `shuffle=False` by snarayan21 in https://github.com/mosaicml/streaming/pull/750
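
The intended behavior can be stated as a small property: with `shuffle=False` the order must not depend on the epoch, while with shuffling the order may change per epoch but should be reproducible from the seed. A hypothetical sketch (`epoch_sample_order` and its seeding scheme are illustrative, not streaming's partitioning code):

```python
import random


def epoch_sample_order(num_samples: int, epoch: int, shuffle: bool, seed: int = 0) -> list[int]:
    """Hypothetical ordering: fixed without shuffling, reseeded per epoch with it."""
    order = list(range(num_samples))
    if shuffle:
        # Derive the permutation from (seed, epoch): differs by epoch, reproducible per seed.
        random.Random(seed * 1_000_003 + epoch).shuffle(order)
    return order
```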

What's Changed
* Make Pytest log in color in Github Action by eitanturok in https://github.com/mosaicml/streaming/pull/739
* fix azure container name and blob name in download_from_azure by jaehwana2z in https://github.com/mosaicml/streaming/pull/733
* Bump uvicorn from 0.30.3 to 0.30.5 by dependabot in https://github.com/mosaicml/streaming/pull/743
* Update huggingface-hub requirement from <0.24,>=0.23.4 to >=0.23.4,<0.25 by dependabot in https://github.com/mosaicml/streaming/pull/729
* Bump fastapi from 0.111.1 to 0.112.0 by dependabot in https://github.com/mosaicml/streaming/pull/744
* Bump ci-testing to v0.1.0 by snarayan21 in https://github.com/mosaicml/streaming/pull/745
* Patching conf.py due to Sphinx deprecating config manipulation by snarayan21 in https://github.com/mosaicml/streaming/pull/746
* Bump ci-testing to v0.1.2 by snarayan21 in https://github.com/mosaicml/streaming/pull/747
* Type hints conformant with pep 585 by snarayan21 in https://github.com/mosaicml/streaming/pull/752
* Ruff rule to remove unused imports by snarayan21 in https://github.com/mosaicml/streaming/pull/756
* Fix linting for numpy 2.1.0 by snarayan21 in https://github.com/mosaicml/streaming/pull/764
* Bump fastapi from 0.112.0 to 0.112.1 by dependabot in https://github.com/mosaicml/streaming/pull/760
* Bump uvicorn from 0.30.5 to 0.30.6 by dependabot in https://github.com/mosaicml/streaming/pull/762
* Version 0.8.1 bump! by snarayan21 in https://github.com/mosaicml/streaming/pull/766

New Contributors
* eitanturok made their first contribution in https://github.com/mosaicml/streaming/pull/739
* jaehwana2z made their first contribution in https://github.com/mosaicml/streaming/pull/733
* srowen made their first contribution in https://github.com/mosaicml/streaming/pull/748

**Full Changelog**: https://github.com/mosaicml/streaming/compare/v0.8.0...v0.8.1

0.8.0

✨ What's New ✨

**1. HF File System Streaming (https://github.com/mosaicml/streaming/pull/711)**

Streaming now supports streaming data from the Hugging Face (HF) file system! This adds another popular backend for hosting your data.
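
Storage backends like this are typically selected from the scheme of the `remote` path. The sketch below only illustrates that dispatch: the `hf://` scheme and the example path layout are assumptions here, and `BACKENDS` and `backend_for` are hypothetical rather than streaming's download code.

```python
from urllib.parse import urlparse

# Hypothetical mapping from URL scheme to a backend label; the library's
# supported schemes and handler names may differ.
BACKENDS = {
    's3': 'Amazon S3',
    'gs': 'Google Cloud Storage',
    'hf': 'Hugging Face file system',
    '': 'local filesystem',
}


def backend_for(remote: str) -> str:
    """Pick a backend label from the scheme of a remote path."""
    scheme = urlparse(remote).scheme
    try:
        return BACKENDS[scheme]
    except KeyError:
        raise ValueError(f'Unsupported remote scheme {scheme!r} in {remote!r}') from None
```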

What's Changed
* Bump fastapi from 0.110.2 to 0.111.0 by dependabot in https://github.com/mosaicml/streaming/pull/670
* Fix: having zero bytes files after converting spark dataframe to MDS saved on dbfs:/Volumes by XiaohanZhangCMU in https://github.com/mosaicml/streaming/pull/668
* Ensure shards cannot be larger than 4GB by snarayan21 in https://github.com/mosaicml/streaming/pull/672
* Helpful error on `py1e` for improperly written datasets by snarayan21 in https://github.com/mosaicml/streaming/pull/673
* Bump pytest from 8.2.0 to 8.2.1 by dependabot in https://github.com/mosaicml/streaming/pull/680
* Update platform references by aspfohl in https://github.com/mosaicml/streaming/pull/675
* Update CODEOWNERS by karan6181 in https://github.com/mosaicml/streaming/pull/681
* Fix `batch_size` typo for `Stream` object in docs by snarayan21 in https://github.com/mosaicml/streaming/pull/682
* Bump databricks-sdk from 0.27.0 to 0.27.1 by dependabot in https://github.com/mosaicml/streaming/pull/679
* Improve local temp directory error when only `remote` is specified by snarayan21 in https://github.com/mosaicml/streaming/pull/683
* Fix node calculation in `replication` for `World` object by snarayan21 in https://github.com/mosaicml/streaming/pull/685
* Warning condition changed for Sequence Parallelism by XiaohanZhangCMU in https://github.com/mosaicml/streaming/pull/688
* Bump pydantic from 2.7.1 to 2.7.2 by dependabot in https://github.com/mosaicml/streaming/pull/692
* Bump uvicorn from 0.29.0 to 0.30.1 by dependabot in https://github.com/mosaicml/streaming/pull/691
* Make sure epoch_size is an int by snarayan21 in https://github.com/mosaicml/streaming/pull/693
* Bump databricks-sdk from 0.27.1 to 0.28.0 by dependabot in https://github.com/mosaicml/streaming/pull/687
* Bump pytest from 8.2.1 to 8.2.2 by dependabot in https://github.com/mosaicml/streaming/pull/697
* fix: expand user path for Writer's output directory. by huxuan in https://github.com/mosaicml/streaming/pull/694
* Bump pydantic from 2.7.2 to 2.7.3 by dependabot in https://github.com/mosaicml/streaming/pull/696
* Fix edge cases with scalar or empty numpy array encoding by snarayan21 in https://github.com/mosaicml/streaming/pull/702
* Raise IndexError in `Spanner` object instead of `ValueError` by snarayan21 in https://github.com/mosaicml/streaming/pull/701
* Fix linting issues with numpy 2 by snarayan21 in https://github.com/mosaicml/streaming/pull/705
* Bump pydantic from 2.7.3 to 2.7.4 by dependabot in https://github.com/mosaicml/streaming/pull/704
* Enable correct resumption from the end of an epoch by snarayan21 in https://github.com/mosaicml/streaming/pull/700
* Fix `drop_first` checking in partitioning to account for `world_size` divisibility by snarayan21 in https://github.com/mosaicml/streaming/pull/706
* fix convert imagenet by Hprairie in https://github.com/mosaicml/streaming/pull/708
* Bump pytest-split from 0.8.2 to 0.9.0 by dependabot in https://github.com/mosaicml/streaming/pull/710
* Remove duplicate `dbfs:` prefix from error message by vanshcsingh in https://github.com/mosaicml/streaming/pull/712
* enable adaptive retry for s3 download by bigning in https://github.com/mosaicml/streaming/pull/713
* Upgrade ci_testing, remove codeql by snarayan21 in https://github.com/mosaicml/streaming/pull/714
* Fix Linting from Pillow version update by XiaohanZhangCMU in https://github.com/mosaicml/streaming/pull/719
* Bump pydantic from 2.7.4 to 2.8.2 by dependabot in https://github.com/mosaicml/streaming/pull/718
* Bump databricks-sdk from 0.28.0 to 0.29.0 by dependabot in https://github.com/mosaicml/streaming/pull/715
* Add HF File System Support to Streaming by orionw in https://github.com/mosaicml/streaming/pull/711
* Improve error message on non-0 rank when index file download failed by bigning in https://github.com/mosaicml/streaming/pull/723
* Bump pytest from 8.2.2 to 8.3.2 by dependabot in https://github.com/mosaicml/streaming/pull/735
* Bump uvicorn from 0.30.1 to 0.30.3 by dependabot in https://github.com/mosaicml/streaming/pull/730
* Bump fastapi from 0.111.0 to 0.111.1 by dependabot in https://github.com/mosaicml/streaming/pull/724
* Bump Streaming Version to 0.8.0 by mvpatel2000 in https://github.com/mosaicml/streaming/pull/738
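
The 4GB shard limit (https://github.com/mosaicml/streaming/pull/672) can be pictured as a validation step on the requested shard size. This sketch assumes, as its own assumption, that in-shard byte offsets are stored as 32-bit unsigned integers, so a shard must stay below 2**32 bytes; `validate_size_limit` is hypothetical, not the library's check.

```python
# 4 GiB; assumed here to come from 32-bit unsigned byte offsets.
MAX_SHARD_BYTES = 2**32


def validate_size_limit(size_limit: int) -> int:
    """Hypothetical check that a requested shard size limit fits below 4 GiB."""
    if size_limit <= 0:
        raise ValueError(f'size_limit must be positive, got {size_limit}')
    if size_limit >= MAX_SHARD_BYTES:
        raise ValueError(f'size_limit {size_limit} must be below {MAX_SHARD_BYTES} bytes (4 GiB)')
    return size_limit
```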

New Contributors
* aspfohl made their first contribution in https://github.com/mosaicml/streaming/pull/675
* huxuan made their first contribution in https://github.com/mosaicml/streaming/pull/694
* Hprairie made their first contribution in https://github.com/mosaicml/streaming/pull/708
* vanshcsingh made their first contribution in https://github.com/mosaicml/streaming/pull/712
* orionw made their first contribution in https://github.com/mosaicml/streaming/pull/711

**Full Changelog**: https://github.com/mosaicml/streaming/compare/v0.7.6...v0.8.0

© 2025 Safety CLI Cybersecurity Inc. All Rights Reserved.