Mosaicml

Latest version: v0.29.0



0.8.0

✨ What's New ✨

**1. HF File System Streaming (#711)**

Streaming now supports reading data from the Hugging Face (HF) file system, adding another popular backend for hosting your data.

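As a rough illustration of the feature above, the sketch below builds a `StreamingDataset` whose `remote` points at a Hugging Face dataset repository. The `hf://datasets/<user>/<repo>` path form, the repository name, and the local cache directory are assumptions made for illustration, and `is_hf_remote` and `make_hf_dataset` are hypothetical helpers rather than part of the library. Check the Streaming documentation for the exact URI format and any required credentials.

```python
from urllib.parse import urlparse


def is_hf_remote(remote: str) -> bool:
    """Return True if the remote URI uses the (assumed) `hf://` scheme."""
    return urlparse(remote).scheme == 'hf'


def make_hf_dataset(remote: str, local: str, batch_size: int):
    """Build a StreamingDataset that reads shards from an HF-hosted remote."""
    if not is_hf_remote(remote):
        raise ValueError(f'Expected an hf:// remote, got: {remote}')
    # Imported lazily so the helpers above work without the package installed.
    from streaming import StreamingDataset
    return StreamingDataset(remote=remote, local=local, batch_size=batch_size)


# Hypothetical repository and cache path; replace with your own.
REMOTE = 'hf://datasets/my-user/my-mds-dataset'
LOCAL = '/tmp/my-mds-dataset-cache'
```

Calling `make_hf_dataset(REMOTE, LOCAL, batch_size=32)` would then return a dataset that can be passed to a PyTorch `DataLoader`, provided the repository actually contains an MDS dataset.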

What's Changed
* Bump fastapi from 0.110.2 to 0.111.0 by dependabot in https://github.com/mosaicml/streaming/pull/670
* Fix: having zero bytes files after converting spark dataframe to MDS saved on dbfs:/Volumes by XiaohanZhangCMU in https://github.com/mosaicml/streaming/pull/668
* Ensure shards cannot be larger than 4GB by snarayan21 in https://github.com/mosaicml/streaming/pull/672
* Helpful error on `py1e` for improperly written datasets by snarayan21 in https://github.com/mosaicml/streaming/pull/673
* Bump pytest from 8.2.0 to 8.2.1 by dependabot in https://github.com/mosaicml/streaming/pull/680
* Update platform references by aspfohl in https://github.com/mosaicml/streaming/pull/675
* Update CODEOWNERS by karan6181 in https://github.com/mosaicml/streaming/pull/681
* Fix `batch_size` typo for `Stream` object in docs by snarayan21 in https://github.com/mosaicml/streaming/pull/682
* Bump databricks-sdk from 0.27.0 to 0.27.1 by dependabot in https://github.com/mosaicml/streaming/pull/679
* Improve local temp directory error when only `remote` is specified by snarayan21 in https://github.com/mosaicml/streaming/pull/683
* Fix node calculation in `replication` for `World` object by snarayan21 in https://github.com/mosaicml/streaming/pull/685
* Warning condition changed for Sequence Parallelism by XiaohanZhangCMU in https://github.com/mosaicml/streaming/pull/688
* Bump pydantic from 2.7.1 to 2.7.2 by dependabot in https://github.com/mosaicml/streaming/pull/692
* Bump uvicorn from 0.29.0 to 0.30.1 by dependabot in https://github.com/mosaicml/streaming/pull/691
* Make sure epoch_size is an int by snarayan21 in https://github.com/mosaicml/streaming/pull/693
* Bump databricks-sdk from 0.27.1 to 0.28.0 by dependabot in https://github.com/mosaicml/streaming/pull/687
* Bump pytest from 8.2.1 to 8.2.2 by dependabot in https://github.com/mosaicml/streaming/pull/697
* fix: expand user path for Writer's output directory. by huxuan in https://github.com/mosaicml/streaming/pull/694
* Bump pydantic from 2.7.2 to 2.7.3 by dependabot in https://github.com/mosaicml/streaming/pull/696
* Fix edge cases with scalar or empty numpy array encoding by snarayan21 in https://github.com/mosaicml/streaming/pull/702
* Raise IndexError in `Spanner` object instead of `ValueError` by snarayan21 in https://github.com/mosaicml/streaming/pull/701
* Fix linting issues with numpy 2 by snarayan21 in https://github.com/mosaicml/streaming/pull/705
* Bump pydantic from 2.7.3 to 2.7.4 by dependabot in https://github.com/mosaicml/streaming/pull/704
* Enable correct resumption from the end of an epoch by snarayan21 in https://github.com/mosaicml/streaming/pull/700
* Fix `drop_first` checking in partitioning to account for `world_size` divisibility by snarayan21 in https://github.com/mosaicml/streaming/pull/706
* fix convert imagenet by Hprairie in https://github.com/mosaicml/streaming/pull/708
* Bump pytest-split from 0.8.2 to 0.9.0 by dependabot in https://github.com/mosaicml/streaming/pull/710
* Remove duplicate `dbfs:` prefix from error message by vanshcsingh in https://github.com/mosaicml/streaming/pull/712
* enable adaptive retry for s3 download by bigning in https://github.com/mosaicml/streaming/pull/713
* Upgrade ci_testing, remove codeql by snarayan21 in https://github.com/mosaicml/streaming/pull/714
* Fix Linting from Pillow version update by XiaohanZhangCMU in https://github.com/mosaicml/streaming/pull/719
* Bump pydantic from 2.7.4 to 2.8.2 by dependabot in https://github.com/mosaicml/streaming/pull/718
* Bump databricks-sdk from 0.28.0 to 0.29.0 by dependabot in https://github.com/mosaicml/streaming/pull/715
* Add HF File System Support to Streaming by orionw in https://github.com/mosaicml/streaming/pull/711
* Improve error message on non-0 rank when index file download failed by bigning in https://github.com/mosaicml/streaming/pull/723
* Bump pytest from 8.2.2 to 8.3.2 by dependabot in https://github.com/mosaicml/streaming/pull/735
* Bump uvicorn from 0.30.1 to 0.30.3 by dependabot in https://github.com/mosaicml/streaming/pull/730
* Bump fastapi from 0.111.0 to 0.111.1 by dependabot in https://github.com/mosaicml/streaming/pull/724
* Bump Streaming Version to 0.8.0 by mvpatel2000 in https://github.com/mosaicml/streaming/pull/738

New Contributors
* aspfohl made their first contribution in https://github.com/mosaicml/streaming/pull/675
* huxuan made their first contribution in https://github.com/mosaicml/streaming/pull/694
* Hprairie made their first contribution in https://github.com/mosaicml/streaming/pull/708
* vanshcsingh made their first contribution in https://github.com/mosaicml/streaming/pull/712
* orionw made their first contribution in https://github.com/mosaicml/streaming/pull/711

**Full Changelog**: https://github.com/mosaicml/streaming/compare/v0.7.6...v0.8.0

0.7.6

Streaming `v0.7.6` is released! Install via `pip`:


    pip install --upgrade mosaicml-streaming==0.7.6


💎 New Features

1. `device_per_stream` batching method
Users can now construct batches such that each device sees only samples from a single stream. This is very useful in cases where different data sources have samples/tensors of different sizes, but the model should still see samples from these different data sources at each optimizer step.
* Adding `device_per_stream` batching by snarayan21 in https://github.com/mosaicml/streaming/pull/661

2. Add `ndarray` type for Spark dataframes.
Enable parsing Spark's ArrayType (of ShortType, LongType, IntegerType, FloatType, DoubleType) when converting a Spark dataframe to MDS.
* Add ndarray type by XiaohanZhangCMU in https://github.com/mosaicml/streaming/pull/623

3. Support for Alipan storage
Adds support for Alipan, Alibaba's cloud storage service.
* Add support for Alipan Storage backend by PeterDing in https://github.com/mosaicml/streaming/pull/651
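
To make the `device_per_stream` idea concrete, here is a minimal pure-Python sketch of the batching rule: every per-device batch is filled from exactly one stream, while a global batch (one optimizer step) can still mix streams across devices. This is not the library's implementation. The function name, the round-robin choice of stream, and the toy data are all assumptions for illustration.

```python
def device_per_stream_batches(streams, num_devices, device_batch_size):
    """Yield global batches as lists of (stream_name, samples) per device.

    `streams` maps a stream name to a list of samples. Streams are assigned
    to device slots round-robin (an assumption of this sketch); iteration
    stops once the chosen stream cannot fill a full device batch.
    """
    names = list(streams)
    cursors = {name: 0 for name in names}
    slot = 0
    while True:
        global_batch = []
        for _ in range(num_devices):
            name = names[slot % len(names)]
            start = cursors[name]
            chunk = streams[name][start:start + device_batch_size]
            if len(chunk) < device_batch_size:
                return  # not enough samples left in this stream
            cursors[name] = start + device_batch_size
            global_batch.append((name, chunk))
            slot += 1
        yield global_batch


# Toy data: two streams whose samples would have different shapes in practice.
toy_streams = {
    'text': ['t0', 't1', 't2', 't3'],
    'image': ['i0', 'i1', 'i2', 'i3'],
}
batches = list(device_per_stream_batches(toy_streams, num_devices=2, device_batch_size=2))
```

In this toy run each global batch holds one `text` device batch and one `image` device batch, so no device ever has to collate samples from two sources.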

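The `ndarray` support for Spark dataframes can be pictured as a lookup from the array's element type to an MDS column encoding. The table below is a sketch under assumptions: the `ndarray:<dtype>` encoding strings follow the usual width of each Spark type (for example, `ShortType` is a 16-bit integer), and `mds_encoding_for_array` is a hypothetical helper, not the converter's actual code.

```python
# Assumed mapping from Spark ArrayType element types to MDS encodings.
SPARK_ARRAY_TO_MDS = {
    'ShortType': 'ndarray:int16',
    'IntegerType': 'ndarray:int32',
    'LongType': 'ndarray:int64',
    'FloatType': 'ndarray:float32',
    'DoubleType': 'ndarray:float64',
}


def mds_encoding_for_array(element_type: str) -> str:
    """Return the assumed MDS encoding for an ArrayType with this element type."""
    try:
        return SPARK_ARRAY_TO_MDS[element_type]
    except KeyError:
        raise ValueError(f'Unsupported ArrayType element type: {element_type}') from None
```
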
What's Changed
* Bump fastapi from 0.110.0 to 0.110.2 by dependabot in https://github.com/mosaicml/streaming/pull/660
* Bump pydantic from 2.6.4 to 2.7.0 by dependabot in https://github.com/mosaicml/streaming/pull/653
* Bump pydantic from 2.7.0 to 2.7.1 by dependabot in https://github.com/mosaicml/streaming/pull/666
* Bump pytest from 8.1.1 to 8.2.0 by dependabot in https://github.com/mosaicml/streaming/pull/664
* Bump databricks-sdk from 0.23.0 to 0.27.0 by dependabot in https://github.com/mosaicml/streaming/pull/667
* Version bump to v0.7.6 by snarayan21 in https://github.com/mosaicml/streaming/pull/669

New Contributors
* PeterDing made their first contribution in https://github.com/mosaicml/streaming/pull/651

**Full Changelog**: https://github.com/mosaicml/streaming/compare/v0.7.5...v0.7.6

0.7.5

Streaming `v0.7.5` is released! Install via `pip`:


    pip install --upgrade mosaicml-streaming==0.7.5


💎 New Features

1. Tensor/Sequence Parallelism Support
Using the `replication` argument, easily share data samples across multiple ranks, enabling sequence or tensor parallelism.
* Replicating samples across devices (SP / TP enablement) by knighton in https://github.com/mosaicml/streaming/pull/597
* Expanded replication testing + documentation by snarayan21 in https://github.com/mosaicml/streaming/pull/607
* Make streaming use the correct number of unique samples with SP/TP by snarayan21 in https://github.com/mosaicml/streaming/pull/619

2. Overhauled Streaming Documentation
New and improved streaming documentation can be found [here](https://docs.mosaicml.com/projects/streaming/en/stable/#) -- please submit issues with any feedback.
* Major overhaul of Streaming documentation by snarayan21 in https://github.com/mosaicml/streaming/pull/636

3. `batch_size` is now required for StreamingDataset
Because we have seen multiple errors and performance degradations when users do not set the `batch_size` argument of `StreamingDataset`, `batch_size` is now required in order to iterate over the dataset.
* You must set batch size. There is no other way. by snarayan21 in https://github.com/mosaicml/streaming/pull/624

4. Support for Python 3.11, deprecate Python 3.8
* Add support for Python 3.11 and deprecate Python 3.8 by karan6181 in https://github.com/mosaicml/streaming/pull/586
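
One way to picture the `replication` argument is as a regrouping of ranks: every `replication` consecutive ranks are treated as a single data-parallel rank and therefore receive the same samples. The helper below is a hypothetical sketch of that arithmetic, assuming consecutive ranks are grouped together. It is not taken from the library.

```python
def replicated_view(rank: int, world_size: int, replication: int) -> tuple[int, int]:
    """Return (effective_rank, effective_world_size) for sample partitioning.

    Ranks that share an effective rank would see identical samples, which is
    what sequence or tensor parallelism needs.
    """
    if replication < 1 or world_size % replication != 0:
        raise ValueError('world_size must be divisible by replication')
    return rank // replication, world_size // replication


# With 8 ranks and replication=2, ranks (0, 1), (2, 3), ... share samples.
groups = [replicated_view(rank, 8, 2)[0] for rank in range(8)]
```

Under this grouping, 8 ranks with `replication=2` behave like 4 distinct data consumers, so the number of unique samples per step is halved relative to plain data parallelism.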

🐛 Bug Fixes
* [easy typo fix] fix f-string by bigning in https://github.com/mosaicml/streaming/pull/596
* Change comparison in partitions to include equals by JAEarly in https://github.com/mosaicml/streaming/pull/587
* Use type int when initializing SharedMemory size by bchiang2 in https://github.com/mosaicml/streaming/pull/604
* COCO Dataset fix -- avoids `allow_unsafe_types=True` by snarayan21 in https://github.com/mosaicml/streaming/pull/647

🔧 Improvements
* Allow writers to overwrite existing data by JAEarly in https://github.com/mosaicml/streaming/pull/594
* Update careers link by milocress in https://github.com/mosaicml/streaming/pull/611
* Update license by b-chu in https://github.com/mosaicml/streaming/pull/568
* Updated documentation for S3-compatible object stores by AIproj in https://github.com/mosaicml/streaming/pull/592
* Make yamllint consistent with Composer by b-chu in https://github.com/mosaicml/streaming/pull/583
* Switch linting workflows to ci-testing repo by b-chu in https://github.com/mosaicml/streaming/pull/616

What's Changed
* Bump uvicorn from 0.26.0 to 0.27.1 by dependabot in https://github.com/mosaicml/streaming/pull/599
* Bump pytest-split from 0.8.1 to 0.8.2 by dependabot in https://github.com/mosaicml/streaming/pull/581
* Update ruff to 0.2.2 by Skylion007 in https://github.com/mosaicml/streaming/pull/608
* Bump fastapi from 0.109.0 to 0.110.0 by dependabot in https://github.com/mosaicml/streaming/pull/610
* Bump yamllint from 1.33.0 to 1.35.1 by dependabot in https://github.com/mosaicml/streaming/pull/601
* Bump uvicorn from 0.27.1 to 0.28.0 by dependabot in https://github.com/mosaicml/streaming/pull/626
* Update moto requirement from <5,>=4.0 to >=4.0,<6 by dependabot in https://github.com/mosaicml/streaming/pull/580
* Bump furo from 2023.7.26 to 2024.1.29 by dependabot in https://github.com/mosaicml/streaming/pull/631
* Bump pypandoc from 1.12 to 1.13 by dependabot in https://github.com/mosaicml/streaming/pull/630
* Bump databricks-sdk from 0.14.0 to 0.22.0 by dependabot in https://github.com/mosaicml/streaming/pull/629
* Add batch_size to 1 if not provided for regression testing by karan6181 in https://github.com/mosaicml/streaming/pull/635
* Fixed docstring note for getting sequential sample ordering by snarayan21 in https://github.com/mosaicml/streaming/pull/632
* Bump pytest and fix failing test by snarayan21 in https://github.com/mosaicml/streaming/pull/642
* Update pytest-cov requirement from <5,>=4 to >=4,<6 by dependabot in https://github.com/mosaicml/streaming/pull/638
* Bump pydantic from 2.5.3 to 2.6.4 by dependabot in https://github.com/mosaicml/streaming/pull/639
* Bump uvicorn from 0.28.0 to 0.29.0 by dependabot in https://github.com/mosaicml/streaming/pull/640
* Bump databricks-sdk from 0.22.0 to 0.23.0 by dependabot in https://github.com/mosaicml/streaming/pull/644
* Version bump to 0.7.5 by snarayan21 in https://github.com/mosaicml/streaming/pull/650

New Contributors
* bigning made their first contribution in https://github.com/mosaicml/streaming/pull/596
* JAEarly made their first contribution in https://github.com/mosaicml/streaming/pull/587
* AIproj made their first contribution in https://github.com/mosaicml/streaming/pull/592
* milocress made their first contribution in https://github.com/mosaicml/streaming/pull/611
* bchiang2 made their first contribution in https://github.com/mosaicml/streaming/pull/604

**Full Changelog**: https://github.com/mosaicml/streaming/compare/v0.7.4...v0.7.5

0.7.4

Streaming `v0.7.4` is released! Install via `pip`:


    pip install --upgrade mosaicml-streaming==0.7.4


🐛 Bug Fixes
* Download to temporary path from azure by philipnrmn in https://github.com/mosaicml/streaming/pull/566
* fix(merge_index): scheme was not well formatted by fwertel in https://github.com/mosaicml/streaming/pull/576
* Update misplaced params of _format_remote_index_files by lsongx in https://github.com/mosaicml/streaming/pull/584
* Modifications to resumption shared memory allowing `load_state_dict` multiple times. by snarayan21 in https://github.com/mosaicml/streaming/pull/593

What's Changed
* Bump fastapi from 0.108.0 to 0.109.0 by dependabot in https://github.com/mosaicml/streaming/pull/564
* Bump gitpython from 3.1.40 to 3.1.41 by dependabot in https://github.com/mosaicml/streaming/pull/565
* Download to temporary path from azure by philipnrmn in https://github.com/mosaicml/streaming/pull/566
* Use `tempfile.gettempdir()` instead of a hardcoded temp root. by knighton in https://github.com/mosaicml/streaming/pull/570
* fix(merge_index): scheme was not well formatted by fwertel in https://github.com/mosaicml/streaming/pull/576
* Bump uvicorn from 0.25.0 to 0.26.0 by dependabot in https://github.com/mosaicml/streaming/pull/572
* Bump sphinx-tabs from 3.4.4 to 3.4.5 by dependabot in https://github.com/mosaicml/streaming/pull/571
* Update misplaced params of _format_remote_index_files by lsongx in https://github.com/mosaicml/streaming/pull/584
* Remove .ci folder and move FILE_HEADER and CODEOWNERS by irenedea in https://github.com/mosaicml/streaming/pull/588
* Modifications to resumption shared memory allowing `load_state_dict` multiple times. by snarayan21 in https://github.com/mosaicml/streaming/pull/593
* Bump version to 0.7.4 by snarayan21 in https://github.com/mosaicml/streaming/pull/595

New Contributors
* philipnrmn made their first contribution in https://github.com/mosaicml/streaming/pull/566
* fwertel made their first contribution in https://github.com/mosaicml/streaming/pull/576
* lsongx made their first contribution in https://github.com/mosaicml/streaming/pull/584
* irenedea made their first contribution in https://github.com/mosaicml/streaming/pull/588

**Full Changelog**: https://github.com/mosaicml/streaming/compare/v0.7.3...v0.7.4

0.7.3

Streaming `v0.7.3` is released! Install via `pip`:


    pip install --upgrade mosaicml-streaming==0.7.3


🐛 Bug Fixes
- Logging messages for new defaults only show once per rank. (#543)
- Fixed padding calculation for repeat samples in the partition. (#544)

🔧 Other improvements
- Update copyright license year from 2023 -> 2022-2024. (#560)

What's Changed
* Logging messages from new defaults only show once per rank. by snarayan21 in https://github.com/mosaicml/streaming/pull/543
* Fixed condition for warning when partitioning over tiny datasets. by snarayan21 in https://github.com/mosaicml/streaming/pull/544
* Removing stray print statement by snarayan21 in https://github.com/mosaicml/streaming/pull/553
* Bump pydantic from 2.5.2 to 2.5.3 by dependabot in https://github.com/mosaicml/streaming/pull/548
* Bump uvicorn from 0.24.0.post1 to 0.25.0 by dependabot in https://github.com/mosaicml/streaming/pull/549
* Bump fastapi from 0.104.1 to 0.108.0 by dependabot in https://github.com/mosaicml/streaming/pull/557
* Bump pytest from 7.4.3 to 7.4.4 by dependabot in https://github.com/mosaicml/streaming/pull/558
* Update copyright: 2023 -> 2022-2024. by knighton in https://github.com/mosaicml/streaming/pull/560
* Bump version to 0.7.3 by karan6181 in https://github.com/mosaicml/streaming/pull/562


**Full Changelog**: https://github.com/mosaicml/streaming/compare/v0.7.2...v0.7.3

0.7.2

Streaming `v0.7.2` is released! Install via `pip`:


    pip install --upgrade mosaicml-streaming==0.7.2



💎 New Features
1. Canned ACL Support (#512)
Adds support for canned ACLs on AWS S3 through the `S3_CANNED_ACL` environment variable. See the [Canned ACL](https://docs.mosaicml.com/projects/streaming/en/stable/how_to_guides/configure_cloud_storage_credentials.html#canned-acl) document for usage.

2. Allow/reject datasets containing unsafe types (#519)
The pickle serialization format, one of the available MDS encodings, is a potential security vulnerability. We added a boolean flag, `allow_unsafe_types`, to the `StreamingDataset` class to allow or reject datasets that contain pickled data.
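
The allow/reject behavior described for `allow_unsafe_types` can be sketched as a check over a dataset's column encodings. This is an illustrative stand-in, not the library's code: `check_columns` is a hypothetical helper, and treating `'pkl'` as the only unsafe encoding is an assumption of the sketch.

```python
# Encodings this sketch treats as unsafe (assumed: only pickle).
UNSAFE_ENCODINGS = {'pkl'}


def check_columns(columns: dict[str, str], allow_unsafe_types: bool = False) -> list[str]:
    """Return the unsafe column names, raising unless they are explicitly allowed.

    `columns` maps a column name to its encoding, e.g. {'image': 'jpeg'}.
    """
    unsafe = sorted(name for name, enc in columns.items() if enc in UNSAFE_ENCODINGS)
    if unsafe and not allow_unsafe_types:
        raise ValueError(
            f'Columns {unsafe} use an unsafe encoding; '
            'pass allow_unsafe_types=True to load them anyway.')
    return unsafe
```

Rejecting by default means a dataset from an untrusted source cannot trigger unpickling unless the caller opts in explicitly.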

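For the canned ACL feature, the sketch below shows one plausible way an uploader could turn the `S3_CANNED_ACL` environment variable into extra upload arguments. The helper name `s3_extra_args` and the idea of forwarding the value as an `ACL` entry (as accepted by boto3's `ExtraArgs`) are assumptions for illustration. The linked Canned ACL guide is the authoritative reference.

```python
import os
from collections.abc import Mapping


def s3_extra_args(environ: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Build extra upload arguments from the S3_CANNED_ACL environment variable."""
    acl = environ.get('S3_CANNED_ACL')
    # Only add the ACL entry when the variable is set and non-empty.
    return {'ACL': acl} if acl else {}


# Example: request that the bucket owner gets full control of uploaded shards.
example_args = s3_extra_args({'S3_CANNED_ACL': 'bucket-owner-full-control'})
```

In practice the variable would be exported in the shell or set in `os.environ` before any shards are written.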


🐛 Bug Fixes
- Retrieve batch size correctly from vision yamls for the streaming simulator (#501)
- Fix for CVE-2023-47248 (#504)
- Streaming simulator bug fixes (proportion, repeat, yaml ingestion) (#514)
    - Proportion of None instead of a string 'None' is now handled correctly.
    - Repeat of None instead of a string 'None' is now handled correctly.
    - Added warning for StreamingDataset subclass defaults
- Fix sample partitioning algorithm bug for tiny datasets (#517)

🔧 Improvements
- Added warning messages for new streaming dataset defaults to inform users about the old and new values. (#502)

What's Changed
* Migrate pydocstyle to ruff by Skylion007 in https://github.com/mosaicml/streaming/pull/500
* Bump fastapi from 0.104.0 to 0.104.1 by dependabot in https://github.com/mosaicml/streaming/pull/496
* Bump uvicorn from 0.23.2 to 0.24.0.post1 by dependabot in https://github.com/mosaicml/streaming/pull/497
* Retrieve batch size correctly from vision yamls for simulator by snarayan21 in https://github.com/mosaicml/streaming/pull/501
* Adding warning messages for new defaults by snarayan21 in https://github.com/mosaicml/streaming/pull/502
* Fix for CVE-2023-47248 by bandish-shah in https://github.com/mosaicml/streaming/pull/504
* Bump pydantic from 2.4.2 to 2.5.2 by dependabot in https://github.com/mosaicml/streaming/pull/513
* Bump yamllint from 1.32.0 to 1.33.0 by dependabot in https://github.com/mosaicml/streaming/pull/506
* Fixed comments and update dataframe_to_MDS API signature by karan6181 in https://github.com/mosaicml/streaming/pull/515
* Simulator bug fixes (proportion, repeat, yaml ingestion) by snarayan21 in https://github.com/mosaicml/streaming/pull/514
* Add support for the Canned ACL environment variable for AWS S3 by karan6181 in https://github.com/mosaicml/streaming/pull/512
* Fixed bugs when trying to use very small datasets by snarayan21 in https://github.com/mosaicml/streaming/pull/517
* Bump databricks-sdk from 0.8.0 to 0.14.0 by dependabot in https://github.com/mosaicml/streaming/pull/518
* Add flag to allow or reject datasets containing unsafe types (i.e., Pickle) by knighton in https://github.com/mosaicml/streaming/pull/519
* improve exception error messages for downloading by Skylion007 in https://github.com/mosaicml/streaming/pull/525
* doc: add NDArray format by OrenLeung in https://github.com/mosaicml/streaming/pull/527
* Offload exception to mds_write. by XiaohanZhangCMU in https://github.com/mosaicml/streaming/pull/528
* Add allow_unsafe_types parameter to the streaming regression tests by karan6181 in https://github.com/mosaicml/streaming/pull/531
* Bump version to 0.7.2 by karan6181 in https://github.com/mosaicml/streaming/pull/532

New Contributors
* OrenLeung made their first contribution in https://github.com/mosaicml/streaming/pull/527

**Full Changelog**: https://github.com/mosaicml/streaming/compare/v0.7.1...v0.7.2
