Mosaicml

Latest version: v0.27.0

Page 14 of 15

0.2.2

Streaming v0.2.2 is released! Install via pip:


```bash
pip install --upgrade mosaicml-streaming==0.2.2
```


New Features

* Add in-browser partitioning visualizer (https://github.com/mosaicml/streaming/pull/108)
* Add command-line partitioning visualizer (https://github.com/mosaicml/streaming/pull/115)

Bug Fixes

* Get dataloader worker multiprocessing working with spawn, removing Mac OSX fork requirement (https://github.com/mosaicml/streaming/pull/97)
* Improve error messaging (https://github.com/mosaicml/streaming/pull/100)
* Fix CUDA OOM (https://github.com/mosaicml/streaming/pull/103)
* Fix broken source code links in docs (https://github.com/mosaicml/streaming/pull/104)
* Reference the shared memory object in a worker process when using spawn multiprocessing method (https://github.com/mosaicml/streaming/pull/106)
* Release all the StreamingDataset resources during job termination (https://github.com/mosaicml/streaming/pull/107)

What's Changed
* Lazily instantiate the worker barrier in __iter__ (so it all pickles). by knighton in https://github.com/mosaicml/streaming/pull/97
* linkcode -> viewcode by dakinggg in https://github.com/mosaicml/streaming/pull/104
* Update writer.py by sophiawisdom in https://github.com/mosaicml/streaming/pull/100
* Bump sphinxext-opengraph from 0.7.3 to 0.7.4 by dependabot in https://github.com/mosaicml/streaming/pull/105
* Removed cuda memory allocation which was causing CUDA OOM by karan6181 in https://github.com/mosaicml/streaming/pull/103
* Reference the shared memory object in a worker process when using spawn multiprocessing method by karan6181 in https://github.com/mosaicml/streaming/pull/106
* Release all the StreamingDataset resources during job termination by karan6181 in https://github.com/mosaicml/streaming/pull/107
* Bump gitpython from 3.1.29 to 3.1.30 by dependabot in https://github.com/mosaicml/streaming/pull/109
* Bump nbsphinx from 0.8.10 to 0.8.11 by dependabot in https://github.com/mosaicml/streaming/pull/111
* Visualize partitioning by knighton in https://github.com/mosaicml/streaming/pull/108
* Command-line partitioning visualizer. by knighton in https://github.com/mosaicml/streaming/pull/115
* Fix (sys.meta_path is None, Python is likely shutting down) by knighton in https://github.com/mosaicml/streaming/pull/116
* Bump version. by knighton in https://github.com/mosaicml/streaming/pull/117

New Contributors
* dakinggg made their first contribution in https://github.com/mosaicml/streaming/pull/104
* sophiawisdom made their first contribution in https://github.com/mosaicml/streaming/pull/100

**Full Changelog**: https://github.com/mosaicml/streaming/compare/v0.2.1...v0.2.2

0.2.1

Streaming `v0.2.1` is released! Install via `pip`:


```bash
pip install --upgrade mosaicml-streaming==0.2.1
```


Bug Fixes
- Make StreamingDataset smarter about when to init dist itself, fixing env var rendezvous problem (https://github.com/mosaicml/streaming/pull/94).
- Shorten shared memory names for Mac OSX (https://github.com/mosaicml/streaming/pull/95).
- Reduce memory usage in StreamingDataset, alleviating inscrutable worker OOMs with large datasets (https://github.com/mosaicml/streaming/pull/96).
- Better exception handling in downloading (https://github.com/mosaicml/streaming/pull/98).
- Hard require fork for dataloader multiprocessing in Mac OSX due to unpickleable objects (https://github.com/mosaicml/streaming/pull/101).


What's Changed
* Also check if dist env vars are set. If not set, don't init dist. by knighton in https://github.com/mosaicml/streaming/pull/94
* Shorten the names of shared memory objects to make OSX happy. by knighton in https://github.com/mosaicml/streaming/pull/95
* Just do the partitioning/shuffling in the local leader worker. by knighton in https://github.com/mosaicml/streaming/pull/96
* propagate the actual exception and raise by karan6181 in https://github.com/mosaicml/streaming/pull/98
* Set multiprocessing method as fork for Mac OS by karan6181 in https://github.com/mosaicml/streaming/pull/101
* Bump version to 0.2.1 by karan6181 in https://github.com/mosaicml/streaming/pull/102


**Full Changelog**: https://github.com/mosaicml/streaming/compare/v0.2.0...v0.2.1

0.2.0

Streaming `v0.2.0` is released! Install via `pip`:


```bash
pip install --upgrade mosaicml-streaming==0.2.0
```


New Features

1. **Elastic world size deterministic shuffle**

Shuffled or not, StreamingDataset now collectively traverses the samples in identical order across all the devices, given a seed and a canonical number of nodes. **This ordering holds true even if you checkpoint and resume training of the same epoch on a different number of nodes.**
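
To make the ordering guarantee concrete, here is a minimal sketch in plain Python. It is not Streaming's actual partitioning algorithm and it ignores the canonical-nodes parameter; the function names, the fixed global batch size, and the strided per-rank split are all assumptions chosen for illustration. It shows that when the global order depends only on the seed and the epoch, every training step covers the same set of samples whether the job runs on 2 ranks or 4.

```python
import random


def global_order(num_samples, seed, epoch):
    """Shuffle sample IDs using only the seed and epoch (hypothetical helper)."""
    order = list(range(num_samples))
    random.Random(seed * 1_000_003 + epoch).shuffle(order)
    return order


def step_samples(order, step, global_batch_size, rank, world_size):
    """Return the samples one rank reads at one step (illustrative strided split)."""
    start = step * global_batch_size
    batch = order[start:start + global_batch_size]
    return batch[rank::world_size]


def global_batch(order, step, global_batch_size, world_size):
    """Collect what all ranks read at one step, sorted for comparison."""
    seen = []
    for rank in range(world_size):
        seen.extend(step_samples(order, step, global_batch_size, rank, world_size))
    return sorted(seen)


order = global_order(num_samples=32, seed=9176, epoch=0)
same = all(
    global_batch(order, step, 8, world_size=2) == global_batch(order, step, 8, world_size=4)
    for step in range(4)
)
print(same)  # True
```

Because each step's global batch is fixed before it is divided among ranks, changing the world size only changes which rank reads which sample, not which samples are consumed at that step.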

2. **Instant Mid-Epoch Resumption**

Waiting while your data loader spins to resume from where you left off can be costly! StreamingDataset now lets you resume immediately.

3. **NEW StreamingDataLoader**

A `StreamingDataLoader` is a drop-in replacement for your PyTorch `DataLoader` that adds mid-epoch resumption: it resumes from where you left off without spinning through the part of the epoch that was already consumed.
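
The idea behind mid-epoch resumption can be sketched with a toy loader that records how many samples of the current epoch it has handed out. This is only an illustration: `ResumableLoader` is a hypothetical class, and its `state_dict`/`load_state_dict` methods follow the usual PyTorch checkpointing convention rather than reproducing the `StreamingDataLoader` implementation.

```python
class ResumableLoader:
    """Toy loader that remembers how many samples of the epoch it has yielded."""

    def __init__(self, samples):
        self.samples = list(samples)
        self.offset = 0  # samples already yielded in the current epoch

    def __iter__(self):
        while self.offset < len(self.samples):
            sample = self.samples[self.offset]
            self.offset += 1
            yield sample
        self.offset = 0  # epoch finished; the next epoch starts from the top

    def state_dict(self):
        return {"offset": self.offset}

    def load_state_dict(self, state):
        self.offset = state["offset"]


loader = ResumableLoader(range(10))
it = iter(loader)
first = [next(it) for _ in range(4)]  # consume part of the epoch
checkpoint = loader.state_dict()      # saved alongside the model checkpoint

resumed = ResumableLoader(range(10))
resumed.load_state_dict(checkpoint)
rest = list(resumed)                  # continues at sample 4, no replay needed
```

The resumed loader jumps straight to the saved offset instead of iterating over and discarding the first four samples, which is the cost the release note refers to as "spinning" the dataloader.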

4. **Support for Oracle Cloud Infrastructure (OCI) blob storage**

Streaming now supports OCI blob storage as a storage backend. Pass the remote location to the `StreamingDataset` class as either `oci://<bucket_name>@<namespace>/<folder_name>/<filename>` or `oci://<bucket_name>/<folder_name>/<filename>`. For example:

```python
from streaming import StreamingDataset

remote = 'oci://<bucket>@<namespace>/<path>'
local = '/tmp/dataset/'

train_dataset = StreamingDataset(local=local, remote=remote, split='train')
```
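
As a rough illustration of the two accepted URI shapes, the sketch below splits an OCI remote into bucket, optional namespace, and object key using only the standard library. It assumes the namespace form is written `<bucket_name>@<namespace>`; `parse_oci_uri` and the example bucket names are hypothetical and are not part of the Streaming API.

```python
from urllib.parse import urlsplit


def parse_oci_uri(uri):
    """Split an oci:// URI into bucket, optional namespace, and object key."""
    parts = urlsplit(uri)
    if parts.scheme != "oci":
        raise ValueError(f"not an OCI URI: {uri!r}")
    # The part before '@' is the bucket; the part after it, if any, is the namespace.
    bucket, _, namespace = parts.netloc.partition("@")
    return {
        "bucket": bucket,
        "namespace": namespace or None,
        "key": parts.path.lstrip("/"),
    }


with_ns = parse_oci_uri("oci://my-bucket@my-namespace/data/train/index.json")
without_ns = parse_oci_uri("oci://my-bucket/data/train/index.json")
```

Both calls return the same bucket and key; only the `namespace` entry differs, being `None` when the URI omits it.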


Streaming expects the credentials to be present at `~/.oci/config`.

5. **Support for public AWS S3 buckets**

Streaming now supports public AWS S3 buckets, which can be accessed without credentials, in addition to the already supported private AWS S3 buckets. Instantiate the `StreamingDataset` class with an AWS S3 bucket as follows:


```python
from streaming import StreamingDataset

remote = 's3://<bucket>/<path>'
local = '/tmp/dataset/'

train_dataset = StreamingDataset(local=local, remote=remote, split='train')
```


API changes
- The class `Dataset` has been renamed as class `StreamingDataset` (https://github.com/mosaicml/streaming/pull/37).
- Similarly, the built-in classes for the most popular datasets have been renamed. For example:
  - `C4` renamed as `StreamingC4`
  - `EnWiki` renamed as `StreamingEnWiki`
  - `Pile` renamed as `StreamingPile`
  - `ADE20K` renamed as `StreamingADE20K`
  - `CIFAR10` renamed as `StreamingCIFAR10`
  - `COCO` renamed as `StreamingCOCO`
  - `ImageNet` renamed as `StreamingImageNet`
- The parameter `prefetch` in class `Dataset` has been renamed as `predownload` in class `StreamingDataset` (https://github.com/mosaicml/streaming/pull/37).
- The parameter `retry` in class `Dataset` has been renamed as `download_retry` in class `StreamingDataset` (https://github.com/mosaicml/streaming/pull/37).
- The parameter `timeout` in class `Dataset` has been renamed as `download_timeout` in class `StreamingDataset` (https://github.com/mosaicml/streaming/pull/37).
- The parameter `hash` in class `Dataset` has been renamed as `validate_hash` in class `StreamingDataset` (https://github.com/mosaicml/streaming/pull/37).
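
When migrating call sites, the four parameter renames listed above can be applied mechanically. The helper below is a small sketch for that purpose; `RENAMED_PARAMS` and `migrate_kwargs` are hypothetical names, not part of Streaming, and the example argument values are made up.

```python
# Old `Dataset` keyword -> new `StreamingDataset` keyword, as listed in the release notes.
RENAMED_PARAMS = {
    "prefetch": "predownload",
    "retry": "download_retry",
    "timeout": "download_timeout",
    "hash": "validate_hash",
}


def migrate_kwargs(old_kwargs):
    """Return a copy of old_kwargs with renamed keys; other keys pass through unchanged."""
    new_kwargs = {}
    for name, value in old_kwargs.items():
        new_name = RENAMED_PARAMS.get(name, name)
        if new_name in new_kwargs:
            raise ValueError(f"duplicate argument after renaming: {new_name}")
        new_kwargs[new_name] = value
    return new_kwargs


old = {"remote": "s3://<bucket>/<path>", "local": "/tmp/dataset/", "retry": 2, "timeout": 60}
new = migrate_kwargs(old)
```

The resulting dictionary can then be passed as `StreamingDataset(**new)` in place of the old `Dataset(**old)` call.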

What's Changed
* Bump nbsphinx from 0.8.9 to 0.8.10 by dependabot in https://github.com/mosaicml/streaming/pull/73
* Bump sphinx-argparse from 0.3.2 to 0.4.0 by dependabot in https://github.com/mosaicml/streaming/pull/74
* The Pile (conversion + streaming dataset) by knighton in https://github.com/mosaicml/streaming/pull/71
* [Docs] Switch back to RTD search by bandish-shah in https://github.com/mosaicml/streaming/pull/83
* make pyright precommit check actually run by dblalock in https://github.com/mosaicml/streaming/pull/84
* Fixed stale URL references by bandish-shah in https://github.com/mosaicml/streaming/pull/85
* Bump sphinx-copybutton from 0.5.0 to 0.5.1 by dependabot in https://github.com/mosaicml/streaming/pull/78
* Bump pandoc from 2.2 to 2.3 by dependabot in https://github.com/mosaicml/streaming/pull/79
* Bump sphinxcontrib-katex from 0.9.0 to 0.9.3 by dependabot in https://github.com/mosaicml/streaming/pull/80
* Bump sphinxext-opengraph from 0.7.2 to 0.7.3 by dependabot in https://github.com/mosaicml/streaming/pull/81
* Support for concat option in C4 Dataset by karan6181 in https://github.com/mosaicml/streaming/pull/77
* Elastic world size deterministic shuffle with mid-epoch resumption by knighton in https://github.com/mosaicml/streaming/pull/37
* Support for S3 public bucket by karan6181 in https://github.com/mosaicml/streaming/pull/88
* Add OCI Cloud Storage support by karan6181 in https://github.com/mosaicml/streaming/pull/86
* Make StreamingDataset state_dict() more flexible by knighton in https://github.com/mosaicml/streaming/pull/90
* Bump version to 0.2.0 by karan6181 in https://github.com/mosaicml/streaming/pull/92


**Full Changelog**: https://github.com/mosaicml/streaming/compare/v0.1.2...v0.2.0

0.1.2

What's Changed
* NoOp Model by Landanjs in https://github.com/mosaicml/diffusion/pull/139
* Script to pre-compute CLIP and T5 by Landanjs in https://github.com/mosaicml/diffusion/pull/144
* Add option to shift noise schedules when changing resolution by coryMosaicML in https://github.com/mosaicml/diffusion/pull/153
* Expose option to set per-stream weighting in image and image_caption datasets by coryMosaicML in https://github.com/mosaicml/diffusion/pull/156
* HF image generation that integrates with Cory's earlier script by rishab-partha in https://github.com/mosaicml/diffusion/pull/158
* MMDiT implementation and text-to-image training with rectified flows by coryMosaicML in https://github.com/mosaicml/diffusion/pull/155
* Add option to use predefined aspect ratio buckets in the cropping transform by coryMosaicML in https://github.com/mosaicml/diffusion/pull/157
* Add latent logger for T5-XXL text encoder by rishab-partha in https://github.com/mosaicml/diffusion/pull/154
* Pass loggers to Trainer in eval by jazcollins in https://github.com/mosaicml/diffusion/pull/166
* Simple LoRA Finetuning (WIP) by rishab-partha in https://github.com/mosaicml/diffusion/pull/164
* Add option to change start and end SNR in SD2/SDXL configs by coryMosaicML in https://github.com/mosaicml/diffusion/pull/165
* Small bug fixes to bulk image generation by coryMosaicML in https://github.com/mosaicml/diffusion/pull/167
* Add dataset for running with precomputed latents from multiple captions by coryMosaicML in https://github.com/mosaicml/diffusion/pull/161
* Small bug fixes for running models without tokenizers by coryMosaicML in https://github.com/mosaicml/diffusion/pull/168

New Contributors
* rishab-partha made their first contribution in https://github.com/mosaicml/diffusion/pull/158

**Full Changelog**: https://github.com/mosaicml/diffusion/compare/v0.1.1...v0.1.2

0.1.1

Minor bug fix related to `mask_pad_tokens` at generation time, plus other noise-schedule features and options.

What's Changed
* Optional quasirandom timesteps, zero terminal SNR, cosine schedule for SD models by coryMosaicML in https://github.com/mosaicml/diffusion/pull/138
* Add HF hub dependency by coryMosaicML in https://github.com/mosaicml/diffusion/pull/142
* Add link to CommonCanvas model weights by Skylion007 in https://github.com/mosaicml/diffusion/pull/143
* Fix autoencoder load by RR4787 in https://github.com/mosaicml/diffusion/pull/141
* Add option to use karras sigmas for SDXL style models by coryMosaicML in https://github.com/mosaicml/diffusion/pull/146
* Fix bug in stable diffusion when mask_pad_tokens is false by coryMosaicML in https://github.com/mosaicml/diffusion/pull/147
* Only use a text encoder mask in SD model forward if mask_pad_tokens is false by coryMosaicML in https://github.com/mosaicml/diffusion/pull/149


**Full Changelog**: https://github.com/mosaicml/diffusion/compare/v0.1.0...v0.1.1

0.1

We've spun off Streaming datasets into its own [repository](https://github.com/mosaicml/streaming)! Streaming datasets is a high-performance drop-in for Torch `IterableDataset`, enabling users to stream training data from cloud-based object stores. Streaming ships with built-in support for popular open source datasets (ADE20K, C4, COCO, Enwiki, ImageNet, etc.).

To get started, install the Streaming PyPI package:

```bash
pip install mosaicml-streaming
```


You can use the streaming Dataset class with the PyTorch native DataLoader class as follows:

```python
import torch
from streaming import Dataset

dataloader = torch.utils.data.DataLoader(dataset=Dataset(remote='s3://...'))
```


For more information, please check out the [Streaming docs](https://docs.mosaicml.com/projects/streaming/en/latest/).

1. **✔👉 Simplified Checkpointing Interface**

With this release we’ve greatly simplified configuration of loading and saving checkpoints in Composer.

To save checkpoints to S3, all you need to do is:
- Set `save_folder` to the full URI of your save directory (e.g. `'s3://my-bucket/{run_name}/checkpoints'`)
- Optionally, set `save_filename` to the pattern you want for your checkpoint file names

```python
from composer.trainer import Trainer

# Checkpoint saving to S3.
trainer = Trainer(
    model=model,
    save_folder="s3://my-bucket/{run_name}/checkpoints",
    run_name='my-run',
    save_interval="1ep",
    save_filename="ep{epoch}.pt",
    save_num_checkpoints_to_keep=0,  # delete all checkpoints locally
    # ... (remaining Trainer arguments elided)
)

trainer.fit()
```
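
To see which object path the settings above produce, the placeholders can be expanded by hand. The sketch below uses plain `str.format` as a stand-in for Composer's own placeholder expansion, which is an assumption; `checkpoint_uri` is a hypothetical helper used only for illustration.

```python
save_folder = "s3://my-bucket/{run_name}/checkpoints"
save_filename = "ep{epoch}.pt"


def checkpoint_uri(run_name, epoch):
    """Expand the {run_name} and {epoch} placeholders into a full checkpoint URI."""
    folder = save_folder.format(run_name=run_name)
    filename = save_filename.format(epoch=epoch)
    return f"{folder}/{filename}"


uri = checkpoint_uri("my-run", 13)
print(uri)  # s3://my-bucket/my-run/checkpoints/ep13.pt
```

A URI of this shape is what you would later pass as `load_path` when resuming from that checkpoint.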


Likewise, to load checkpoints from S3, all you have to do is:
- Set `load_path` to the full URI of your desired checkpoint file (e.g. `'s3://my-bucket/my-run/checkpoints/ep13.pt'`)

```python
from composer.trainer import Trainer

# Checkpoint loading from S3.
new_trainer = Trainer(
    model=model,
    train_dataloader=train_dataloader,
    max_duration="10ep",
    load_path="s3://my-bucket/my-run/checkpoints/ep13.pt",
)

new_trainer.fit()
```


For more information, please see our [Checkpointing guide](https://docs.mosaicml.com/en/v0.11.0/trainer/checkpointing.html).

1. **𐄳 Improved Distributed Experience**

We’ve made it easier to write your own custom distributed entry points by exposing our distributed API. You can now leverage all of our helpful distributed functions and contexts.

For example, let's say we need to download a dataset in a distributed training application. To avoid race conditions where different ranks try to write the dataset to the same place, we need to ensure that only local rank 0 downloads the dataset while the other ranks wait:

```python
import datetime
from composer.trainer.devices import DeviceGPU
from composer.utils import dist

dist.initialize(DeviceGPU(), datetime.timedelta(seconds=30))  # Initialize distributed module

if dist.get_local_rank() == 0:  # Download dataset on rank zero
    dataset = download_my_dataset()
dist.barrier()  # All ranks wait until dataset is downloaded

# Create and train your model!
```


For more information, please check out our [Distributed API docs](https://docs.mosaicml.com/en/v0.11.0/api_reference/composer.utils.dist.html).

Bug Fixes
* fix loss and eval_forward for HF models (#1597)
* add more robust casting to int for fsdp min_params (#1608)
* Deepspeed Docs Typo (#1605)
* Fix mmdet typo (#1618)
* Blurpool idempotent (#1625)
* When model is not on `meta` device, initialization should occur on compute device not CPU (#1623)
* Auto resumption (#1615)
* Adjust speed monitor (#1645)
* Hot fix console logging (#1643)
* Lazy Logging + pretty print dict for hparams (#1653)
* Fix many failing notebook tests (#1646)

What's Changed
* Bump coverage[toml] from 6.4.4 to 6.5.0 by dependabot in https://github.com/mosaicml/composer/pull/1583
* Bump furo from 2022.9.15 to 2022.9.29 by dependabot in https://github.com/mosaicml/composer/pull/1584
* Add English Wikipedia 2020-01-01 dataset by knighton in https://github.com/mosaicml/composer/pull/1572
* Add pull request template by dakinggg in https://github.com/mosaicml/composer/pull/1588
* Bump ipykernel from 6.15.3 to 6.16.0 by dependabot in https://github.com/mosaicml/composer/pull/1587
* Update importlib-metadata requirement from <5,>=4.11.0 to >=5.0,<6 by dependabot in https://github.com/mosaicml/composer/pull/1585
* Bump sphinx-argparse from 0.3.1 to 0.3.2 by dependabot in https://github.com/mosaicml/composer/pull/1586
* Add step explicitly to ImageVisualizer logging calls by dakinggg in https://github.com/mosaicml/composer/pull/1591
* Image viz test by dakinggg in https://github.com/mosaicml/composer/pull/1592
* Remove unused fixture by mvpatel2000 in https://github.com/mosaicml/composer/pull/1594
* Fixes RandAugment API by mvpatel2000 in https://github.com/mosaicml/composer/pull/1596
* fix loss and eval_forward for HF models by dskhudia in https://github.com/mosaicml/composer/pull/1597
* Remove tensorflow-io from setup.py by eracah in https://github.com/mosaicml/composer/pull/1577
* Fixes enwiki for the newly processed wiki dataset by dskhudia in https://github.com/mosaicml/composer/pull/1600
* Change install to all by mvpatel2000 in https://github.com/mosaicml/composer/pull/1599
* Remove log level and should_log_artifact by dakinggg in https://github.com/mosaicml/composer/pull/1603
* Add more robust casting to int for fsdp min_params by dblalock in https://github.com/mosaicml/composer/pull/1608
* Deepspeed Docs Typo by mvpatel2000 in https://github.com/mosaicml/composer/pull/1605
* Object store logger refactor by dakinggg in https://github.com/mosaicml/composer/pull/1601
* Bump gitpython from 3.1.27 to 3.1.28 by dependabot in https://github.com/mosaicml/composer/pull/1609
* Bump tabulate from 0.8.10 to 0.9.0 by dependabot in https://github.com/mosaicml/composer/pull/1610
* Log the number of GPUs and nodes Composer running on. by eracah in https://github.com/mosaicml/composer/pull/1604
* Update MLPerfCallback for v2.1 by hanlint in https://github.com/mosaicml/composer/pull/1607
* Remove object store cls by dakinggg in https://github.com/mosaicml/composer/pull/1606
* Add LAMB Optimizer by hanlint in https://github.com/mosaicml/composer/pull/1613
* Mmdet adapter by A-Jacobson in https://github.com/mosaicml/composer/pull/1545
* Fix mmdet typo by Landanjs in https://github.com/mosaicml/composer/pull/1618
* update torchmetrics requirement by hanlint in https://github.com/mosaicml/composer/pull/1620
* Add distributed sampler error by mvpatel2000 in https://github.com/mosaicml/composer/pull/1598
* Landan/deeplabv3 ade20k example by Landanjs in https://github.com/mosaicml/composer/pull/1593
* Upgrade CodeQL Action to version 2 by karan6181 in https://github.com/mosaicml/composer/pull/1628
* Blurpool idempotent by mvpatel2000 in https://github.com/mosaicml/composer/pull/1625
* Defaulting streaming dataset version to 2 by karan6181 in https://github.com/mosaicml/composer/pull/1616
* Abhi/fsdp bugfix 0 11 by abhi-mosaic in https://github.com/mosaicml/composer/pull/1623
* Remove warning when `master_port` is auto selected by abhi-mosaic in https://github.com/mosaicml/composer/pull/1629
* Remove unused import by dakinggg in https://github.com/mosaicml/composer/pull/1630
* Usability improvements to `intitialize_dist()` by growlix in https://github.com/mosaicml/composer/pull/1619
* Remove Graph in Auto Grad Accum by mvpatel2000 in https://github.com/mosaicml/composer/pull/1631
* Auto resumption by dakinggg in https://github.com/mosaicml/composer/pull/1615
* add stop method by hanlint in https://github.com/mosaicml/composer/pull/1627
* S3 Checkpoint Saving By URI by eracah in https://github.com/mosaicml/composer/pull/1614
* S3 Checkpoint loading from URI by eracah in https://github.com/mosaicml/composer/pull/1624
* Add mvpatel2000 as codeowner for algos by mvpatel2000 in https://github.com/mosaicml/composer/pull/1640
* Adjust speed monitor by mvpatel2000 in https://github.com/mosaicml/composer/pull/1645
* Adding in FSDP Docs by bcui19 in https://github.com/mosaicml/composer/pull/1621
* Attempt to fix flaky doctest by dakinggg in https://github.com/mosaicml/composer/pull/1647
* Fix Missing Underscores in FSDP Docs by bcui19 in https://github.com/mosaicml/composer/pull/1648
* Fixed html path for make host command for docs by karan6181 in https://github.com/mosaicml/composer/pull/1642
* Fix hyperparameters logged to console even when progress_bar and log_to_console are False by eracah in https://github.com/mosaicml/composer/pull/1643
* Fix ImageNet Example normalization values by Landanjs in https://github.com/mosaicml/composer/pull/1641
* Python log level by dakinggg in https://github.com/mosaicml/composer/pull/1651
* Changed default logging to WARN for doctests by eracah in https://github.com/mosaicml/composer/pull/1644
* Add Event.AFTER_LOAD by mvpatel2000 in https://github.com/mosaicml/composer/pull/1652
* Lazy Logging + pretty print dict for hparams by eracah in https://github.com/mosaicml/composer/pull/1653
* Fix todo in memory monitor by mvpatel2000 in https://github.com/mosaicml/composer/pull/1654
* Tests for Idempotent Surgery by mvpatel2000 in https://github.com/mosaicml/composer/pull/1639
* Remove c4 dataset by mvpatel2000 in https://github.com/mosaicml/composer/pull/1635
* Update torchmetrics by hanlint in https://github.com/mosaicml/composer/pull/1656
* Search index filtered by project by nqn in https://github.com/mosaicml/composer/pull/1549
* FSDP Tests by bcui19 in https://github.com/mosaicml/composer/pull/1650
* Add composer version to issue template by dakinggg in https://github.com/mosaicml/composer/pull/1657
* Fix many failing notebook tests by dakinggg in https://github.com/mosaicml/composer/pull/1646
* Re-build the Docker images to resolve pip version error by bandish-shah in https://github.com/mosaicml/composer/pull/1655


**Full Changelog**: https://github.com/mosaicml/composer/compare/v0.10.1...v0.11.0

© 2024 Safety CLI Cybersecurity Inc. All Rights Reserved.