Mosaicml

Latest version: v0.29.0

Page 3 of 15

0.23.3

New Features

1. Update the MLflow logger to use the new time-dimension API for viewing images in MLflow (#3286)

We've enhanced the MLflow logger's `log_image` function to use the new API with time-dimension support, enabling images to be viewed in MLflow.

2. Add logging buffer time to the MLflow logger (#3401)

We've added the `logging_buffer_seconds` argument to the MLflow logger, which specifies how many seconds to buffer before sending logs to the MLflow tracking server.
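
To illustrate what time-based buffering means in practice, here is a minimal sketch in plain Python. It is not the MLflow logger's implementation: the `BufferedMetricLogger` class, its `send` callback, and the injectable `clock` are hypothetical names introduced only for this example. Only the `logging_buffer_seconds` name comes from the release note.

```python
import time
from typing import Callable, Dict, List, Tuple

Record = Tuple[str, float, int]  # (metric name, value, step)


class BufferedMetricLogger:
    """Toy logger that batches metrics and sends them at most once per buffer window."""

    def __init__(
        self,
        send: Callable[[List[Record]], None],
        logging_buffer_seconds: float = 10.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._send = send  # hypothetical sink, e.g. a call to a tracking server
        self._buffer_seconds = logging_buffer_seconds
        self._clock = clock  # injectable so the sketch can be exercised without sleeping
        self._pending: List[Record] = []
        self._last_flush = clock()

    def log_metrics(self, metrics: Dict[str, float], step: int) -> None:
        # Queue the new values, then flush only if the buffer window has elapsed.
        self._pending.extend((name, value, step) for name, value in metrics.items())
        if self._clock() - self._last_flush >= self._buffer_seconds:
            self.flush()

    def flush(self) -> None:
        # Send everything queued so far in one batch and restart the window.
        if self._pending:
            self._send(self._pending)
            self._pending = []
        self._last_flush = self._clock()
```

A larger buffer means fewer requests to the tracking server at the cost of metrics appearing later; calling `flush()` at the end of a run avoids losing the final batch.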

Bug Fixes

1. Only require `databricks-sdk` when on the Databricks platform (#3389)

Previously, the MLflow logger always imported `databricks-sdk`. Now, the SDK is only required when running on the Databricks platform and using Databricks secrets to access managed MLflow.

2. Skip extra dataset state load during job resumption (#3393)

Previously, when loading a checkpoint with `train_dataloader`, the `dataset_state` would load first, and if `train_dataloader` was set again afterward, `load_state_dict` would be called with a `None` value. Now, we've added a check in the `train_dataloader` setter to skip this redundant load.
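
The guard described above can be sketched with a property setter. This is an illustrative stand-in rather than Composer's actual state class: `TrainingState` and `ResumableLoader` are hypothetical names, and clearing `dataset_state` after it is applied is an assumption made for the example.

```python
from typing import Any, Dict, List, Optional


class ResumableLoader:
    """Stand-in for a dataloader whose dataset can restore its position."""

    def __init__(self) -> None:
        self.loaded_states: List[Dict[str, Any]] = []

    def load_state_dict(self, state: Dict[str, Any]) -> None:
        self.loaded_states.append(state)


class TrainingState:
    """Toy container holding a dataloader and a pending dataset state."""

    def __init__(self) -> None:
        self._train_dataloader: Optional[ResumableLoader] = None
        self.dataset_state: Optional[Dict[str, Any]] = None  # filled from a checkpoint

    @property
    def train_dataloader(self) -> Optional[ResumableLoader]:
        return self._train_dataloader

    @train_dataloader.setter
    def train_dataloader(self, loader: Optional[ResumableLoader]) -> None:
        self._train_dataloader = loader
        # Guard: only restore when there is a pending state to apply,
        # so load_state_dict is never called with None.
        if loader is not None and self.dataset_state is not None:
            loader.load_state_dict(self.dataset_state)
            self.dataset_state = None  # consumed, so setting the loader again is a no-op
```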

3. Fix auto-microbatching on CUDA 12.4 (#3400)

In CUDA 12.4, the out-of-memory error message has changed to `CUDA error: out of memory`. Previously, our logic hardcoded checks for `CUDA out of memory` when using `device_train_microbatch_size="auto"`. Now, we check for both `CUDA out of memory` and `CUDA error: out of memory`.
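
A rough sketch of this kind of check follows. The two message strings are taken from the note above; the helper names and the halve-and-retry loop are assumptions made for illustration and are not Composer's actual auto-microbatching code.

```python
from typing import Callable

# Both phrasings mentioned in the release note.
OOM_MARKERS = ('CUDA out of memory', 'CUDA error: out of memory')


def is_cuda_oom(error: BaseException) -> bool:
    """Return True if the error message matches either known OOM phrasing."""
    message = str(error)
    return any(marker in message for marker in OOM_MARKERS)


def fit_microbatch_size(train_step: Callable[[int], None], start_size: int) -> int:
    """Halve the microbatch size until one step succeeds (assumed strategy)."""
    size = start_size
    while True:
        try:
            train_step(size)
            return size
        except RuntimeError as error:
            # Re-raise anything that is not an OOM, or an OOM we cannot shrink past.
            if not is_cuda_oom(error) or size == 1:
                raise
            size //= 2
```

Matching on substrings keeps the check tolerant of the extra detail that usually follows the message, such as the allocation size.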

4. Fix MLflow logging to Databricks workspace file paths that start with the `/Shared/` prefix (#3410)

Previously, on the Databricks platform, the MLflow logger prepended `/Users/` to every user-provided logging path that did not already specify it. This included paths starting with `/Shared/`, which was incorrect because `/Shared/` denotes a shared workspace. Now, the `/Users/` prefix is skipped for paths starting with `/Shared/`.
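
The corrected prefix rule can be sketched as a small pure function. The function name and the `user` argument are hypothetical, and placing the user name between `/Users/` and the rest of the path is an assumption about the workspace layout rather than something the note spells out.

```python
def resolve_experiment_path(experiment_name: str, user: str) -> str:
    """Return a workspace path, leaving already-absolute workspace paths untouched."""
    # Paths that already name a user folder or the shared workspace are kept as-is.
    if experiment_name.startswith(('/Users/', '/Shared/')):
        return experiment_name
    # Otherwise place the experiment under the (assumed) per-user folder.
    return '/Users/' + user + '/' + experiment_name.lstrip('/')
```

For example, `resolve_experiment_path('/Shared/team/run', 'me@example.com')` leaves the shared path unchanged, while a bare name such as `'my-first-project'` is placed under the user's folder.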

What's Changed
* Bump CI from 0.0.7 to 0.0.8 by KuuCi in https://github.com/mosaicml/composer/pull/3383
* Fix backward compatibility caused by missing eval metrics class by bigning in https://github.com/mosaicml/composer/pull/3385
* Bump version v0.23.2 by bigning in https://github.com/mosaicml/composer/pull/3386
* Restore dev version by bigning in https://github.com/mosaicml/composer/pull/3388
* Only requires `databricks-sdk` when inside the Databricks platform by antoinebrl in https://github.com/mosaicml/composer/pull/3389
* Update packaging requirement from <24.1,>=21.3.0 to >=21.3.0,<24.2 by dependabot in https://github.com/mosaicml/composer/pull/3392
* Bump cryptography from 42.0.6 to 42.0.8 by dependabot in https://github.com/mosaicml/composer/pull/3391
* Skip extra dataset state load by mvpatel2000 in https://github.com/mosaicml/composer/pull/3393
* Remove FSDP restriction from PyTorch 1.13 by mvpatel2000 in https://github.com/mosaicml/composer/pull/3395
* Check for 'CUDA error: out of memory' when auto-microbatching by JAEarly in https://github.com/mosaicml/composer/pull/3400
* Add tokens to iterations by b-chu in https://github.com/mosaicml/composer/pull/3374
* Busy wait utils in dist by dakinggg in https://github.com/mosaicml/composer/pull/3396
* Add buffering time to mlflow logger by chenmoneygithub in https://github.com/mosaicml/composer/pull/3401
* Add missing import for PyTorch 2.3.1 device mesh slicing by mvpatel2000 in https://github.com/mosaicml/composer/pull/3402
* Add pynvml to mlflow dep group by dakinggg in https://github.com/mosaicml/composer/pull/3404
* min/max flagging added to system_metrics_monitor with only non-redundant, necessary gpu metrics logged by JackZ-db in https://github.com/mosaicml/composer/pull/3373
* Simplify launcher world size parsing by mvpatel2000 in https://github.com/mosaicml/composer/pull/3398
* Optionally use `flash-attn`'s CE loss for metrics by snarayan21 in https://github.com/mosaicml/composer/pull/3394
* log image fix by jessechancy in https://github.com/mosaicml/composer/pull/3286
* [ckpt-rewr] Save state dict API by eracah in https://github.com/mosaicml/composer/pull/3372
* Revert "Optionally use `flash-attn`'s CE loss for metrics (3394)" by snarayan21 in https://github.com/mosaicml/composer/pull/3408
* CPU tests image fix by snarayan21 in https://github.com/mosaicml/composer/pull/3409
* Add setter for epoch in iteration by b-chu in https://github.com/mosaicml/composer/pull/3407
* Move pillow dep as required by mvpatel2000 in https://github.com/mosaicml/composer/pull/3412
* fixing mlflow logging to Databricks workspace file paths with /Shared/ prefix by JackZ-db in https://github.com/mosaicml/composer/pull/3410
* Bump version v0.23.3 by karan6181 in https://github.com/mosaicml/composer/pull/3414

New Contributors
* JackZ-db made their first contribution in https://github.com/mosaicml/composer/pull/3373

**Full Changelog**: https://github.com/mosaicml/composer/compare/v0.23.2...v0.23.3

0.23.2

Bug Fixes
* Fix backward compatibility issue caused by missing eval metrics class

What's Changed
* Fix backward compatibility issue caused by missing eval metrics class by bigning in https://github.com/mosaicml/composer/pull/3385

**Full Changelog**: https://github.com/mosaicml/composer/compare/v0.23.1...release/v0.23.2

0.23.1

What's New

**1. PyTorch 2.3.1 Upgrade**

Composer now supports PyTorch 2.3.1.

What's Changed
* Torch 2.3.1 Upgrade by mvpatel2000 in https://github.com/mosaicml/composer/pull/3367
* Fix monkeypatch imports by mvpatel2000 in https://github.com/mosaicml/composer/pull/3375
* Remove unnecessary state dict and load_state_dict functions by eracah in https://github.com/mosaicml/composer/pull/3361
* Adding checkpoint backwards compatibility tests after 0.23.0 release by bigning in https://github.com/mosaicml/composer/pull/3377
* prepare_fsdp_module documentation fix by KuuCi in https://github.com/mosaicml/composer/pull/3379
* Composer version bump to v0.23.1 by snarayan21 in https://github.com/mosaicml/composer/pull/3380
* Clear caplog and use as context manager in test_logging by snarayan21 in https://github.com/mosaicml/composer/pull/3382

**Full Changelog**: https://github.com/mosaicml/composer/compare/v0.23.0...v0.23.1

0.23.0

What's New

**1. Parallelism V2 + Tensor Parallel (#3335)**

Composer now supports PyTorch's implementation of [tensor parallelism](https://pytorch.org/docs/stable/distributed.tensor.parallel.html). As part of this, we've revamped and simplified how Composer does distributed training. Previously, the Trainer accepted an `fsdp_config` argument:

    trainer = Trainer(model, fsdp_config = {'sharding_strategy': 'FULL_SHARD'})

As we generalize to more forms of parallelism, we've deprecated `fsdp_config` in favor of `parallelism_config`:

    trainer = Trainer(
        model = model,
        ...
        parallelism_config = {
            'fsdp': {
                'sharding_strategy': 'FULL_SHARD',
                'data_parallel_shard_degree': 2,      # Size of shard dimension
                'data_parallel_replicate_degree': 2,  # Size of replicate dimension
            },
            'tp_config': {
                'tensor_parallel_degree': 2,  # Size of TP dimension
                'layer_plan': ...  # describes how to TP layers
            }
        }
    )

As part of this change, we now default to using DTensor for parallelism with PyTorch FSDP. PyTorch has deprecated ShardedTensor, so this change migrates to the newer backend, which avoids various checkpointing bugs.

See the tensor parallel [docs](https://docs.mosaicml.com/projects/composer/en/latest/notes/distributed_training.html#tensor-parallel-tp) for more information. Note that tensor parallelism is still experimental and may be subject to breaking API changes. Not all checkpointing features are guaranteed to work with tensor parallelism.
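
For code that still builds an `fsdp_config` dictionary, the migration amounts to nesting it under the `'fsdp'` key. The helper below is a hypothetical sketch of that mapping, written in plain Python. It is not part of Composer, and the choice to emit a `DeprecationWarning` and to reject both arguments together is an assumption made for the example.

```python
import warnings
from typing import Any, Dict, Optional


def to_parallelism_config(
    fsdp_config: Optional[Dict[str, Any]] = None,
    parallelism_config: Optional[Dict[str, Any]] = None,
) -> Optional[Dict[str, Any]]:
    """Map the deprecated fsdp_config shape onto the parallelism_config shape."""
    if fsdp_config is not None and parallelism_config is not None:
        raise ValueError('Pass either fsdp_config or parallelism_config, not both')
    if fsdp_config is not None:
        warnings.warn(
            'fsdp_config is deprecated; use parallelism_config instead',
            DeprecationWarning,
            stacklevel=2,
        )
        # The old dictionary becomes the value of the 'fsdp' entry.
        return {'fsdp': dict(fsdp_config)}
    return parallelism_config
```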

**2. MLflow API Simplification**

Previously, the MLflow logger required a tracking URI and an absolute user path when used with Databricks:

    mlflow_logger = MLFlowLogger(
        tracking_uri = 'databricks',
        experiment_name = '/Users/xxx.yyyzzz.com/my-first-project/'
    )

    trainer = Trainer(
        model = model,
        ...
        loggers = mlflow_logger,
    )

Now, if you provide Databricks secrets as environment variables, Composer automatically populates `tracking_uri` and the `experiment_name` prefix:

    trainer = Trainer(
        model = model,
        ...
        loggers = MLFlowLogger(experiment_name='my-first-project'),
    )
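
One way to picture the tracking URI default is as a lookup against the environment. The sketch below is an assumption-laden illustration, not Composer's code: the function name is hypothetical, and `DATABRICKS_HOST` and `DATABRICKS_TOKEN` are assumed to be the variables that carry the Databricks secrets.

```python
from typing import Mapping, Optional


def default_tracking_uri(tracking_uri: Optional[str], env: Mapping[str, str]) -> Optional[str]:
    """Pick a tracking URI, preferring an explicit value over environment detection."""
    if tracking_uri is not None:
        return tracking_uri  # an explicit setting always wins
    # Assumed variable names; both must be set and non-empty.
    if env.get('DATABRICKS_HOST') and env.get('DATABRICKS_TOKEN'):
        return 'databricks'
    return None
```

Passing the environment in as a mapping (for example `os.environ`) keeps the function easy to exercise without touching the real process environment.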


**3. Wallclock Save Interval**

Composer now supports setting a save interval in wallclock time:

    trainer = Trainer(
        model = model,
        ...
        save_interval='30m',
    )
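
To make the semantics of a wallclock interval such as `'30m'` concrete, here is a small standalone sketch that converts a duration string to seconds and decides whether a save is due. The parser and both function names are hypothetical. Accepting combined forms like `'1h30m'` is an assumption, and Composer's own time parsing may differ.

```python
import re

# Optional hours, minutes, and seconds components, in that order (assumed format).
_WALLCLOCK_PATTERN = re.compile(r'^(?:(\d+)h)?(?:(\d+)m)?(?:(\d+)s)?$')


def wallclock_to_seconds(interval: str) -> int:
    """Convert a string such as '30m' or '1h30m' to a number of seconds."""
    match = _WALLCLOCK_PATTERN.match(interval)
    if not interval or match is None:
        raise ValueError('Not a wallclock interval: ' + repr(interval))
    hours, minutes, seconds = (int(part) if part else 0 for part in match.groups())
    return hours * 3600 + minutes * 60 + seconds


def save_is_due(elapsed_seconds: float, last_save_seconds: float, interval: str) -> bool:
    """Return True once at least one full interval has passed since the last save."""
    return elapsed_seconds - last_save_seconds >= wallclock_to_seconds(interval)
```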

Note that most durations, such as `max_duration`, do not accept wallclock time; the initial version of this feature is limited to a subset of time settings such as `save_interval`.

Bug Fixes
* Don't close the engine if it's already closed in https://github.com/mosaicml/composer/pull/3143
* Fix HF tests with Pin in https://github.com/mosaicml/composer/pull/3248
* Fix backwards compatibility tests in https://github.com/mosaicml/composer/pull/3252
* Fix unexpected remote checkpointing downloading in https://github.com/mosaicml/composer/pull/3271
* Fix HSDP with ShardDegree < 8 in https://github.com/mosaicml/composer/pull/3313

What's Changed
* Remove CPU offload for DDP/single-gpu by mvpatel2000 in https://github.com/mosaicml/composer/pull/3242
* Adding more checkpoint backwards compatability tests by snarayan21 in https://github.com/mosaicml/composer/pull/3244
* Don't close the engine if its already closed by dakinggg in https://github.com/mosaicml/composer/pull/3143
* Replace `evaluator.dataloader.device_eval_batch_size` with `evaluator.device_eval_microbatch_size` by ShashankMosaicML in https://github.com/mosaicml/composer/pull/3247
* Fix HF tests with Pin by mvpatel2000 in https://github.com/mosaicml/composer/pull/3248
* Remove ICL metrics by mvpatel2000 in https://github.com/mosaicml/composer/pull/3243
* Add offset and length arguments for checkpoint validation functions by irenedea in https://github.com/mosaicml/composer/pull/3246
* Fix backwards compatibility tests, raise error for torch version mismatch by snarayan21 in https://github.com/mosaicml/composer/pull/3252
* Bump cryptography from 41.0.5 to 42.0.6 by dependabot in https://github.com/mosaicml/composer/pull/3256
* Bump databricks-sdk from 0.25.1 to 0.27.0 by dependabot in https://github.com/mosaicml/composer/pull/3257
* Improve GCS Object Store by mvpatel2000 in https://github.com/mosaicml/composer/pull/3251
* add retry to gcs.upload_file by bigning in https://github.com/mosaicml/composer/pull/3232
* Add unit test support for full state dict + load_weights_only and save_weights_only by eracah in https://github.com/mosaicml/composer/pull/3260
* will/bump_aws_ofi_nccl by willgleich in https://github.com/mosaicml/composer/pull/3253
* Fix daily GCS tests by mvpatel2000 in https://github.com/mosaicml/composer/pull/3268
* Fix: SAM not working with FSDP/DeepSpeed and LR scheduler. by Joqsan in https://github.com/mosaicml/composer/pull/3259
* Add upload timeout patch to mlflow on azure by dakinggg in https://github.com/mosaicml/composer/pull/3265
* Add option to stagger uploads based on local rank by dakinggg in https://github.com/mosaicml/composer/pull/3275
* explicit close by dakinggg in https://github.com/mosaicml/composer/pull/3276
* Update NCCL_ASYNC_ERROR_HANDLING env variable by priba in https://github.com/mosaicml/composer/pull/3267
* new dist_cp save planner to fix issue that each rank needs to download all checkpoint files by bigning in https://github.com/mosaicml/composer/pull/3271
* Bump to torch 2.2.2 by mvpatel2000 in https://github.com/mosaicml/composer/pull/3283
* Fix UCObjectStore.list_objects by dakinggg in https://github.com/mosaicml/composer/pull/3284
* Update peft version by dakinggg in https://github.com/mosaicml/composer/pull/3287
* replace `load_fsdp_monolith_` with `load_monolith_` by milocress in https://github.com/mosaicml/composer/pull/3288
* Return PyTorch Latest by mvpatel2000 in https://github.com/mosaicml/composer/pull/3290
* Fix daily tests by filtering a warning by mvpatel2000 in https://github.com/mosaicml/composer/pull/3291
* remove orig_params check by milocress in https://github.com/mosaicml/composer/pull/2981
* [ckpt-rewr] Get Model State Dict Util Function by eracah in https://github.com/mosaicml/composer/pull/3250
* Skip compression check with symlink files by mvpatel2000 in https://github.com/mosaicml/composer/pull/3300
* Monkeypatch Device Mesh ND Slicing by mvpatel2000 in https://github.com/mosaicml/composer/pull/3302
* Bump coverage[toml] from 7.4.4 to 7.5.1 by dependabot in https://github.com/mosaicml/composer/pull/3305
* Bump databricks-sdk from 0.27.0 to 0.27.1 by dependabot in https://github.com/mosaicml/composer/pull/3306
* Update transformers requirement from !=4.34.0,<4.41,>=4.11 to >=4.11,!=4.34.0,<4.42 by dependabot in https://github.com/mosaicml/composer/pull/3307
* Allow overwrite on upload retry in remote uploader downloader by irenedea in https://github.com/mosaicml/composer/pull/3310
* Update platform references by aspfohl in https://github.com/mosaicml/composer/pull/3304
* Fix cometml unit tests by j316chuck in https://github.com/mosaicml/composer/pull/3314
* Fix HSDP with ShardDegree < 8 by bigning in https://github.com/mosaicml/composer/pull/3313
* Update docstring for get_model_state_dict by eracah in https://github.com/mosaicml/composer/pull/3318
* Tensor Parallelism Integration by mvpatel2000 in https://github.com/mosaicml/composer/pull/3269
* Bugfixes to FSDP + TP by mvpatel2000 in https://github.com/mosaicml/composer/pull/3323
* Wct save interval by KuuCi in https://github.com/mosaicml/composer/pull/3264
* Wrap ChunkedEncodingError from UCObjectStore by irenedea in https://github.com/mosaicml/composer/pull/3321
* Add checkpoint events to mosaicml logger by b-chu in https://github.com/mosaicml/composer/pull/3316
* Bump timeout to fix daily tests by j316chuck in https://github.com/mosaicml/composer/pull/3325
* Fix FSDP ckpt by filtering User Waring by j316chuck in https://github.com/mosaicml/composer/pull/3327
* Revert TP integration by dakinggg in https://github.com/mosaicml/composer/pull/3328
* Bump databricks-sdk from 0.27.1 to 0.28.0 by dependabot in https://github.com/mosaicml/composer/pull/3331
* Bump sphinxcontrib-katex from 0.9.6 to 0.9.10 by dependabot in https://github.com/mosaicml/composer/pull/3333
* Update peft requirement from <0.11,>=0.10.0 to >=0.10.0,<0.12 by dependabot in https://github.com/mosaicml/composer/pull/3332
* Bump coverage[toml] from 7.5.1 to 7.5.2 by dependabot in https://github.com/mosaicml/composer/pull/3330
* Update protobuf requirement from <5.27 to <5.28 by dependabot in https://github.com/mosaicml/composer/pull/3329
* Improving memory snapshot by cli99 in https://github.com/mosaicml/composer/pull/3315
* Add A10 to speed monitor by mvpatel2000 in https://github.com/mosaicml/composer/pull/3336
* change ComposerModel output type by hyenal in https://github.com/mosaicml/composer/pull/3341
* Remove evaluator state by snarayan21 in https://github.com/mosaicml/composer/pull/3339
* [ckpt-rewr] Generate Metadata State Dict API by eracah in https://github.com/mosaicml/composer/pull/3311
* Tensor Parallelism v2 by mvpatel2000 in https://github.com/mosaicml/composer/pull/3335
* Migrate Type Hints for PEP 585 by mvpatel2000 in https://github.com/mosaicml/composer/pull/3344
* [checkpoint v2] add remote uploader class by bigning in https://github.com/mosaicml/composer/pull/3303
* Raise errors on all ranks for checkpoint download failures by irenedea in https://github.com/mosaicml/composer/pull/3345
* Add return type annotation when __init__ doesn't take any argument by antoinebrl in https://github.com/mosaicml/composer/pull/3347
* [ckpt-rewr] Get Optim State Dict Util API by eracah in https://github.com/mosaicml/composer/pull/3299
* Fix type check issue with device train microbatch size by mvpatel2000 in https://github.com/mosaicml/composer/pull/3349
* Add torch distributed checkpointing monkeypatches to enable TE checkpointing for extra_state attribute by j316chuck in https://github.com/mosaicml/composer/pull/3298
* Bump coverage[toml] from 7.5.2 to 7.5.3 by dependabot in https://github.com/mosaicml/composer/pull/3353
* Update wandb requirement from <0.17,>=0.13.2 to >=0.13.2,<0.18 by dependabot in https://github.com/mosaicml/composer/pull/3352
* Optional `CheckpointSaver` instantiation inside the `Trainer` by antoinebrl in https://github.com/mosaicml/composer/pull/3334
* MLFlow better experiment defaults by mvpatel2000 in https://github.com/mosaicml/composer/pull/3356
* Rename metadata keys by mvpatel2000 in https://github.com/mosaicml/composer/pull/3354
* Dataclasses for ParallelismConfig by mvpatel2000 in https://github.com/mosaicml/composer/pull/3346
* Upgrade Mofed with apt by willgleich in https://github.com/mosaicml/composer/pull/3340
* Multi gpu ci test by KuuCi in https://github.com/mosaicml/composer/pull/3312
* Autoresume Validation with Max Duration by mvpatel2000 in https://github.com/mosaicml/composer/pull/3358
* Deprecate and bump verstion to 0.23.0 by bigning in https://github.com/mosaicml/composer/pull/3359

New Contributors
* Joqsan made their first contribution in https://github.com/mosaicml/composer/pull/3259

**Full Changelog**: https://github.com/mosaicml/composer/compare/v0.22.0...v0.23.0

0.22.0

What's New

0.21.3

Bug Fixes

**1. Increased Robustness to Checkpoint Loading**

We've patched several edge cases in loading sharded checkpoints, especially with DTensors, which should decrease memory usage when loading checkpoints. We've also hardened the retry logic against cloud object store failures, making checkpoint loading more robust to transient network issues.
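
As a generic illustration of retrying through transient failures, the sketch below wraps an operation in exponential backoff. It is not Composer's retry implementation: the function name, the default exception types, and the delay schedule are all assumptions chosen for the example.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar('T')


def retry_with_backoff(
    operation: Callable[[], T],
    attempts: int = 3,
    initial_delay: float = 1.0,
    backoff: float = 2.0,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call operation, retrying on the listed errors with growing delays."""
    if attempts < 1:
        raise ValueError('attempts must be at least 1')
    delay = initial_delay
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except retry_on:
            if attempt == attempts:
                raise  # out of retries: surface the last error
            sleep(delay)
            delay *= backoff
    raise AssertionError('unreachable')
```

Errors outside `retry_on` propagate immediately, so only failures that are plausibly transient cost extra time.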

What's Changed
* Raise daily test timeout by mvpatel2000 in https://github.com/mosaicml/composer/pull/3172
* fix remote file naming by cli99 in https://github.com/mosaicml/composer/pull/3173
* [fix] DTensor + SHARD_GRAD_OP + use_orig_params by bigning in https://github.com/mosaicml/composer/pull/3175
* Bump db sdk by dakinggg in https://github.com/mosaicml/composer/pull/3176
* Build latest pytorch nightly images by dakinggg in https://github.com/mosaicml/composer/pull/3179
* Add FP8 TransformerEngine activation checkpointing by cli99 in https://github.com/mosaicml/composer/pull/3156
* Enabling the computation of validation loss and other metrics when using sequence parallelism by ShashankMosaicML in https://github.com/mosaicml/composer/pull/3183
* Update mosaic_fsdp_utils.py by vchiley in https://github.com/mosaicml/composer/pull/3185
* Fix the FSDP.optim_state_dict_to_load OOM by bigning in https://github.com/mosaicml/composer/pull/3184
* Revert "Update mosaic_fsdp_utils.py" by vchiley in https://github.com/mosaicml/composer/pull/3187
* Bump databricks-sdk from 0.24.0 to 0.25.1 by dependabot in https://github.com/mosaicml/composer/pull/3190
* Add version tag to local builds by mvpatel2000 in https://github.com/mosaicml/composer/pull/3188
* Update `NeptuneLogger` by AleksanderWWW in https://github.com/mosaicml/composer/pull/3165
* Filter neptune warning in doctests by mvpatel2000 in https://github.com/mosaicml/composer/pull/3195
* Removal of metrics deepcopy before computing the metrics by gregjauvion in https://github.com/mosaicml/composer/pull/3180
* Fix MLFlow Tag Name for Resumption by KuuCi in https://github.com/mosaicml/composer/pull/3194
* Fix mistral gating by dakinggg in https://github.com/mosaicml/composer/pull/3199
* Bump version to 0.21.3 by mvpatel2000 in https://github.com/mosaicml/composer/pull/3198

New Contributors
* gregjauvion made their first contribution in https://github.com/mosaicml/composer/pull/3180

**Full Changelog**: https://github.com/mosaicml/composer/compare/v0.21.2...v0.21.3
