## New Features
### Load Checkpoint Callback (#1570)
We added support for Composer's `LoadCheckpoint` [callback](https://github.com/mosaicml/composer/blob/28756dd52e96371689b764cb72c336406460ad35/composer/callbacks/load_checkpoint.py#L18), which loads a checkpoint at a specified training event. This enables use cases like loading base model weights before fine-tuning with PEFT. For example:
```yaml
callbacks:
  load_checkpoint:
    load_path: /path/to/your/weights
```
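The event at which the checkpoint is loaded is configurable via the callback's `event` argument. A hedged sketch; the exact key name and accepted event values are assumptions, so check the linked Composer source for the supported options:

```yaml
callbacks:
  load_checkpoint:
    load_path: /path/to/your/weights
    event: AFTER_LOAD  # assumption: a Composer Event name controlling when the checkpoint is applied
```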
## Breaking Changes
### Accumulate over Tokens in a Batch for Training Loss (#1618, #1610, #1595)
We added a new flag, `accumulate_train_batch_on_tokens`, which specifies whether the training loss is accumulated over the number of tokens in a batch rather than the number of samples. It defaults to `true`. This will slightly change loss curves for models trained with padding. To recover the old behavior, explicitly set this flag to `false`, as shown in the sketch below.
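A minimal sketch of restoring sample-based accumulation, assuming the flag sits at the top level of the training YAML as the PRs above suggest:

```yaml
# Revert to the pre-0.14 behavior: accumulate the training loss over the
# number of samples in a batch instead of the number of tokens.
accumulate_train_batch_on_tokens: false
```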
### Default Run Name (#1611)
If no run name is provided, we will now default to using Composer's [randomly generated run names](https://github.com/mosaicml/composer/blob/main/composer/trainer/trainer.py#L549). (Previously, we defaulted to `llm` for the run name.)
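If you rely on a stable run name (e.g., for checkpoint paths or experiment tracking), you can opt out of the generated names by setting one explicitly. A minimal sketch, assuming the top-level `run_name` key in the training YAML:

```yaml
# Setting run_name explicitly bypasses the new randomly generated default.
run_name: my-llm-training-run
```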
## What's Changed
* Update mcli examples to use 0.13.0 by irenedea in https://github.com/mosaicml/llm-foundry/pull/1594
* Pass accumulate_train_batch_on_tokens through to composer by dakinggg in https://github.com/mosaicml/llm-foundry/pull/1595
* Loosen MegaBlocks version pin by mvpatel2000 in https://github.com/mosaicml/llm-foundry/pull/1597
* Add configurability for hf checkpointer register timeout by dakinggg in https://github.com/mosaicml/llm-foundry/pull/1599
* Loosen MegaBlocks to <1.0 by mvpatel2000 in https://github.com/mosaicml/llm-foundry/pull/1598
* Finetuning dataloader validation tweaks by mvpatel2000 in https://github.com/mosaicml/llm-foundry/pull/1600
* Bump onnx from 1.16.2 to 1.17.0 by dependabot in https://github.com/mosaicml/llm-foundry/pull/1604
* Remove TE from dockerfile and instead add as optional dependency by snarayan21 in https://github.com/mosaicml/llm-foundry/pull/1605
* Data prep on multiple GPUs by eitanturok in https://github.com/mosaicml/llm-foundry/pull/1576
* Add env var for configuring the maximum number of processes to use for dataset processing by irenedea in https://github.com/mosaicml/llm-foundry/pull/1606
* Updated error message for cluster check by nancyhung in https://github.com/mosaicml/llm-foundry/pull/1602
* Use fun default composer run names by irenedea in https://github.com/mosaicml/llm-foundry/pull/1611
* Ensure log messages are properly formatted again by snarayan21 in https://github.com/mosaicml/llm-foundry/pull/1614
* Add UC not enabled error for delta to json conversion by irenedea in https://github.com/mosaicml/llm-foundry/pull/1613
* Use a temporary directory for downloading finetuning dataset files by irenedea in https://github.com/mosaicml/llm-foundry/pull/1608
* Bump composer version to 0.26.0 by irenedea in https://github.com/mosaicml/llm-foundry/pull/1616
* Add loss generating token counts by dakinggg in https://github.com/mosaicml/llm-foundry/pull/1610
* Change accumulate_train_batch_on_tokens default to True by dakinggg in https://github.com/mosaicml/llm-foundry/pull/1618
* Bump version to 0.15.0.dev0 by irenedea in https://github.com/mosaicml/llm-foundry/pull/1621
* Add load checkpoint callback by irenedea in https://github.com/mosaicml/llm-foundry/pull/1570
**Full Changelog**: https://github.com/mosaicml/llm-foundry/compare/v0.13.0...v0.14.0