Axolotl

Latest version: v0.7.1


0.7.1

What's Changed
* bump dev version by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/2342
* Doc fix: TORCH_ROCM_AOTRITON_ENABLE_EXPERIMENTAL not necessary to use Triton kernel patches by djsaunde in https://github.com/axolotl-ai-cloud/axolotl/pull/2343
* make sure chatml dpo dataset loading works by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/2333
* Fix sample packing producing longer sequences than specified by `sequence_len` by tobmi1 in https://github.com/axolotl-ai-cloud/axolotl/pull/2332
* quick formatting fix for LoRA optims doc by djsaunde in https://github.com/axolotl-ai-cloud/axolotl/pull/2349
* calculate sample length fixes and SFT splitting fixes by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/2351
* feat: update transformers version to 4.49.0 by NanoCode012 in https://github.com/axolotl-ai-cloud/axolotl/pull/2340
* Bumping 0.15.1 TRL version for GRPO+PEFT fix by SalmanMohammadi in https://github.com/axolotl-ai-cloud/axolotl/pull/2344
* support for passing init_lora_weights to lora_config by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/2352
* fix(doc): add missing auto_find_batch_size by NanoCode012 in https://github.com/axolotl-ai-cloud/axolotl/pull/2339
* don't install extraneous old version of pydantic in ci and make sure to run multigpu ci by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/2355
* Relicense the logprob KD loss functions as Apache 2.0 by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/2358
* Correctly reference mount paths by reissbaker in https://github.com/axolotl-ai-cloud/axolotl/pull/2347
* bump liger to 0.5.3 by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/2353
* feat: add deepseek_v3 sample packing by NanoCode012 in https://github.com/axolotl-ai-cloud/axolotl/pull/2230
* Feat(doc): Reorganize documentation, fix broken syntax, update notes by NanoCode012 in https://github.com/axolotl-ai-cloud/axolotl/pull/2348
* Fix(doc): address missing doc changes by NanoCode012 in https://github.com/axolotl-ai-cloud/axolotl/pull/2362
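
Several of the fixes above touch sample packing and `sequence_len`. As an illustrative sketch only (the values below are placeholders, not recommendations), the related axolotl config options look like:

```yaml
# Illustrative axolotl config fragment (values are placeholders).
# The 0.7.1 fix ensures sample packing no longer produces sequences
# longer than the configured sequence_len cap.
sequence_len: 2048
sample_packing: true
```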

New Contributors
* tobmi1 made their first contribution in https://github.com/axolotl-ai-cloud/axolotl/pull/2332
* reissbaker made their first contribution in https://github.com/axolotl-ai-cloud/axolotl/pull/2347

**Full Changelog**: https://github.com/axolotl-ai-cloud/axolotl/compare/v0.7.0...v0.7.1

0.7.0

New Contributors
* NJordan72 made their first contribution in https://github.com/axolotl-ai-cloud/axolotl/pull/2219
* SalmanMohammadi made their first contribution in https://github.com/axolotl-ai-cloud/axolotl/pull/2231
* v-dicicco made their first contribution in https://github.com/axolotl-ai-cloud/axolotl/pull/2235
* jwongTensora made their first contribution in https://github.com/axolotl-ai-cloud/axolotl/pull/2257
* adi-kmt made their first contribution in https://github.com/axolotl-ai-cloud/axolotl/pull/2268
* mashdragon made their first contribution in https://github.com/axolotl-ai-cloud/axolotl/pull/2281
* erictang000 made their first contribution in https://github.com/axolotl-ai-cloud/axolotl/pull/2251
* leeparkuky made their first contribution in https://github.com/axolotl-ai-cloud/axolotl/pull/2193
* minpeter made their first contribution in https://github.com/axolotl-ai-cloud/axolotl/pull/2322

**Full Changelog**: https://github.com/axolotl-ai-cloud/axolotl/compare/v0.6.0...v0.7.0

0.6.0

What's Changed

0.5.2

What's Changed
* move deprecated kwargs from trainer to trainingargs by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/2028
* add axolotlai docker hub org to publish list by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/2031
* update actions version for node16 deprecation by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/2037
* replace references to personal docker hub to org docker hub by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/2036
* feat: add metharme chat_template by NanoCode012 in https://github.com/axolotl-ai-cloud/axolotl/pull/2033
* change deprecated Stub to App by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/2038
* fix: handle sharegpt dataset missing by NanoCode012 in https://github.com/axolotl-ai-cloud/axolotl/pull/2035
* add P2P env when multi-gpu but not the full node by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/2041
* invert the string in string check for p2p device check by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/2044
* feat: print out dataset length even if not preprocess by NanoCode012 in https://github.com/axolotl-ai-cloud/axolotl/pull/2034
* Add example YAML file for training Mistral using DPO by olivermolenschot in https://github.com/axolotl-ai-cloud/axolotl/pull/2029
* fix: inference not using chat_template by NanoCode012 in https://github.com/axolotl-ai-cloud/axolotl/pull/2019
* feat: cancel ongoing tests if new CI is triggered by NanoCode012 in https://github.com/axolotl-ai-cloud/axolotl/pull/2046
* feat: upgrade to liger 0.4.1 by NanoCode012 in https://github.com/axolotl-ai-cloud/axolotl/pull/2045
* run pypi release action on tag create w version by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/2047
* make sure to tag images in docker for tagged releases by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/2051
* retry flaky test_packing_stream_dataset test that times out on read by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/2052
* install default torch version if not already, new xformers wheels for torch 2.5.x by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/2049
* fix push to main and tag semver build for docker ci by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/2054
* Update unsloth for torch.cuda.amp deprecation by bursteratom in https://github.com/axolotl-ai-cloud/axolotl/pull/2042
* don't cancel the tests on main automatically for concurrency by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/2055
* ADOPT optimizer integration by bursteratom in https://github.com/axolotl-ai-cloud/axolotl/pull/2032
* Grokfast support by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1917
* upgrade to flash-attn 2.7.0 by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/2048
* make sure to add tags for versioned tag on cloud docker images by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/2060
* fix duplicate base build by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/2061
* fix env var extraction by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/2043
* gradient accumulation tests, embeddings w pad_token fix, smaller models by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/2059
* upgrade datasets==3.1.0 and add upstream check by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/2067
* update to be deprecated evaluation_strategy by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1682
* remove the bos token from dpo outputs by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1733
* support passing trust_remote_code to dataset loading by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/2050
* support for schedule free and e2e ci smoke test by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/2066
* Fsdp grad accum monkeypatch by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/2064
* fix: loading locally downloaded dataset by NanoCode012 in https://github.com/axolotl-ai-cloud/axolotl/pull/2056
* Update `get_unpad_data` patching for multipack by chiragjn in https://github.com/axolotl-ai-cloud/axolotl/pull/2013
* increase worker count to 8 for basic pytests by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/2075
* upgrade autoawq==0.2.7.post2 for transformers fix by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/2070
* optim e2e tests to run a bit faster by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/2069
* don't build bdist by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/2076
* static assets, readme, and badges update v1 by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/2077
* Readme updates v2 by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/2078
* bump transformers for fsdp-grad-accum fix, remove patch by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/2079
* Feat: Drop long samples and shuffle rl samples by NanoCode012 in https://github.com/axolotl-ai-cloud/axolotl/pull/2040
* add optimizer step to prevent warning in tests by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1502
* fix brackets on docker ci builds, add option to skip e2e builds by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/2080
* remove deprecated extra metadata kwarg from pydantic Field by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/2081
* release version 0.5.1 by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/2082
* make sure action has permission to create release by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/2083
* set manifest and fix for source dist by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/2084
* add missing dunder-init for monkeypatches and add tests for install from sdist by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/2085
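
One of the 0.5.2 changes adds `trust_remote_code` support to dataset loading. A hedged sketch of how that might appear in an axolotl config (the dataset path is hypothetical, and the exact key placement should be checked against the docs for your version):

```yaml
# Illustrative dataset config fragment; dataset path is a placeholder.
datasets:
  - path: my-org/my-dataset    # hypothetical Hub dataset id
    type: alpaca
    trust_remote_code: true    # opt in to the dataset's custom loading code
```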

New Contributors
* olivermolenschot made their first contribution in https://github.com/axolotl-ai-cloud/axolotl/pull/2029

**Full Changelog**: https://github.com/axolotl-ai-cloud/axolotl/compare/v0.5.0...v0.5.2

0.5.0

What's Changed
* fix(log): improve warning to clarify that lora_modules_to_save expect a list by NanoCode012 in https://github.com/axolotl-ai-cloud/axolotl/pull/1197
* Add: colab example by JohanWork in https://github.com/axolotl-ai-cloud/axolotl/pull/1196
* Feat/chatml add system message by mhenrichsen in https://github.com/axolotl-ai-cloud/axolotl/pull/1117
* fix learning rate scheduler's warnings by RicardoDominguez in https://github.com/axolotl-ai-cloud/axolotl/pull/1135
* precompute dpo logprobs setting and fixes by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1199
* Update deps 202401 by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1204
* make sure to register the base chatml template even if no system message is provided by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1207
* workaround for transformers bug requiring do_sample for saving pretrained by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1206
* more checks and fixes for deepspeed and fsdp by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1208
* drop py39 docker images, add py311, upgrade pytorch to 2.1.2 by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1205
* Update qlora.yml - DeprecationWarning: `max_packed_sequence_len` is n… by 7flash in https://github.com/axolotl-ai-cloud/axolotl/pull/1210
* Respect sliding_window=None by DreamGenX in https://github.com/axolotl-ai-cloud/axolotl/pull/1214
* ensure the tests use the same version of torch as the latest base docker images by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1215
* ADD: warning if hub_model_id is set but no save strategy by JohanWork in https://github.com/axolotl-ai-cloud/axolotl/pull/1202
* run PR e2e docker CI tests in Modal by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1217
* Revert "run PR e2e docker CI tests in Modal" by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1220
* FEAT: add tagging support to axolotl for DPOTrainer by filippo82 in https://github.com/axolotl-ai-cloud/axolotl/pull/1209
* PEFT LoftQ by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1222
* Fix typos (pretained -> pretrained) by xhedit in https://github.com/axolotl-ai-cloud/axolotl/pull/1231
* Fix and document test_datasets by DreamGenX in https://github.com/axolotl-ai-cloud/axolotl/pull/1228
* set torch version to what is installed during axolotl install by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1234
* Cloud motd by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1235
* [Nit] Fix callout by hamelsmu in https://github.com/axolotl-ai-cloud/axolotl/pull/1237
* Support for additional_special_tokens by DreamGenX in https://github.com/axolotl-ai-cloud/axolotl/pull/1221
* Peft deepspeed resume by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1227
* support for true batches with multipack by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1230
* add contact info for dedicated support for axolotl by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1243
* fix(model): apply gate fp32 only for mixtral by NanoCode012 in https://github.com/axolotl-ai-cloud/axolotl/pull/1241
* relora: magnitude pruning of the optimizer by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1245
* Pretrain transforms by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1261
* Fix typo `bloat16` -> `bfloat16` by chiragjn in https://github.com/axolotl-ai-cloud/axolotl/pull/1257
* Add more save strategies for DPO training. by PhilipMay in https://github.com/axolotl-ai-cloud/axolotl/pull/1255
* BUG FIX: lock pytorch version in colab example by JohanWork in https://github.com/axolotl-ai-cloud/axolotl/pull/1247
* Fix typo preventing `model_kwargs` being injected by zacbrannelly in https://github.com/axolotl-ai-cloud/axolotl/pull/1262
* contributor avatars by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1269
* simplify handling for newer multipack patches so they can be added in a single place by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1270
* Add link to axolotl cloud image on latitude by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1275
* copy edits by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1276
* allow remote data paths by hamelsmu in https://github.com/axolotl-ai-cloud/axolotl/pull/1278
* add support for https remote yamls by hamelsmu in https://github.com/axolotl-ai-cloud/axolotl/pull/1277
* run the docker image builds and push on gh action gpu runners by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1218
* Update README.md by hamelsmu in https://github.com/axolotl-ai-cloud/axolotl/pull/1281
* don't use load and push together by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1284
* Add MPS support by maximegmd in https://github.com/axolotl-ai-cloud/axolotl/pull/1264
* allow the optimizer prune ratio for relora to be configurable by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1287
* Scheduler implementation of Continual Pre-Training of Large Language Models: How to (re)warm your model? by jinwonkim93 in https://github.com/axolotl-ai-cloud/axolotl/pull/1273
* Add seq2seq eval benchmark callback by LeonardoEmili in https://github.com/axolotl-ai-cloud/axolotl/pull/1274
* Validation always happens on first step by LeonardoEmili in https://github.com/axolotl-ai-cloud/axolotl/pull/1300
* fix(examples): remove is_*_derived as it's parsed automatically by NanoCode012 in https://github.com/axolotl-ai-cloud/axolotl/pull/1297
* Allow load_best_model_at_end to be configured for early stopping on custom evaluation datasets by dameikle in https://github.com/axolotl-ai-cloud/axolotl/pull/1291
* Add instructions for playing with qlora model to colab example by jaredpalmer in https://github.com/axolotl-ai-cloud/axolotl/pull/1290
* fix(readme): update inference md link by NanoCode012 in https://github.com/axolotl-ai-cloud/axolotl/pull/1311
* Adding Google's gemma Model by monk1337 in https://github.com/axolotl-ai-cloud/axolotl/pull/1312
* multipack for gemma by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1313
* deprecate: pytorch 2.0.1 image by NanoCode012 in https://github.com/axolotl-ai-cloud/axolotl/pull/1315
* fix(readme): Clarify doc for tokenizer_config by NanoCode012 in https://github.com/axolotl-ai-cloud/axolotl/pull/1323
* [bug-report template] Use yaml codeblock for config.yaml field by kallewoof in https://github.com/axolotl-ai-cloud/axolotl/pull/1303
* make mlflow optional by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1317
* Pydantic 2.x cfg by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1239
* chore: update readme to be more clear by NanoCode012 in https://github.com/axolotl-ai-cloud/axolotl/pull/1326
* ADD: push checkpoints to mlflow artifact registry by JohanWork in https://github.com/axolotl-ai-cloud/axolotl/pull/1295
* hotfix for capabilities loading by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1331
* hotfix for lora rank by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1332
* hotfix for missing outputs params by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1333
* hotfix to exclude_unset from pydantic config when converting back to a dict by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1334
* Add StableLM 2 Example Scripts by ncoop57 in https://github.com/axolotl-ai-cloud/axolotl/pull/1327
* add lion-pytorch optimizer by maximegmd in https://github.com/axolotl-ai-cloud/axolotl/pull/1299
* Support user-defined prompt processing strategies for dpo by nopperl in https://github.com/axolotl-ai-cloud/axolotl/pull/1248
* more pydantic fixes by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1338
* Mps mistral lora by maximegmd in https://github.com/axolotl-ai-cloud/axolotl/pull/1292
* fix: checkpoint saving with deepspeed by NanoCode012 in https://github.com/axolotl-ai-cloud/axolotl/pull/1321
* Update debugging.md by hamelsmu in https://github.com/axolotl-ai-cloud/axolotl/pull/1339
* fix steps check for anneal on first cycle by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1316
* Update fastchat_conversation_turns.py by eltociear in https://github.com/axolotl-ai-cloud/axolotl/pull/1294
* add gemma instruct chat template by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1341
* more fixes 20240228 by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1342
* deprecate py 3.9 support, set min pytorch version by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1343
* Fix `use_mlflow` to be bool instead of str by chiragjn in https://github.com/axolotl-ai-cloud/axolotl/pull/1344
* fix for protected model_ namespace w pydantic by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1345
* run tests again on Modal by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1289
* chore: enable sample_packing for Gemma [skip ci] by NanoCode012 in https://github.com/axolotl-ai-cloud/axolotl/pull/1351
* Fix validation for early stopping by chiragjn in https://github.com/axolotl-ai-cloud/axolotl/pull/1358
* plain input/output prompt strategy w/o chat templates by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1346
* lora+ support by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1352
* allow the sharegpt handler to also better handle datasets destined for openai finetuning by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1361
* Update tinyllama lora.yml to fix eval packing issue by rasbt in https://github.com/axolotl-ai-cloud/axolotl/pull/1362
* add starcoder2 by ehartford in https://github.com/axolotl-ai-cloud/axolotl/pull/1349
* Fix supported python versions in README, as python 3.9 was recently deprecated by nirogu in https://github.com/axolotl-ai-cloud/axolotl/pull/1364
* support for DoRA w/ PEFT by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1363
* add docs for `input_output` format by hamelsmu in https://github.com/axolotl-ai-cloud/axolotl/pull/1367
* update flash attention for gemma support by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1368
* JarvisLabs by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1372
* FSDP + QLoRA by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1378
* validation for fsdp and deepspeed by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1388
* support for rslora by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1387
* Fix pydantic configuration for the max_memory input by dandm1 in https://github.com/axolotl-ai-cloud/axolotl/pull/1385
* Set `gradient_clipping` to `auto` in DeepSpeed configs by seungduk-yanolja in https://github.com/axolotl-ai-cloud/axolotl/pull/1382
* Add Glaive conversation format support by brianfitzgerald in https://github.com/axolotl-ai-cloud/axolotl/pull/1365
* chore: lint by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1389
* add handling for argilla dpo-mix by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1397
* Update ChatTemplate enum to include alpaca and gemma by chiragjn in https://github.com/axolotl-ai-cloud/axolotl/pull/1396
* Add QLoRA + FSDP Docs by hamelsmu in https://github.com/axolotl-ai-cloud/axolotl/pull/1403
* Don't disable existing loggers when configuring axolotl logging by chiragjn in https://github.com/axolotl-ai-cloud/axolotl/pull/1395
* Train parameters exclusively in specific ranges by seungduk-yanolja in https://github.com/axolotl-ai-cloud/axolotl/pull/1390
* Fix Gemma 7b qlora.yml by rasbt in https://github.com/axolotl-ai-cloud/axolotl/pull/1405
* beta support for multipack with gemmoe by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1402
* Feat(readme): Add instructions for Google GPU VM instances by NanoCode012 in https://github.com/axolotl-ai-cloud/axolotl/pull/1410
* Fix(readme): Improve README QuickStart info by NanoCode012 in https://github.com/axolotl-ai-cloud/axolotl/pull/1408
* chore(script): remove redundant setting by NanoCode012 in https://github.com/axolotl-ai-cloud/axolotl/pull/1411
* Add Phorm AI Badge (Morph Labs) by bentleylong in https://github.com/axolotl-ai-cloud/axolotl/pull/1418
* ORPO by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1419
* fix(config): passing gradient_checkpoint_kwargs by NanoCode012 in https://github.com/axolotl-ai-cloud/axolotl/pull/1412
* Add a config not to shuffle merged dataset by seungduk-yanolja in https://github.com/axolotl-ai-cloud/axolotl/pull/1394
* Feat: Add sharegpt multirole by NanoCode012 in https://github.com/axolotl-ai-cloud/axolotl/pull/1137
* support galore once upstreamed into transformers by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1409
* fixes for dpo and orpo template loading by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1424
* HF / FEAT: Optimize HF tags by younesbelkada in https://github.com/axolotl-ai-cloud/axolotl/pull/1425
* strip out hacky qlora-fsdp workarounds now that qlora-fsdp fixes are upstreamed by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1428
* Bootstrap Hosted Axolotl Docs w/Quarto by hamelsmu in https://github.com/axolotl-ai-cloud/axolotl/pull/1429
* Orpo fix wip by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1433
* chore(config): refactor old mistral config by NanoCode012 in https://github.com/axolotl-ai-cloud/axolotl/pull/1435
* docs: update link to docs of advanced topics in README.md by pphuc25 in https://github.com/axolotl-ai-cloud/axolotl/pull/1437
* fix(dataset): normalize tokenizer config and change hash from tokenizer class to tokenizer path by NanoCode012 in https://github.com/axolotl-ai-cloud/axolotl/pull/1298
* make sure to capture non-null defaults from config validation by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1415
* Turn on sample_packing for Gemma training by satpalsr in https://github.com/axolotl-ai-cloud/axolotl/pull/1438
* Fix falcon tokenization step by pharaouk in https://github.com/axolotl-ai-cloud/axolotl/pull/1441
* Remove seq_len arg in rotary_emb by BMPixel in https://github.com/axolotl-ai-cloud/axolotl/pull/1443
* fix for accelerate env var for auto bf16, add new base image and expand torch_cuda_arch_list support by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1413
* support layer replication for peft and fix rslora integration by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1445
* fix layer_replication arg to peft by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1446
* Jamba by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1451
* Support loading datasets saved via save_to_disk by fozziethebeat in https://github.com/axolotl-ai-cloud/axolotl/pull/1432
* fix some of the edge cases for Jamba by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1452
* configure nightly docker builds by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1454
* fix how nightly tag is generated by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1456
* fix yaml parsing for workflow by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1457
* Nightlies fix v4 by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1458
* qwen2_moe support w multipack by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1455
* make sure to install causal_conv1d in docker by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1459
* Lisa by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1469
* feat: add deepspeed 3 with cpuoffload by NanoCode012 in https://github.com/axolotl-ai-cloud/axolotl/pull/1466
* reduce verbosity of the special tokens by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1472
* Reorganize Docs by hamelsmu in https://github.com/axolotl-ai-cloud/axolotl/pull/1468
* fix pretraining_ on odd datasets by mapmeld in https://github.com/axolotl-ai-cloud/axolotl/pull/1463
* Added pip install ninja to accelerate installation of flash-attn by melvinebenezer in https://github.com/axolotl-ai-cloud/axolotl/pull/1461
* Pretrain multipack v2 by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1470
* Feat: update doc by NanoCode012 in https://github.com/axolotl-ai-cloud/axolotl/pull/1475
* refactor utils.data module for line count linter by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1476
* don't use deepspeed or fsdp when merging loras by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1479
* add support for cohere chat template by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1478
* feat: validate sample packing requires flash_attention by NanoCode012 in https://github.com/axolotl-ai-cloud/axolotl/pull/1465
* fix: reduce sample_packing FA error to warning by NanoCode012 in https://github.com/axolotl-ai-cloud/axolotl/pull/1484
* drop empty token from beginning if tokenizer has no bos_token (in the case of qwen) by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1490
* Remove `validate_quantized_dora` by xzuyn in https://github.com/axolotl-ai-cloud/axolotl/pull/1485
* ignore issues with calculating params when printing by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1493
* add field to sft dataset pydantic for completion support by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1497
* Fix the wrong adapter in qwen2-moe-qlora example by maziyarpanahi in https://github.com/axolotl-ai-cloud/axolotl/pull/1501
* Print versions by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1496
* Correctly handle splits for datasets.arrow_dataset.Dataset objects by scottfleming in https://github.com/axolotl-ai-cloud/axolotl/pull/1504
* WIP: Support table logging for mlflow, too by DavidFarago in https://github.com/axolotl-ai-cloud/axolotl/pull/1506
* use locale-agnostic separator to make large nums easier to read by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1503
* Update SaveAxolotlConfigtoWandBCallback to use artifact instead of save by tcapelle in https://github.com/axolotl-ai-cloud/axolotl/pull/1483
* DBRX Model Support by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1462
* Unsloth gradient checkpointing offload by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1528
* add docs around pre-processing by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1529
* Update README.md by emilytin0206 in https://github.com/axolotl-ai-cloud/axolotl/pull/1521
* Update Readme to include support for Mixtral8X22B by Barbarian7676 in https://github.com/axolotl-ai-cloud/axolotl/pull/1518
* Create mixtral_22.yml by Barbarian7676 in https://github.com/axolotl-ai-cloud/axolotl/pull/1514
* feat(doc): Add config example for pad_token by NanoCode012 in https://github.com/axolotl-ai-cloud/axolotl/pull/1535
* llama-3 examples by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1537
* Adding Llama-3 qlora by monk1337 in https://github.com/axolotl-ai-cloud/axolotl/pull/1536
* fix broken linting by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1541
* fix(packages): lock datasets version by NanoCode012 in https://github.com/axolotl-ai-cloud/axolotl/pull/1545
* fix(yml): update llama-3 config by NanoCode012 in https://github.com/axolotl-ai-cloud/axolotl/pull/1543
* ORPO Trainer replacement by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1551
* wrap prepared_ds_path in str() to avoid TypeError in fsspec package by FrankRuis in https://github.com/axolotl-ai-cloud/axolotl/pull/1548
* Add support for Gemma chat template by Haoxiang-Wang in https://github.com/axolotl-ai-cloud/axolotl/pull/1530
* make sure everything stays in the same dtype when using dpo + FSDP by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1559
* Add ORPO example and e2e test by tokestermw in https://github.com/axolotl-ai-cloud/axolotl/pull/1572
* Pose context length ext by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1567
* chore: clarify microbatch size by NanoCode012 in https://github.com/axolotl-ai-cloud/axolotl/pull/1579
* Add debug option for RL dataset preprocessing by abhinand5 in https://github.com/axolotl-ai-cloud/axolotl/pull/1404
* ADD: warning hub model by JohanWork in https://github.com/axolotl-ai-cloud/axolotl/pull/1301
* FIX: TRL trainer preprocessing step was running in one process by ali-mosavian in https://github.com/axolotl-ai-cloud/axolotl/pull/1583
* Pass weakref to model in the SIGINT handler to free up model post train function by chiragjn in https://github.com/axolotl-ai-cloud/axolotl/pull/1581
* improve save callbacks by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1592
* fix for jupyterlab on cloud start by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1594
* add torch 2.3.0 to builds by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1593
* docs(config.qmd): add loraplus example by tpoisonooo in https://github.com/axolotl-ai-cloud/axolotl/pull/1577
* Gradio configuration parameters by marijnfs in https://github.com/axolotl-ai-cloud/axolotl/pull/1591
* Pass `deepspeed` and `fsdp` as `None` explicitly when merging adapters to allow custom device_map by chiragjn in https://github.com/axolotl-ai-cloud/axolotl/pull/1575
* feat: exclude mamba blocks for jamba when load8bit by NanoCode012 in https://github.com/axolotl-ai-cloud/axolotl/pull/1578
* improve tool handling roles by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1587
* make sure to save the lora adapter at the end of RL/dpo training by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1573
* ignore the fsdp_config section too by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1606
* adding llama3 fastchat conversation monkeypatch by TJ-Solergibert in https://github.com/axolotl-ai-cloud/axolotl/pull/1539
* feat: Add LLaMA-3 instruct prompt strategies for fine-tuning by 0-hero in https://github.com/axolotl-ai-cloud/axolotl/pull/1553
* Llama3 dpo by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1610
* add dstack section by deep-diver in https://github.com/axolotl-ai-cloud/axolotl/pull/1612
* fix attention mask collation by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1603
* make sure to save on the last step by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1615
* FIX: max_length and max_prompt_length was not being sent to ORPOTrainer by ali-mosavian in https://github.com/axolotl-ai-cloud/axolotl/pull/1584
* Fix `total_num_steps` by bofenghuang in https://github.com/axolotl-ai-cloud/axolotl/pull/1566
* update torch 2.2.1 -> 2.2.2 by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1622
* update outputs path so that we can mount workspace to /workspace/data by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1623
* bump versions of deps by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1621
* fix symlinks for axolotl outputs by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1625
* fix setting the authorized keys when there are more than one in the env var by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1626
* install rsync too by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1627
* cloud image w/o tmux by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1628
* more fixes to work with runpod + skypilot by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1629
* fix ray install by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1630
* add save_only_model option by jquesnelle in https://github.com/axolotl-ai-cloud/axolotl/pull/1634
* Unsloth optims for Llama by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1609
* fixes to save on fractional save_steps by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1643
* Add KTO support by benredmond in https://github.com/axolotl-ai-cloud/axolotl/pull/1640
* Fix llama3 chat_template (extra <|eot_id|> on last turn) by lhl in https://github.com/axolotl-ai-cloud/axolotl/pull/1635
* allow report_to for multiple providers by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1647
* Enable LoRA+ setting for dpo trainer by thepowerfuldeez in https://github.com/axolotl-ai-cloud/axolotl/pull/1646
* Update tiny-llama qlora.yml addressing eval packing error by jaydeepthik in https://github.com/axolotl-ai-cloud/axolotl/pull/1638
* support for custom messages field in sharegpt by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1651
* Switch to parallel FFD bin packing algorithm. by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1619
* document how to use `share_strategy="no"` by charlesfrye in https://github.com/axolotl-ai-cloud/axolotl/pull/1653
* update deps by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1663
* Fix Google Colab notebook 2024-05 by maciejgryka in https://github.com/axolotl-ai-cloud/axolotl/pull/1662
* Generalizing the chat_template prompt strategy by fozziethebeat in https://github.com/axolotl-ai-cloud/axolotl/pull/1660
* Fix Lora config error for Llama3 by oaishi in https://github.com/axolotl-ai-cloud/axolotl/pull/1659
* fix lint issue that snuck through by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1665
* Fix: ensure correct handling of `val_set_size` as `float` or `int` by davidecaroselli in https://github.com/axolotl-ai-cloud/axolotl/pull/1655
* Correct name of MixtralBlockSparseTop2MLP (L -> l) by seungduk-yanolja in https://github.com/axolotl-ai-cloud/axolotl/pull/1667
* Fix README quick start example usage model dirs by abevoelker in https://github.com/axolotl-ai-cloud/axolotl/pull/1668
* make sure the CI fails when pytest script fails by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1669
* handle the system role too for chat templates by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1671
* revert multipack batch sampler changes by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1672
* re-enable phi for tests in modal ci by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1373
* use mixins for orpo and kto configs so they work with axolotl customizations by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1674
* set chat_template in datasets config automatically by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1664
* load explicit splits on datasets by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1652
* cleanup the deepspeed proxy model at the end of training by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1675
* need to add back drop_last for sampler by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1676
* Fix the broken link in README by saeedesmaili in https://github.com/axolotl-ai-cloud/axolotl/pull/1678
* re-enable DPO for tests in modal ci by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1374
* add support for rpo_alpha by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1681
* Phi-3 conversation format, example training script and perplexity metric by brianfitzgerald in https://github.com/axolotl-ai-cloud/axolotl/pull/1582
* Adding Phi-3 model by monk1337 in https://github.com/axolotl-ai-cloud/axolotl/pull/1580
* ensure explicit eval_sample_packing to avoid mismatch issues by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1692
* add qwen2-72b fsdp example by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1696
* add back packing efficiency estimate so epochs and multi-gpu works properly by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1697
* Sample packing eval fix by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1695
* bump deepspeed for fix for grad norm compute putting tensors on different devices by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1699
* verbose failure message by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1694
* download model weights on preprocess step by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1693
* drop length column for issues with eval without packing by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1711
* add support for multipack for deepseek_v2 by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1712
* Allow "weight: 0" in messages to mask them by DavidFarago in https://github.com/axolotl-ai-cloud/axolotl/pull/1703
* improve Pre-Tokenized Dataset docs by josharian in https://github.com/axolotl-ai-cloud/axolotl/pull/1684
* support for gemma2 w sample packing by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1718
* add support for .env files for env vars by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1724
* full weights fsdp training seems broken with fsdp_cpu_ram_efficient_loading by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1726
* sanity check ranges in freeze.py by josharian in https://github.com/axolotl-ai-cloud/axolotl/pull/1686
* bump trl and accelerate for latest releases by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1730
* Fixes the urls after org move by mhenrichsen in https://github.com/axolotl-ai-cloud/axolotl/pull/1734
* add tests so CI can catch updates where patches will break with unsloth by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1737
* typo by Klingefjord in https://github.com/axolotl-ai-cloud/axolotl/pull/1685
* add torch 2.3.1 base image by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1745
* fixes to prevent vram spike when train starts by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1742
* update to pytorch 2.3.1 by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1746
* bump xformers to 0.0.27 by akshaylive in https://github.com/axolotl-ai-cloud/axolotl/pull/1740
* Changed URL for dataset docs by dameikle in https://github.com/axolotl-ai-cloud/axolotl/pull/1744
* Fix eval_sample_packing in llama-3 lora example by RodriMora in https://github.com/axolotl-ai-cloud/axolotl/pull/1716
* bump flash attention 2.5.8 -> 2.6.1 by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1738
* add basic support for the optimi adamw optimizer by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1727
* update modal package and don't cache pip install by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1757
* torch compile and cuda alloc improvements by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1755
* support for llama multipack using updated code/patches by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1754
* fix num gpu check by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1760
* fixes to accelerator so that iterable pretraining datasets work by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1759
* add torch_compile_mode options by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1763
* re-enable PYTORCH_CUDA_ALLOC_CONF expandable_segments by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1765
* set the number of dataset processes on the DPO Config rather than the trainer by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1762
* Unsloth rope by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1767
* bump transformers and set roundup_power2_divisions for more VRAM improvements, low bit ao optimizers by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1769
* Fix untrained tokens by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1771
* Add a `chat_template` prompt strategy for DPO by fozziethebeat in https://github.com/axolotl-ai-cloud/axolotl/pull/1725
* swaps to use newer sample packing for mistral by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1773
* bump transformers for updated llama 3.1 by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1778
* bump flash attention to 2.6.2 by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1781
* fix fsdp loading of models, esp 70b by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1780
* add support for simpo via cpo trainer by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1772
* Bump deepspeed 20240727 by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1790
* various batch of fixes by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1785
* Add flexible configuration options for `chat_template` dataset training by Tostino in https://github.com/axolotl-ai-cloud/axolotl/pull/1756
* Update README.md by mhenrichsen in https://github.com/axolotl-ai-cloud/axolotl/pull/1792
* move to supporting mostly 12.1 w 2.3.1 and add new 12.4 with 2.4.0 by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1793
* fix dockerfile and base builder by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1795
* use 12.4.1 instead of 12.4 [skip-ci] by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1796
* update test and main/nightly builds by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1797
* publish axolotl images without extras in the tag name by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1798
* qlora-fsdp ram efficient loading with hf trainer by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1791
* fix roles to train defaults and make logging less verbose by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1801
* Fix colab example notebook by srib in https://github.com/axolotl-ai-cloud/axolotl/pull/1805
* Fix setting correct repo id when pushing dataset to hub by chrislee973 in https://github.com/axolotl-ai-cloud/axolotl/pull/1657
* Update instruct-lora-8b.yml by monk1337 in https://github.com/axolotl-ai-cloud/axolotl/pull/1789
* Update conversation.qmd by penfever in https://github.com/axolotl-ai-cloud/axolotl/pull/1788
* One cycle lr by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1803
* remove unnecessary zero-first guard as it's already called in a parent fn by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1810
* set z3 leaf for deepseek v2 by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1809
* logging improvements by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1808
* update peft and transformers by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1811
* skip no commit to main on ci by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1814
* fix z3 leaf configuration when not using lists by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1817
* update tinyllama to use final instead of checkpoints [skip ci] by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1820
* Attempt to run multigpu in PR CI for now to ensure it works by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1815
* fix the incorrect `max_length` for chat template by chiwanpark in https://github.com/axolotl-ai-cloud/axolotl/pull/1818
* bump hf dependencies by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1823
* fix: parse eager_attention from cfg by NanoCode012 in https://github.com/axolotl-ai-cloud/axolotl/pull/1824
* fix: parse model_kwargs by NanoCode012 in https://github.com/axolotl-ai-cloud/axolotl/pull/1825
* update sklearn version, torch compile env vars, don't worry about failure on preprocess load model by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1821
* add validation to prevent 8bit lora finetuning on H100s by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1827
* optionally save the final FSDP model as a sharded state dict by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1828
* fix: dont change quant storage dtype in case of fsdp by xgal in https://github.com/axolotl-ai-cloud/axolotl/pull/1837
* pretrain: fix with sample_packing=false by tmm1 in https://github.com/axolotl-ai-cloud/axolotl/pull/1841
* feat: add jamba chat_template by xgal in https://github.com/axolotl-ai-cloud/axolotl/pull/1843
* examples: fix tiny-llama pretrain yml syntax by tmm1 in https://github.com/axolotl-ai-cloud/axolotl/pull/1840
* rename jamba example by xgal in https://github.com/axolotl-ai-cloud/axolotl/pull/1846
* numpy 2.1.0 was released, but incompatible with numba by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1849
* ensure that the bias is also in the correct dtype by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1848
* make the train_on_eos default to true so all eos tokens are treated the same by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1847
* fix: prompt phi by JohanWork in https://github.com/axolotl-ai-cloud/axolotl/pull/1845
* docs: minor syntax highlight fix by tmm1 in https://github.com/axolotl-ai-cloud/axolotl/pull/1839
* ensure that the hftrainer deepspeed config is set before the trainer class is ever init'ed by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1850
* run nightly ci builds against upstream main by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1851
* rename nightly test and add badge by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1853
* most model types now support flash attention 2 regardless of multipack support by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1854
* add axolotl community license by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1862
* don't mess with bnb since it needs compiled wheels by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1859
* Liger Kernel integration by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1861
* add liger example by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1864
* add liger to readme by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1865
* change up import to prevent AttributeError by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1863
* simplify logic by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1856
* better handling of llama-3 tool role by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1782
* Spectrum plugin by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1866
* update spectrum authors by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1869
* Fix `drop_long_seq` bug due to truncation in prompt tokenization strategies when using `chat_template` by chiwanpark in https://github.com/axolotl-ai-cloud/axolotl/pull/1867
* clear cuda cache to help with memory leak/creep by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1858
* Add Liger Kernel support for Qwen2 by chiwanpark in https://github.com/axolotl-ai-cloud/axolotl/pull/1871
* Sample pack trust remote code v2 by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1873
* monkey-patch transformers to simplify monkey-patching modeling code by tmm1 in https://github.com/axolotl-ai-cloud/axolotl/pull/1877
* fix liger plugin load issues by tmm1 in https://github.com/axolotl-ai-cloud/axolotl/pull/1876
* deepseekv2 liger support by tmm1 in https://github.com/axolotl-ai-cloud/axolotl/pull/1878
* Add liger kernel to features section by ByronHsu in https://github.com/axolotl-ai-cloud/axolotl/pull/1881
* pin liger-kernel to latest 0.2.1 by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1882
* Update supported models for Liger Kernel by DocShotgun in https://github.com/axolotl-ai-cloud/axolotl/pull/1875
* run pytests with varied pytorch versions too by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1883
* Fix RMSNorm monkey patch for Gemma models by chiwanpark in https://github.com/axolotl-ai-cloud/axolotl/pull/1886
* add e2e smoke tests for llama liger integration by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1884
* support for auto_find_batch_size when packing by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1885
* fix optimizer + fsdp combination in example [skip ci] by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1893
* Docs for AMD-based HPC systems by tijmen in https://github.com/axolotl-ai-cloud/axolotl/pull/1891
* lint fix and update gha regex by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1899
* Fix documentation for pre-tokenized dataset by alpayariyak in https://github.com/axolotl-ai-cloud/axolotl/pull/1894
* fix zero3 integration by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1897
* bump accelerate to 0.34.2 by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1901
* remove dynamic module loader monkeypatch as this was fixed upstream by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1914
* Trigger the original tokenization behavior when no advanced turn settings are provided by fozziethebeat in https://github.com/axolotl-ai-cloud/axolotl/pull/1915
* validation fixes 20240923 by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1925
* update upstream deps versions and replace lora+ by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1928
* fix for empty lora+ lr embedding by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1932
* bump transformers to 4.45.1 by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1936
* Multimodal Vision Llama - rudimentary support by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1940
* add 2.4.1 to base models by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1953
* upgrade pytorch from 2.4.0 => 2.4.1 by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1950
* fix(log): update perplexity log to clarify from eval split by NanoCode012 in https://github.com/axolotl-ai-cloud/axolotl/pull/1952
* Fix type annotations in relora.py by bxptr in https://github.com/axolotl-ai-cloud/axolotl/pull/1941
* Comet integration by Lothiraldan in https://github.com/axolotl-ai-cloud/axolotl/pull/1939
* Fixing/Adding Mistral Templates by pandora-s-git in https://github.com/axolotl-ai-cloud/axolotl/pull/1927
* lm_eval harness post train by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1926
* Axo logo new by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1956
* Add Support for `revision` Dataset Parameter to specify reading from Huggingface Dataset Revision by thomascleberg in https://github.com/axolotl-ai-cloud/axolotl/pull/1912
* Add MLFlow run name option in config by awhazell in https://github.com/axolotl-ai-cloud/axolotl/pull/1961
* add warning that sharegpt will be deprecated by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1957
* Handle image input as string paths for MMLMs by afrizalhasbi in https://github.com/axolotl-ai-cloud/axolotl/pull/1958
* update hf deps by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1964
* only install torchao for torch versions >= 2.4.0 by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1963
* Fixing Validation - Mistral Templates by pandora-s-git in https://github.com/axolotl-ai-cloud/axolotl/pull/1962
* fix(doc): update eval causal lm metrics doc to add perplexity by NanoCode012 in https://github.com/axolotl-ai-cloud/axolotl/pull/1951
* Add support for qwen 2.5 chat template by amazingvince in https://github.com/axolotl-ai-cloud/axolotl/pull/1934
* wip add new proposed message structure by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1904
* Reward model by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1879
* add ds zero3 to multigpu biweekly tests by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1900
* upgrade accelerate to 1.0.1 by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1969
* examples: Fix config llama3 by JohanWork in https://github.com/axolotl-ai-cloud/axolotl/pull/1833
* also debug if other debug args are set by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1977
* memoize dataset length for eval sample packing by bursteratom in https://github.com/axolotl-ai-cloud/axolotl/pull/1974
* add pytorch 2.5.0 base images by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1979
* first pass at pytorch 2.5.0 support by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1982
* fix builds so pytorch version isn't clobbered by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1986
* use torch 2.4.1 images as latest now that torch 2.5.0 is out by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1987
* Log checkpoints as mlflow artifacts by awhazell in https://github.com/axolotl-ai-cloud/axolotl/pull/1976
* revert image tagged as main-latest by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1990
* Refactor func load_model to class ModelLoader by MengqingCao in https://github.com/axolotl-ai-cloud/axolotl/pull/1909
* Fix: Gradient Accumulation issue by NanoCode012 in https://github.com/axolotl-ai-cloud/axolotl/pull/1980
* fix zero3 by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1994
* add option for resizing embeddings when adding new tokens by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/2000
* Feat: Add support for tokenizer’s or custom jinja chat_template by NanoCode012 in https://github.com/axolotl-ai-cloud/axolotl/pull/1970
* Hardware requirements by OliverKunc in https://github.com/axolotl-ai-cloud/axolotl/pull/1997
* feat: update yml chat_template to specify dataset field by NanoCode012 in https://github.com/axolotl-ai-cloud/axolotl/pull/2001
* remove skipped test by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/2002
* feat: add Exaone3 chat_template by shing100 in https://github.com/axolotl-ai-cloud/axolotl/pull/1995
* Fix get_chat_template call for trainer builder by chiragjn in https://github.com/axolotl-ai-cloud/axolotl/pull/2003
* Fix: modelloader handling of model_kwargs load_in*bit by NanoCode012 in https://github.com/axolotl-ai-cloud/axolotl/pull/1999
* Add plugin manager's callback hooks to training flow by chiragjn in https://github.com/axolotl-ai-cloud/axolotl/pull/2006
* add retries for load datasets requests failures by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/2007
* Base 2 5 1 by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/2010
* only run the remainder of the gpu test suite if one case passes first by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/2009
* upgrade liger to 0.4.0 by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1973
* janky workaround to install FA2 on torch 2.5.1 base image since it takes forever to build by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/2022
* upgrade pytorch to 2.5.1 by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/2024
* Add weighted optimisation support for trl DPO trainer integration by bursteratom in https://github.com/axolotl-ai-cloud/axolotl/pull/2016
* remove fastchat and sharegpt by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/2021
* increment version to 0.5.0 for next release by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/2025
* make publish to pypi manually dispatchable as a workflow by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/2026
* remove unused direct dependency on fused dense lib by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/2027

New Contributors
* 7flash made their first contribution in https://github.com/axolotl-ai-cloud/axolotl/pull/1210
* DreamGenX made their first contribution in https://github.com/axolotl-ai-cloud/axolotl/pull/1214
* filippo82 made their first contribution in https://github.com/axolotl-ai-cloud/axolotl/pull/1209
* xhedit made their first contribution in https://github.com/axolotl-ai-cloud/axolotl/pull/1231
* chiragjn made their first contribution in https://github.com/axolotl-ai-cloud/axolotl/pull/1257
* PhilipMay made their first contribution in https://github.com/axolotl-ai-cloud/axolotl/pull/1255
* zacbrannelly made their first contribution in https://github.com/axolotl-ai-cloud/axolotl/pull/1262
* LeonardoEmili made their first contribution in https://github.com/axolotl-ai-cloud/axolotl/pull/1274
* dameikle made their first contribution in https://github.com/axolotl-ai-cloud/axolotl/pull/1291
* jaredpalmer made their first contribution in https://github.com/axolotl-ai-cloud/axolotl/pull/1290
* monk1337 made their first contribution in https://github.com/axolotl-ai-cloud/axolotl/pull/1312
* ncoop57 made their first contribution in https://github.com/axolotl-ai-cloud/axolotl/pull/1327
* nopperl made their first contribution in https://github.com/axolotl-ai-cloud/axolotl/pull/1248
* rasbt made their first contribution in https://github.com/axolotl-ai-cloud/axolotl/pull/1362
* nirogu made their first contribution in https://github.com/axolotl-ai-cloud/axolotl/pull/1364
* dandm1 made their first contribution in https://github.com/axolotl-ai-cloud/axolotl/pull/1385
* brianfitzgerald made their first contribution in https://github.com/axolotl-ai-cloud/axolotl/pull/1365
* bentleylong made their first contribution in https://github.com/axolotl-ai-cloud/axolotl/pull/1418
* pphuc25 made their first contribution in https://github.com/axolotl-ai-cloud/axolotl/pull/1437
* satpalsr made their first contribution in https://github.com/axolotl-ai-cloud/axolotl/pull/1438
* pharaouk made their first contribution in https://github.com/axolotl-ai-cloud/axolotl/pull/1441
* BMPixel made their first contribution in https://github.com/axolotl-ai-cloud/axolotl/pull/1443
* fozziethebeat made their first contribution in https://github.com/axolotl-ai-cloud/axolotl/pull/1432
* mapmeld made their first contribution in https://github.com/axolotl-ai-cloud/axolotl/pull/1463
* melvinebenezer made their first contribution in https://github.com/axolotl-ai-cloud/axolotl/pull/1461
* maziyarpanahi made their first contribution in https://github.com/axolotl-ai-cloud/axolotl/pull/1501
* scottfleming made their first contribution in https://github.com/axolotl-ai-cloud/axolotl/pull/1504
* DavidFarago made their first contribution in https://github.com/axolotl-ai-cloud/axolotl/pull/1506
* tcapelle made their first contribution in https://github.com/axolotl-ai-cloud/axolotl/pull/1483
* emilytin0206 made their first contribution in https://github.com/axolotl-ai-cloud/axolotl/pull/1521
* Barbarian7676 made their first contribution in https://github.com/axolotl-ai-cloud/axolotl/pull/1518
* FrankRuis made their first contribution in https://github.com/axolotl-ai-cloud/axolotl/pull/1548
* abhinand5 made their first contribution in https://github.com/axolotl-ai-cloud/axolotl/pull/1404
* ali-mosavian made their first contribution in https://github.com/axolotl-ai-cloud/axolotl/pull/1583
* tpoisonooo made their first contribution in https://github.com/axolotl-ai-cloud/axolotl/pull/1577
* marijnfs made their first contribution in https://github.com/axolotl-ai-cloud/axolotl/pull/1591
* TJ-Solergibert made their first contribution in https://github.com/axolotl-ai-cloud/axolotl/pull/1539
* 0-hero made their first contribution in https://github.com/axolotl-ai-cloud/axolotl/pull/1553
* deep-diver made their first contribution in https://github.com/axolotl-ai-cloud/axolotl/pull/1612
* jquesnelle made their first contribution in https://github.com/axolotl-ai-cloud/axolotl/pull/1634
* benredmond made their first contribution in https://github.com/axolotl-ai-cloud/axolotl/pull/1640
* lhl made their first contribution in https://github.com/axolotl-ai-cloud/axolotl/pull/1635
* thepowerfuldeez made their first contribution in https://github.com/axolotl-ai-cloud/axolotl/pull/1646
* jaydeepthik made their first contribution in https://github.com/axolotl-ai-cloud/axolotl/pull/1638
* charlesfrye made their first contribution in https://github.com/axolotl-ai-cloud/axolotl/pull/1653
* maciejgryka made their first contribution in https://github.com/axolotl-ai-cloud/axolotl/pull/1662
* oaishi made their first contribution in https://github.com/axolotl-ai-cloud/axolotl/pull/1659
* davidecaroselli made their first contribution in https://github.com/axolotl-ai-cloud/axolotl/pull/1655
* abevoelker made their first contribution in https://github.com/axolotl-ai-cloud/axolotl/pull/1668
* saeedesmaili made their first contribution in https://github.com/axolotl-ai-cloud/axolotl/pull/1678
* josharian made their first contribution in https://github.com/axolotl-ai-cloud/axolotl/pull/1684
* Klingefjord made their first contribution in https://github.com/axolotl-ai-cloud/axolotl/pull/1685
* akshaylive made their first contribution in https://github.com/axolotl-ai-cloud/axolotl/pull/1740
* RodriMora made their first contribution in https://github.com/axolotl-ai-cloud/axolotl/pull/1716
* Tostino made their first contribution in https://github.com/axolotl-ai-cloud/axolotl/pull/1756
* srib made their first contribution in https://github.com/axolotl-ai-cloud/axolotl/pull/1805
* chrislee973 made their first contribution in https://github.com/axolotl-ai-cloud/axolotl/pull/1657
* penfever made their first contribution in https://github.com/axolotl-ai-cloud/axolotl/pull/1788
* chiwanpark made their first contribution in https://github.com/axolotl-ai-cloud/axolotl/pull/1818
* xgal made their first contribution in https://github.com/axolotl-ai-cloud/axolotl/pull/1837
* ByronHsu made their first contribution in https://github.com/axolotl-ai-cloud/axolotl/pull/1881
* DocShotgun made their first contribution in https://github.com/axolotl-ai-cloud/axolotl/pull/1875
* tijmen made their first contribution in https://github.com/axolotl-ai-cloud/axolotl/pull/1891
* bxptr made their first contribution in https://github.com/axolotl-ai-cloud/axolotl/pull/1941
* Lothiraldan made their first contribution in https://github.com/axolotl-ai-cloud/axolotl/pull/1939
* pandora-s-git made their first contribution in https://github.com/axolotl-ai-cloud/axolotl/pull/1927
* thomascleberg made their first contribution in https://github.com/axolotl-ai-cloud/axolotl/pull/1912
* awhazell made their first contribution in https://github.com/axolotl-ai-cloud/axolotl/pull/1961
* afrizalhasbi made their first contribution in https://github.com/axolotl-ai-cloud/axolotl/pull/1958
* amazingvince made their first contribution in https://github.com/axolotl-ai-cloud/axolotl/pull/1934
* bursteratom made their first contribution in https://github.com/axolotl-ai-cloud/axolotl/pull/1974
* MengqingCao made their first contribution in https://github.com/axolotl-ai-cloud/axolotl/pull/1909
* OliverKunc made their first contribution in https://github.com/axolotl-ai-cloud/axolotl/pull/1997
* shing100 made their first contribution in https://github.com/axolotl-ai-cloud/axolotl/pull/1995

**Full Changelog**: https://github.com/axolotl-ai-cloud/axolotl/compare/v0.4.0...v0.5.0

0.4.0

New Features (highlights)

- Streaming multipack for continued pre-training
- Mistral & Mixtral support
- Simplified Multipack for Mistral, Falcon, Qwen2, and Phi
- DPO/IPO/KTO-pairs RL-training support via trl
- Improved BatchSampler for multipack support, allowing resume from checkpoint and shuffling of data each epoch
- bf16: auto support
- MLflow support
- Save YAML configs to WandB
- Save predictions during evals to WandB
- More tests, including smoke tests for smol model training
- NEFTune support
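
Several of the highlights above map directly to YAML config options. Below is a minimal sketch of such a config; the key names follow Axolotl's documented conventions, but treat the exact keys and values as assumptions and verify them against the config reference for your installed version:

```yaml
# Sketch of a config enabling several v0.4.0 highlights.
# Key names are assumptions based on Axolotl's YAML options.
base_model: mistralai/Mistral-7B-v0.1

bf16: auto             # let Axolotl pick bf16 where the hardware supports it
sample_packing: true   # multipack sample packing

# NEFTune noisy-embedding finetuning
neftune_noise_alpha: 5

# experiment tracking
mlflow_experiment_name: my-experiment
wandb_project: my-project   # the YAML config itself is also saved to WandB
```

A config like this is passed to the trainer CLI (e.g. `accelerate launch -m axolotl.cli.train config.yml`); unrelated required fields (dataset, adapter, etc.) are omitted here for brevity.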

What's Changed

* document that packaging needs to be installed before flash-attn by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/559
* Fix pretraining with iterable/streaming Dataset by jphme in https://github.com/OpenAccess-AI-Collective/axolotl/pull/556
* Add training callback to send predictions to WandB table by Glavin001 in https://github.com/OpenAccess-AI-Collective/axolotl/pull/521
* fix wandb so mypy doesn't complain by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/562
* check for the existence of the default accelerate config that can create headaches by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/561
* add optimization for group-by-len by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/563
* gracefully handle length feature used for group by by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/565
* improve how we setup eval/save strategies and steps by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/547
* let hf trainer handle torch compile by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/516
* Model parallel by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/538
* fix save_steps so it doesn't get duplicated by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/567
* set auto for other params that hf trainer sets for ds. include zero1 json by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/570
* remove columns after tokenizing for pretraining by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/571
* mypy wandb ignore by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/572
* Phi examples by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/569
* e2e testing by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/574
* E2e device cuda by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/575
* E2e passing tests by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/576
* refactor scripts/finetune.py into new cli modules by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/550
* update support matrix with btlm and phi by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/579
* prevent cli functions from getting fired on import by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/581
* Fix Codellama examples by Kimiko-AI in https://github.com/OpenAccess-AI-Collective/axolotl/pull/582
* support custom field for completion from yml by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/580
* Feat(doc): Add features to doc by NanoCode012 in https://github.com/OpenAccess-AI-Collective/axolotl/pull/583
* Support Sample packing for phi arch by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/586
* don't resize embeddings if it's already large enough by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/577
* Enable full (non-sharded) model saving with SHARDED_STATE_DICT by jphme in https://github.com/OpenAccess-AI-Collective/axolotl/pull/584
* make phi training work with Loras by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/588
* optionally configure sample packing for evals by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/589
* don't add position_ids for evals when not using eval sample packing by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/591
* gather/broadcast the max value of the packing efficiency automatically by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/463
* Feat(data): Allow loading local csv and text by NanoCode012 in https://github.com/OpenAccess-AI-Collective/axolotl/pull/594
* add bf16 check by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/587
* btlm and falcon monkey patches for flash attn by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/566
* minor tweaks to simplify by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/597
* Fix for check with cfg and merge_lora by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/600
* improve handling for empty text on the tokenization step by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/502
* more sane defaults for openllama 3b used for quickstarts by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/602
* update dockerfile to not build evoformer since it fails the build by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/607
* Delete duplicate lines in models.py by bofenghuang in https://github.com/OpenAccess-AI-Collective/axolotl/pull/606
* support to disable exllama for gptq by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/604
* Update requirements.txt - Duplicated package by Psancs05 in https://github.com/OpenAccess-AI-Collective/axolotl/pull/610
* Only run tests when a change to python files is made by maximegmd in https://github.com/OpenAccess-AI-Collective/axolotl/pull/614
* Create multi-node.md by maximegmd in https://github.com/OpenAccess-AI-Collective/axolotl/pull/613
* fix distributed devices by maximegmd in https://github.com/OpenAccess-AI-Collective/axolotl/pull/612
* ignore wandb to resolve isort headaches by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/619
* skip the gpu memory checks if the device is set to 'auto' by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/609
* let MAX_JOBS use the default since we're not resource constrained on our self-hosted runners by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/427
* run eval on the first step to get a baseline by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/617
* split completion text to sequence_len by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/616
* misc fixes to add gptq tests by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/621
* chore(callback): Remove old peft saving code by NanoCode012 in https://github.com/OpenAccess-AI-Collective/axolotl/pull/510
* update README w deepspeed info by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/605
* create a model card with axolotl badge by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/624
* better handling and logging of empty sharegpt turns by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/603
* tweak: improve base builder for smaller layers by maximegmd in https://github.com/OpenAccess-AI-Collective/axolotl/pull/500
* Feat(doc): Add eval_sample_packing to doc by NanoCode012 in https://github.com/OpenAccess-AI-Collective/axolotl/pull/625
* Fix: Fail bf16 check when running on cpu during merge by NanoCode012 in https://github.com/OpenAccess-AI-Collective/axolotl/pull/631
* default model changed by mhenrichsen in https://github.com/OpenAccess-AI-Collective/axolotl/pull/629
* Added quotes to the pip install -e command in the documentation to fix an incompatibility … by Nan-Do in https://github.com/OpenAccess-AI-Collective/axolotl/pull/632
* Feat: Add support for upstream FA2 by NanoCode012 in https://github.com/OpenAccess-AI-Collective/axolotl/pull/626
* eval_table isn't quite stable enough to be in default llama configs by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/637
* attention_mask not needed for training by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/642
* update for recent transformers updates by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/636
* use fastchat conversations template by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/578
* skip some flash attn patches unless explicitly enabled by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/643
* Correct typos in datasets.py by felixonmars in https://github.com/OpenAccess-AI-Collective/axolotl/pull/639
* Fix bug in dataset loading by ethanhs in https://github.com/OpenAccess-AI-Collective/axolotl/pull/284
* Warn users to login to HuggingFace by Napuh in https://github.com/OpenAccess-AI-Collective/axolotl/pull/645
* Mistral flash attn packing by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/646
* Fix(cfg): Add validation for save_strategy and eval_strategy by NanoCode012 in https://github.com/OpenAccess-AI-Collective/axolotl/pull/633
* Feat: Add example for Mistral by NanoCode012 in https://github.com/OpenAccess-AI-Collective/axolotl/pull/644
* Add mistral/README.md by adarshxs in https://github.com/OpenAccess-AI-Collective/axolotl/pull/647
* fix for flash attn w mistral w/o sample packing by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/648
* don't strip the prompt for check since we don't strip to tokenize anymore by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/650
* add support for defined train split by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/654
* Fix bug when using pretokenized datasets by ein-ich in https://github.com/OpenAccess-AI-Collective/axolotl/pull/652
* Make dataset_processes configurable by corbt in https://github.com/OpenAccess-AI-Collective/axolotl/pull/651
* add mistral e2e tests by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/649
* removed duplicate on requirements.txt by Napuh in https://github.com/OpenAccess-AI-Collective/axolotl/pull/661
* make sure we also run CI tests when requirements.txt changes by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/663
* prepared dataset caching, other misc fixes by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/665
* remove patch fix for phi by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/664
* refactor to set eval_batch_size earlier if unset, so we can warn if mismatched by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/662
* Feat: Add config yaml to section for reprod in bug-report.yaml by NanoCode012 in https://github.com/OpenAccess-AI-Collective/axolotl/pull/667
* Feat: Allow usage of native Mistral FA when no sample_packing by NanoCode012 in https://github.com/OpenAccess-AI-Collective/axolotl/pull/669
* chore: Clean up repetitive model kwargs by NanoCode012 in https://github.com/OpenAccess-AI-Collective/axolotl/pull/670
* Fix(version): Update FA to work with Mistral SWA by NanoCode012 in https://github.com/OpenAccess-AI-Collective/axolotl/pull/673
* Fix(tokenizer): Set rstrip,lstrip,norm to False by NanoCode012 in https://github.com/OpenAccess-AI-Collective/axolotl/pull/678
* Fix: Future deprecation warning with use_auth_token by NanoCode012 in https://github.com/OpenAccess-AI-Collective/axolotl/pull/680
* Feat: Set WORKDIR to /workspace/axolotl by NanoCode012 in https://github.com/OpenAccess-AI-Collective/axolotl/pull/679
* Fix: ValueError when FA + Mistral when padding_side=right by NanoCode012 in https://github.com/OpenAccess-AI-Collective/axolotl/pull/681
* flash_attention + sample packing for stablelm 3b by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/671
* Adding qlora config for Mistral by TokenBender in https://github.com/OpenAccess-AI-Collective/axolotl/pull/675
* Fix: Higher vram usage for mistral and sample_packing by NanoCode012 in https://github.com/OpenAccess-AI-Collective/axolotl/pull/691
* fix multiline for docker by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/694
* update mistral lr, sample pack by mhenrichsen in https://github.com/OpenAccess-AI-Collective/axolotl/pull/693
* apex not needed as amp is part of pytorch by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/696
* add docker images for pytorch 2.1.0 by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/697
* fix unneeded space by mhenrichsen in https://github.com/OpenAccess-AI-Collective/axolotl/pull/699
* Update README with some explanations by seungduk-yanolja in https://github.com/OpenAccess-AI-Collective/axolotl/pull/700
* Get qlora mistral-7b fine tuning working on a single 4090 by lukemarsden in https://github.com/OpenAccess-AI-Collective/axolotl/pull/708
* fix(doc): Add note on inference w sample packing by NanoCode012 in https://github.com/OpenAccess-AI-Collective/axolotl/pull/712
* Fix: lowercase `True` values in config by atgctg in https://github.com/OpenAccess-AI-Collective/axolotl/pull/713
* fix(doc): update default doc according to arg by NanoCode012 in https://github.com/OpenAccess-AI-Collective/axolotl/pull/714
* Save Axolotl config as WandB artifact by jphme in https://github.com/OpenAccess-AI-Collective/axolotl/pull/716
* improve handling of the prepared ds path and other cfg defaults by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/701
* fix pytorch 2.1.0 build, add multipack docs by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/722
* add noisy embedding by maximegmd in https://github.com/OpenAccess-AI-Collective/axolotl/pull/721
* pin xformers >= 0.0.22 by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/724
* misc sharegpt fixes by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/723
* workaround for installing xformers w torch 2.1.0 by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/725
* tweak for xformers install w pytorch 2.1.0 by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/727
* fixes for alpaca w chatml, and don't include attention_mask w mistral for flash attention by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/728
* Clarify custom format example by casper-hansen in https://github.com/OpenAccess-AI-Collective/axolotl/pull/729
* Mistral: Sliding Window Attention with Flash Attention and Sample Packing by casper-hansen in https://github.com/OpenAccess-AI-Collective/axolotl/pull/732
* badge by mhenrichsen in https://github.com/OpenAccess-AI-Collective/axolotl/pull/739
* catch ConnectionError when checking dataset from HuggingFace by Napuh in https://github.com/OpenAccess-AI-Collective/axolotl/pull/743
* Fix(model): Linear detected and added to target module with rope linear by NanoCode012 in https://github.com/OpenAccess-AI-Collective/axolotl/pull/738
* improve: Enhance code readability of prompt_tokenizers.py by seungduk-yanolja in https://github.com/OpenAccess-AI-Collective/axolotl/pull/707
* add a latest tag for regular axolotl image, cleanup extraneous print statement by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/746
* Fix DeepSpeed Zero 3 Saving by tokestermw in https://github.com/OpenAccess-AI-Collective/axolotl/pull/709
* chore: bump transformers to v4.34.1 to fix tokenizer issue by NanoCode012 in https://github.com/OpenAccess-AI-Collective/axolotl/pull/745
* add to docs by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/703
* Implement fused modules by casper-hansen in https://github.com/OpenAccess-AI-Collective/axolotl/pull/747
* remove lora fused packing test by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/758
* Fix: eval table conflict with eval_sample_packing by NanoCode012 in https://github.com/OpenAccess-AI-Collective/axolotl/pull/769
* Fix: Cannot tokenize with bf16 and on cpu by NanoCode012 in https://github.com/OpenAccess-AI-Collective/axolotl/pull/766
* Hotfix for fused QKV not saving the trained weights of o_proj by casper-hansen in https://github.com/OpenAccess-AI-Collective/axolotl/pull/762
* convert exponential notation lr to floats by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/771
* Fix: Warn when fullfinetune without adapter by NanoCode012 in https://github.com/OpenAccess-AI-Collective/axolotl/pull/770
* simplify by removing duplicate base_model_config by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/772
* disable eval table w sample packing in examples by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/778
* refactor setup trainer so we can add more hooks by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/773
* chore: refactor truthy check and fix mypy by NanoCode012 in https://github.com/OpenAccess-AI-Collective/axolotl/pull/780
* chore(readme): Improve documentation on conversation field by NanoCode012 in https://github.com/OpenAccess-AI-Collective/axolotl/pull/782
* Threaded MultipackDistributedDataloader with prefetched samples by casper-hansen in https://github.com/OpenAccess-AI-Collective/axolotl/pull/759
* Create preprocess CLI by casper-hansen in https://github.com/OpenAccess-AI-Collective/axolotl/pull/785
* Add docker advanced instruction to README by gordicaleksa in https://github.com/OpenAccess-AI-Collective/axolotl/pull/792
* Fix Deepspeed Zero3 Config by teknium1 in https://github.com/OpenAccess-AI-Collective/axolotl/pull/791
* Update to adapt to sharegpt datasets with "assistant" rather than "gp… by MilesQLi in https://github.com/OpenAccess-AI-Collective/axolotl/pull/774
* fix eval_steps to be a sane default by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/797
* refactor neft patch to be more re-usable similar to trl's impl by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/796
* fix(config): Set eos/bos to tokenizer if different by NanoCode012 in https://github.com/OpenAccess-AI-Collective/axolotl/pull/801
* feat(doc): add dummyoptim faq fix by NanoCode012 in https://github.com/OpenAccess-AI-Collective/axolotl/pull/802
* fix(tokenizer): update log order after update by NanoCode012 in https://github.com/OpenAccess-AI-Collective/axolotl/pull/806
* fix model parallel by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/816
* fix: pin autogptq by NanoCode012 in https://github.com/OpenAccess-AI-Collective/axolotl/pull/818
* update table for rwkv4 support, fix process count for dataset by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/822
* Feat: Added Gradio support by Stillerman in https://github.com/OpenAccess-AI-Collective/axolotl/pull/812
* Dockerfile: add deepspeed-kernels dependency for deepspeed>=0.12.0 by fpreiss in https://github.com/OpenAccess-AI-Collective/axolotl/pull/827
* cleanup verbosity a bit by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/799
* make sure to cleanup tmp output_dir for e2e tests by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/831
* multipack w batch sampler by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/795
* don't compile deepspeed or bitsandbytes from source by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/837
* Pin optimum package by brthor in https://github.com/OpenAccess-AI-Collective/axolotl/pull/838
* cleanup the old multipack dataloader by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/841
* include the suffix modified string in ascii art by fpreiss in https://github.com/OpenAccess-AI-Collective/axolotl/pull/852
* feat(doc): add more info on train_on_split by NanoCode012 in https://github.com/OpenAccess-AI-Collective/axolotl/pull/855
* chore(doc): Separate section on runpod by NanoCode012 in https://github.com/OpenAccess-AI-Collective/axolotl/pull/860
* various bugfixes by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/856
* adds llama and mistral dropout support by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/858
* multipack len should use max, not min by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/863
* Docs: add instructions to 1-click launching on public clouds by concretevitamin in https://github.com/OpenAccess-AI-Collective/axolotl/pull/862
* Update data.py for signature generation by MilesQLi in https://github.com/OpenAccess-AI-Collective/axolotl/pull/851
* lint fix that didn't get caught by linter by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/866
* make docker command more robust by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/861
* add e2e tests for checking functionality of resume from checkpoint by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/865
* allow overriding of model_config parameters from the YML by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/853
* Feat: Add dataset loading from S3, GCS by NanoCode012 in https://github.com/OpenAccess-AI-Collective/axolotl/pull/765
* try 2: pin hf transformers and accelerate to latest release, don't reinstall pytorch by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/867
* don't train if eval split is too small by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/873
* Phi update 202311 by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/876
* Install from git url by msaroufim in https://github.com/OpenAccess-AI-Collective/axolotl/pull/874
* fix: revert local dir dataset load by NanoCode012 in https://github.com/OpenAccess-AI-Collective/axolotl/pull/878
* chore(doc): Add info on changing role in sharegpt by NanoCode012 in https://github.com/OpenAccess-AI-Collective/axolotl/pull/886
* Feat: Add warmup_ratio by NanoCode012 in https://github.com/OpenAccess-AI-Collective/axolotl/pull/893
* fix: warning should not show if eval_batch_size not provided by NanoCode012 in https://github.com/OpenAccess-AI-Collective/axolotl/pull/896
* Feat: Add Qwen by NanoCode012 in https://github.com/OpenAccess-AI-Collective/axolotl/pull/894
* update datasets version to cut down the warnings due to pyarrow arg change by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/897
* fix: remove FA for qwen examples by NanoCode012 in https://github.com/OpenAccess-AI-Collective/axolotl/pull/900
* Determine FSDP/deepspeed settings on device select. by kallewoof in https://github.com/OpenAccess-AI-Collective/axolotl/pull/883
* ensure merged model matches the training dtype by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/902
* fix for qwen w lora by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/906
* Remove lr scheduler in DeepSpeed config to avoid conflict by Haoxiang-Wang in https://github.com/OpenAccess-AI-Collective/axolotl/pull/909
* feature: loss watchdog for terminating training runs that are failing by kallewoof in https://github.com/OpenAccess-AI-Collective/axolotl/pull/899
* Feat(wandb): Refactor to be more flexible by NanoCode012 in https://github.com/OpenAccess-AI-Collective/axolotl/pull/767
* Support device_map=sequential & max_memory config parameters by brthor in https://github.com/OpenAccess-AI-Collective/axolotl/pull/903
* feat: add check for quantized model by NanoCode012 in https://github.com/OpenAccess-AI-Collective/axolotl/pull/913
* Pin flash-attn to 2.3.3 by casper-hansen in https://github.com/OpenAccess-AI-Collective/axolotl/pull/919
* fix(tokenizer): handle fast tokenizer properly for bos/eos by NanoCode012 in https://github.com/OpenAccess-AI-Collective/axolotl/pull/914
* support for mamba by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/915
* fixing prompt template of chatml by removal of linebreak by timothylimyl in https://github.com/OpenAccess-AI-Collective/axolotl/pull/922
* Mixtral multipack by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/928
* update to latest transformers for mixtral support by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/929
* Mixtral: More correct MoE, lower loss by casper-hansen in https://github.com/OpenAccess-AI-Collective/axolotl/pull/932
* Update requirements.txt (fschat==0.2.34) by tokestermw in https://github.com/OpenAccess-AI-Collective/axolotl/pull/940
* Mixtral official by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/942
* Respect sequence_len in config for `type: llama2_chat` by hamelsmu in https://github.com/OpenAccess-AI-Collective/axolotl/pull/926
* new evals_per_epoch and saves_per_epoch to make things cleaner by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/944
* More hints on what to do with CUDA Out of memory errors by jooray in https://github.com/OpenAccess-AI-Collective/axolotl/pull/925
* fix: remove excessive newlines in system prompt(s) for alpaca by kallewoof in https://github.com/OpenAccess-AI-Collective/axolotl/pull/936
* Flash attn hotfix by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/951
* Fix Deepspeed loading by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/950
* fix: switch to using the HuggingFace Transformers NEFT implementation by kallewoof in https://github.com/OpenAccess-AI-Collective/axolotl/pull/941
* Add docs by hamelsmu in https://github.com/OpenAccess-AI-Collective/axolotl/pull/947
* Fix prompt assembly for llama by hamelsmu in https://github.com/OpenAccess-AI-Collective/axolotl/pull/952
* update transformers to fix checkpoint saving by dumpmemory in https://github.com/OpenAccess-AI-Collective/axolotl/pull/963
* update to latest nccl in docker image by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/965
* fix for build for nccl in dockerfile by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/970
* fix: add lr scheduler kwargs to Trainer by NanoCode012 in https://github.com/OpenAccess-AI-Collective/axolotl/pull/972
* Update README.md by eltociear in https://github.com/OpenAccess-AI-Collective/axolotl/pull/966
* Dockerfile torch fix by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/987
* fix mistral prompt assembly by hamelsmu in https://github.com/OpenAccess-AI-Collective/axolotl/pull/982
* Feat: Warns to add to modules_to_save when adding tokens or switching special_tokens by NanoCode012 in https://github.com/OpenAccess-AI-Collective/axolotl/pull/787
* Add tests to Docker by hamelsmu in https://github.com/OpenAccess-AI-Collective/axolotl/pull/993
* change val size by mhenrichsen in https://github.com/OpenAccess-AI-Collective/axolotl/pull/992
* chore: Update transformers to latest by NanoCode012 in https://github.com/OpenAccess-AI-Collective/axolotl/pull/986
* support for cuda 12.1 by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/989
* set output_router_logits for mixtral config: by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/995
* Add an example config for finetuning a 34B model on a 24GB GPU by evangriffiths in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1000
* FEAT: add tagging support to axolotl by younesbelkada in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1004
* Set eval_sample_packing to false in mistral config.yaml by kmsydney in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1003
* add config to model card by hamelsmu in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1005
* remove landmark attn and xpos rope implementations by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1010
* [Docs] Nit: clarify what inference is by hamelsmu in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1012
* [Docs] Nit: Remind people to auth to wandb if they are going to use it by hamelsmu in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1013
* feat: remove need to add load_in* during merge by NanoCode012 in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1017
* feat: expose bnb kwargs by NanoCode012 in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1018
* add ultrachat prompt strategies by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/996
* [WandB] Push axolotl config to top level wandb files by hamelsmu in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1014
* Adds chat templates by mhenrichsen in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1022
* Fix: bf16 support for inference by taziksh in https://github.com/OpenAccess-AI-Collective/axolotl/pull/981
* use recommended setting for use_reentrant w gradient checkpointing by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1021
* added tiny llama examples for lora and qlora by tdolan21 in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1027
* chore(readme): update instruction to set config to load from cache by NanoCode012 in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1030
* [Docs] delete unused cfg value `lora_out_dir` by hamelsmu in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1029
* fix: lint by NanoCode012 in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1037
* chore(config): clean up old log for Qwen by NanoCode012 in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1034
* bump transformers and update attention class map name by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1023
* Added chatglm3 conversation type for training models like TinyLLama by xaviviro in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1036
* fix HF model card upload for PEFT models by hamelsmu in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1043
* Clean Up LorA Merge by hamelsmu in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1044
* feature: better device mapping for large models by kallewoof in https://github.com/OpenAccess-AI-Collective/axolotl/pull/918
* feat: always push checkpoint to hub if set by NanoCode012 in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1049
* Update tests-docker.yml by hamelsmu in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1052
* streaming multipack for pretraining dataset by jinwonkim93 in https://github.com/OpenAccess-AI-Collective/axolotl/pull/959
* Simplify Docker Unit Test CI by hamelsmu in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1055
* Phi2 rewrite by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1058
* Efficiently get the length of the tokenized docs by RicardoDominguez in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1063
* Sponsors by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1065
* Update FUNDING.yml for Kofi link by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1067
* fix: torch_dtype mistral default to fp32 by NanoCode012 in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1050
* Cosine learning rate schedule - minimum learning rate by RicardoDominguez in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1062
* fix double eos token for chatml by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1054
* Add: mlflow for experiment tracking by JohanWork in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1059
* update peft to 0.7.0 by mtenenholtz in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1073
* paired kto support by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1069
* Separate AutoGPTQ dep to `pip install -e .[auto-gptq]` by casper-hansen in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1077
* attempt to also run e2e tests that needs gpus by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1070
* Update FUNDING.yml with bitcoin by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1079
* swap the data collator for evals if not using sample packing by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1076
* be more robust about checking embedding modules for lora finetunes by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1074
* fix: `train_on_inputs: true` ignored for sharegpt by NanoCode012 in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1045
* update sharegpt conversations when chatml chat template is set by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1075
* additional logging to get maximum token length of a sequence in the dataset by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1066
* pin accelerate for deepspeed fix by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1080
* fix: warn user to install mamba_ssm package by NanoCode012 in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1019
* use tags again for test image, only run docker e2e after pre-commit checks by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1081
* optimize calculation of cu_seqlens from position_ids by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1084
* add python 3.11 to the matrix for unit tests by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1085
* Remove fused-dense-lib from requirements.txt by casper-hansen in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1087
* misc fixes from 943 by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1086
* add gptneox embeddings, fix phi2 inputs, also fix the casting by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1083
* Add Debugging Guide by hamelsmu in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1089
* Fix debugging.md by hamelsmu in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1091
* feat: enable trl's autounwrap by NanoCode012 in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1060
* Fix broken pypi.yml by msaroufim in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1099
* Update README.md by hamelsmu in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1103
* Add section for debugging with Docker by hamelsmu in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1104
* Add link on README to Docker Debugging by hamelsmu in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1107
* keep gate in fp32 for loras by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1105
* Fix debugging video by hamelsmu in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1111
* Disable caching on `--disable_caching` in CLI by casper-hansen in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1110
* Reverse caching PR by casper-hansen in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1115
* Enable or disable bf16 support based on availability by simhallq in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1116
* update PR template so we can capture twitter or discord handles by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1121
* pin model_revision for phi2 by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1123
* fix(readme): clarify custom user prompt [no-ci] by NanoCode012 in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1124
* Add `layers_to_transform` for `lora_config` by xzuyn in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1118
* Agnostic cloud gpu docker image and Jupyter lab by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1097
* Preprocess dataset size fix by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1131
* fix(preprocess): Make sure dataset not loaded from cache when using preprocess cli by NanoCode012 in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1136
* fix bf16 check when preprocessing data by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1140
* Add shifted sparse attention by joecummings in https://github.com/OpenAccess-AI-Collective/axolotl/pull/973
* Multipack simplify for Mixtral by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1142
* Fix link for Minotaur model by joecummings in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1146
* Dockerfile cloud ports by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1148
* fix check for env var by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1151
* feat(dataset): add config to keep processed dataset in memory by NanoCode012 in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1152
* Deprecate max packed sequence len by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1141
* make sure the model config loader respects the model_revision too by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1160
* Qwen2 by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1166
* jupyter lab fixes by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1139
* set fp16 to false if bf16, update bf16: auto in example YAMLs by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1122
* Add mlflow callback for pushing config to mlflow artifacts by JohanWork in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1125
* improve vram use w gradient checkpointing by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1167
* Vram fix attempt by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1164
* add commit message option to skip docker image builds in ci by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1168
* Falcon embeddings by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1149
* support for explicit test_dataset definition for evals by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/786
* Add desc to map/filter by casper-hansen in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1162
* Feat(test): Add tests for alpaca chatml prompt tokenizer by JohanWork in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1088
* DPO cleanup by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1126
* Update README.md by singhay in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1169
* Fine-Tuning Mistral-7b for Real-World Chatbot Applications Using Axolotl (Lora used) by Tilemachoc in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1155
* don't fail if can't cast weights due to offload when merging by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1172
* update docs by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1176
* Phi2 multipack by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1173
* DPO fixes v2 by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1174
* Docs: RLHF Update after cleanup by AlekseyKorshuk in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1178
* Add support for offline mode with HF_HUB_OFFLINE envvar by JamesHWade in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1182
* Fix do_merge_lora raises an Exception in transformers v4.37.0 by tisorlawan in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1184
* report min length of tokenized data by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1186
* more dpo fixes for dataset loading and docs by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1185
* upgrade deepspeed to 0.13.1 for mixtral fixes by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1189
* Standardize system prompt format for AlpacaPrompter (instruct case) by sadaisystems in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1190
* Mixtral fixes 20240124 by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1192
* prepare for release v0.4.0 by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1175

New Contributors
* Kimiko-AI made their first contribution in https://github.com/OpenAccess-AI-Collective/axolotl/pull/582
* bofenghuang made their first contribution in https://github.com/OpenAccess-AI-Collective/axolotl/pull/606
* Psancs05 made their first contribution in https://github.com/OpenAccess-AI-Collective/axolotl/pull/610
* Nan-Do made their first contribution in https://github.com/OpenAccess-AI-Collective/axolotl/pull/632
* felixonmars made their first contribution in https://github.com/OpenAccess-AI-Collective/axolotl/pull/639
* Napuh made their first contribution in https://github.com/OpenAccess-AI-Collective/axolotl/pull/645
* adarshxs made their first contribution in https://github.com/OpenAccess-AI-Collective/axolotl/pull/647
* ein-ich made their first contribution in https://github.com/OpenAccess-AI-Collective/axolotl/pull/652
* corbt made their first contribution in https://github.com/OpenAccess-AI-Collective/axolotl/pull/651
* TokenBender made their first contribution in https://github.com/OpenAccess-AI-Collective/axolotl/pull/675
* seungduk-yanolja made their first contribution in https://github.com/OpenAccess-AI-Collective/axolotl/pull/700
* lukemarsden made their first contribution in https://github.com/OpenAccess-AI-Collective/axolotl/pull/708
* atgctg made their first contribution in https://github.com/OpenAccess-AI-Collective/axolotl/pull/713
* casper-hansen made their first contribution in https://github.com/OpenAccess-AI-Collective/axolotl/pull/729
* tokestermw made their first contribution in https://github.com/OpenAccess-AI-Collective/axolotl/pull/709
* gordicaleksa made their first contribution in https://github.com/OpenAccess-AI-Collective/axolotl/pull/792
* MilesQLi made their first contribution in https://github.com/OpenAccess-AI-Collective/axolotl/pull/774
* Stillerman made their first contribution in https://github.com/OpenAccess-AI-Collective/axolotl/pull/812
* fpreiss made their first contribution in https://github.com/OpenAccess-AI-Collective/axolotl/pull/827
* brthor made their first contribution in https://github.com/OpenAccess-AI-Collective/axolotl/pull/838
* concretevitamin made their first contribution in https://github.com/OpenAccess-AI-Collective/axolotl/pull/862
* msaroufim made their first contribution in https://github.com/OpenAccess-AI-Collective/axolotl/pull/874
* kallewoof made their first contribution in https://github.com/OpenAccess-AI-Collective/axolotl/pull/883
* Haoxiang-Wang made their first contribution in https://github.com/OpenAccess-AI-Collective/axolotl/pull/909
* timothylimyl made their first contribution in https://github.com/OpenAccess-AI-Collective/axolotl/pull/922
* hamelsmu made their first contribution in https://github.com/OpenAccess-AI-Collective/axolotl/pull/926
* jooray made their first contribution in https://github.com/OpenAccess-AI-Collective/axolotl/pull/925
* dumpmemory made their first contribution in https://github.com/OpenAccess-AI-Collective/axolotl/pull/963
* eltociear made their first contribution in https://github.com/OpenAccess-AI-Collective/axolotl/pull/966
* evangriffiths made their first contribution in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1000
* younesbelkada made their first contribution in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1004
* kmsydney made their first contribution in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1003
* taziksh made their first contribution in https://github.com/OpenAccess-AI-Collective/axolotl/pull/981
* tdolan21 made their first contribution in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1027
* xaviviro made their first contribution in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1036
* jinwonkim93 made their first contribution in https://github.com/OpenAccess-AI-Collective/axolotl/pull/959
* RicardoDominguez made their first contribution in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1063
* JohanWork made their first contribution in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1059
* mtenenholtz made their first contribution in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1073
* simhallq made their first contribution in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1116
* xzuyn made their first contribution in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1118
* joecummings made their first contribution in https://github.com/OpenAccess-AI-Collective/axolotl/pull/973
* singhay made their first contribution in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1169
* Tilemachoc made their first contribution in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1155
* AlekseyKorshuk made their first contribution in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1178
* JamesHWade made their first contribution in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1182
* tisorlawan made their first contribution in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1184
* sadaisystems made their first contribution in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1190

**Full Changelog**: https://github.com/OpenAccess-AI-Collective/axolotl/compare/v0.3.0...v0.4.0
