Axolotl

Latest version: v0.8.0

0.8.0

New Features

Sequence parallelism support via ring-flash-attn
This enables long-context training by distributing sequences across GPUs, reducing memory requirements per device while allowing near-linear scaling in context length per GPU. This complements the other parallelism features that Axolotl offers, including FSDP and DeepSpeed. See our documentation [here](https://axolotl-ai-cloud.github.io/axolotl/docs/sequence_parallelism.html).
<img width="763" alt="Screenshot 2025-04-02 at 9 17 14 AM" src="https://github.com/user-attachments/assets/308db66d-084e-45b1-87c3-1a7b405390bc" />
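
As a rough illustration of the idea (not Axolotl's or ring-flash-attn's actual implementation), the sketch below splits one long token sequence into contiguous shards, one per GPU, so that each device holds roughly `seq_len / world_size` tokens. The helper name `shard_sequence` and the contiguous-chunk layout are assumptions made for illustration only; real ring attention implementations may use a different layout and also exchange key/value blocks between devices.

```python
from typing import List, Sequence


def shard_sequence(tokens: Sequence[int], world_size: int) -> List[List[int]]:
    """Split one sequence into `world_size` contiguous shards (hypothetical helper)."""
    if world_size <= 0:
        raise ValueError("world_size must be positive")
    # Ceiling division so that every token lands in exactly one shard.
    shard_len = -(-len(tokens) // world_size)
    return [
        list(tokens[rank * shard_len:(rank + 1) * shard_len])
        for rank in range(world_size)
    ]


tokens = list(range(16))  # stand-in for a 16-token training sample
shards = shard_sequence(tokens, world_size=4)
for rank, shard in enumerate(shards):
    print(f"rank {rank}: {len(shard)} tokens")
```

With four devices, each rank holds a quarter of the tokens, which is where the per-device memory saving comes from.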

Gemma-3 support has landed alongside several features to help you fine-tune Gemma-3 models:
- Cut cross entropy
- Liger kernel
- Multimodal
- Fixed loss calculation for Gradient Accumulation

Multimodal
- Beta support for a variety of multi-modal models:
  - Mllama
  - Pixtral
  - Llava-1.5
  - Mistral-Small-3.1
  - Gemma-3
  - Qwen2-VL
  - Qwen2.5-VL

Additional Features
- Updated cut-cross-entropy patches for several models: Cohere, Cohere-2, Gemma, Gemma-2, Gemma-3, Mistral-3, and Mllama
- Support for the REX Learning Rate Scheduler - https://arxiv.org/abs/2107.04197
- Tokenizer Overrides - you can now fine-tune with custom values in tokenizers using reserved tokens
- Single-gpu and DDP support for Muon Optimizer
- Sequential packing for Curriculum learning
- Speeding up GRPO training with distributed vLLM - you can now use `axolotl vllm-serve path/to/config.yaml` to serve a separate vLLM instance which can utilize multiple GPUs to speed up trajectory generation during GRPO.
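
For readers unfamiliar with REX, here is a minimal sketch of the schedule's shape, assuming the form `lr(t) = lr_0 * (1 - t/T) / (1 - t/(2T))` as described in the linked paper. It is written from the paper's description rather than taken from Axolotl's scheduler, and the function name `rex_lr` and its parameters are hypothetical.

```python
def rex_lr(step: int, total_steps: int, base_lr: float) -> float:
    """Learning rate at `step` under an assumed REX-style decay (hypothetical helper)."""
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    # Clamp training progress to [0, 1].
    progress = min(max(step / total_steps, 0.0), 1.0)
    # Equivalent to (1 - z) / (1/2 + 1/2 * (1 - z)) with z = progress.
    return base_lr * (1.0 - progress) / (1.0 - progress / 2.0)


for step in (0, 250, 500, 750, 1000):
    print(step, rex_lr(step, total_steps=1000, base_lr=2e-4))
```

Under this assumed form, the rate stays above a plain linear decay for most of training and reaches zero at the final step.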

Notes

v0.8.x will be the last set of releases that officially support torch<=2.4.1. With the PyTorch 2.7 release this month, we aim to support the latest two stable releases of PyTorch.
We expect FSDP2 support to be a fast follow, and we'll include it in v0.8.1 once we can fix and validate issues such as saving checkpoints.

What's Changed
* `train.py` refactor by djsaunde in https://github.com/axolotl-ai-cloud/axolotl/pull/2371
* fix(doc): add installation for cce to docs by NanoCode012 in https://github.com/axolotl-ai-cloud/axolotl/pull/2375
* chore(docs): remove phorm by NanoCode012 in https://github.com/axolotl-ai-cloud/axolotl/pull/2378
* feat(doc): add docker images explanation by NanoCode012 in https://github.com/axolotl-ai-cloud/axolotl/pull/2379
* feat(doc): document drop_system_message and clarify limitation by NanoCode012 in https://github.com/axolotl-ai-cloud/axolotl/pull/2381
* chore(doc): add clarification about mpi4py error on single gpu deepspeed by NanoCode012 in https://github.com/axolotl-ai-cloud/axolotl/pull/2383
* fix(doc): add missing low_cpu_mem_usage config to docs by NanoCode012 in https://github.com/axolotl-ai-cloud/axolotl/pull/2369
* feat(grpo): add reward_weights config and refactor by NanoCode012 in https://github.com/axolotl-ai-cloud/axolotl/pull/2365
* Add REX LR Scheduler by xzuyn in https://github.com/axolotl-ai-cloud/axolotl/pull/2380
* Update Tokenizer Overrides Handling in models.py by mhenrichsen in https://github.com/axolotl-ai-cloud/axolotl/pull/1549
* various fixes 20250305 by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/2384
* Optimizer refactor and add Muon support by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/2367
* remove lion-pytorch as it's already handled upstream by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/2389
* refactor: trl grpo configs to have descriptions by NanoCode012 in https://github.com/axolotl-ai-cloud/axolotl/pull/2386
* feat(doc): add more info on RewardModel datasets by NanoCode012 in https://github.com/axolotl-ai-cloud/axolotl/pull/2391
* chore(doc): add faq when having no default chat_template by NanoCode012 in https://github.com/axolotl-ai-cloud/axolotl/pull/2398
* Use Latest Cut Cross Entropy by xzuyn in https://github.com/axolotl-ai-cloud/axolotl/pull/2392
* fix: create mount folder on modal if not exist by NanoCode012 in https://github.com/axolotl-ai-cloud/axolotl/pull/2390
* include iproute2 and nvtop in cloud image by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/2393
* fix(modal): add git pull when getting branch files by NanoCode012 in https://github.com/axolotl-ai-cloud/axolotl/pull/2399
* pass additional info for fix untrained tokens when using distributed + offloading by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/2388
* use max of 32 dataset processes if not explicit by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/2403
* build cloud images with torch 2.6.0 by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/2413
* only validate hf user token on rank 0 by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/2408
* fixes against upstream main branches by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/2407
* chore(docs): add cookbook/blog link to docs by NanoCode012 in https://github.com/axolotl-ai-cloud/axolotl/pull/2410
* Feat: minor docs improvements for RLHF and faq on embeddings by NanoCode012 in https://github.com/axolotl-ai-cloud/axolotl/pull/2401
* Update README.md by SicariusSicariiStuff in https://github.com/axolotl-ai-cloud/axolotl/pull/2360
* use default torch fused adamw optimizer as default as adamw_hf is deprecated by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/2425
* bump HF versions except for trl by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/2427
* add 12.8.1 cuda to the base matrix by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/2426
* add run on novita ai by liyiligang in https://github.com/axolotl-ai-cloud/axolotl/pull/2421
* chore(doc): add instructions on adding custom integrations by NanoCode012 in https://github.com/axolotl-ai-cloud/axolotl/pull/2422
* Fixing KTO+QLoRA+multi-GPU by SalmanMohammadi in https://github.com/axolotl-ai-cloud/axolotl/pull/2420
* adding pre-commit auto-update GH action and bumping plugin versions by djsaunde in https://github.com/axolotl-ai-cloud/axolotl/pull/2428
* chore(doc): add explanation on fsdp_transformer_layer_cls_to_wrap by NanoCode012 in https://github.com/axolotl-ai-cloud/axolotl/pull/2429
* Autodoc generation with quartodoc by djsaunde in https://github.com/axolotl-ai-cloud/axolotl/pull/2419
* Sequence parallelism by djsaunde in https://github.com/axolotl-ai-cloud/axolotl/pull/2412
* installing axolotl prior to quartodoc build by djsaunde in https://github.com/axolotl-ai-cloud/axolotl/pull/2434
* Fix failing test by djsaunde in https://github.com/axolotl-ai-cloud/axolotl/pull/2436
* Feat: Add support for gemma3_text and add e2e for gemma2 by NanoCode012 in https://github.com/axolotl-ai-cloud/axolotl/pull/2406
* Feat: Rework multimodal support (mllama, llava, pixtral, qwen2, qwen25, gemma3, mistral3) by NanoCode012 in https://github.com/axolotl-ai-cloud/axolotl/pull/2435
* feat: add CCE for gemma3, cohere, and cohere2 by NanoCode012 in https://github.com/axolotl-ai-cloud/axolotl/pull/2443
* chore: minor optim changes (add apollo, improve docs, remove lion-pytorch) by NanoCode012 in https://github.com/axolotl-ai-cloud/axolotl/pull/2444
* fix(doc): document `do_causal_lm_eval` required to run `eval_causal_lm_metrics` by NanoCode012 in https://github.com/axolotl-ai-cloud/axolotl/pull/2445
* Set the pytorch_cuda_alloc_conf env in the train module by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/2447
* add override of upstream fix for multi-gpu orpo by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/2440
* hf offline decorator for tests to workaround rate limits by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/2452
* bump liger to 0.5.5 by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/2448
* use offline for precached stream dataset by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/2453
* fix streaming packing test by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/2454
* fix: minor patches for multimodal by NanoCode012 in https://github.com/axolotl-ai-cloud/axolotl/pull/2441
* Sequence parallelism quick follow-ups; remove ModelCallback by djsaunde in https://github.com/axolotl-ai-cloud/axolotl/pull/2450
* destroy process group on Ctrl+C / training or eval run by djsaunde in https://github.com/axolotl-ai-cloud/axolotl/pull/2457
* Ray train bugfix by djsaunde in https://github.com/axolotl-ai-cloud/axolotl/pull/2458
* Updates for trl 0.16.0 - mostly for GRPO by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/2437
* Fix(doc): Clarify doc on attention configs and missing pad_token by NanoCode012 in https://github.com/axolotl-ai-cloud/axolotl/pull/2455
* Sequential sample packing by DreamGenX in https://github.com/axolotl-ai-cloud/axolotl/pull/2404
* gemma3 packing fixes by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/2449
* Release update 20250331 by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/2460
* Fix(doc): Minor doc changes for peft and modal by NanoCode012 in https://github.com/axolotl-ai-cloud/axolotl/pull/2462
* Fix: remove the numerous sequential log by NanoCode012 in https://github.com/axolotl-ai-cloud/axolotl/pull/2461
* Validation for Muon optimizer with DS/FSDP by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/2464
* fixing eval for SP by djsaunde in https://github.com/axolotl-ai-cloud/axolotl/pull/2468
* fix: downgrade deepspeed to fix grad checkpoint oom by NanoCode012 in https://github.com/axolotl-ai-cloud/axolotl/pull/2465
* fix: set rl=None during inference by NanoCode012 in https://github.com/axolotl-ai-cloud/axolotl/pull/2463
* torch 2.7.0 base image for testing by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/2467
* fix: pydantic warning validator not returning self by NanoCode012 in https://github.com/axolotl-ai-cloud/axolotl/pull/2474
* feat: add support for multimodal in lora kernels by NanoCode012 in https://github.com/axolotl-ai-cloud/axolotl/pull/2472
* fix: gemma3 loss in forward pass by NanoCode012 in https://github.com/axolotl-ai-cloud/axolotl/pull/2473
* fix: disable SP during merge by NanoCode012 in https://github.com/axolotl-ai-cloud/axolotl/pull/2470
* fix: separate gemma3 text and vision example config by NanoCode012 in https://github.com/axolotl-ai-cloud/axolotl/pull/2471
* fix(doc): document offload gradient_checkpointing option by NanoCode012 in https://github.com/axolotl-ai-cloud/axolotl/pull/2475
* set release version 0.8.0 by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/2476

New Contributors
* SicariusSicariiStuff made their first contribution in https://github.com/axolotl-ai-cloud/axolotl/pull/2360
* liyiligang made their first contribution in https://github.com/axolotl-ai-cloud/axolotl/pull/2421

**Full Changelog**: https://github.com/axolotl-ai-cloud/axolotl/compare/v0.7.1...v0.8.0

0.7.1

What's Changed
* bump dev version by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/2342
* Doc fix: TORCH_ROCM_AOTRITON_ENABLE_EXPERIMENTAL not necessary to use Triton kernel patches by djsaunde in https://github.com/axolotl-ai-cloud/axolotl/pull/2343
* make sure chatml dpo dataset loading works by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/2333
* Fix sample packing producing longer sequences than specified by `sequence_len` by tobmi1 in https://github.com/axolotl-ai-cloud/axolotl/pull/2332
* quick formatting fix for LoRA optims doc by djsaunde in https://github.com/axolotl-ai-cloud/axolotl/pull/2349
* calculate sample length fixes and SFT splitting fixes by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/2351
* feat: update transformers version to 4.49.0 by NanoCode012 in https://github.com/axolotl-ai-cloud/axolotl/pull/2340
* Bumping 0.15.1 TRL version for GRPO+PEFT fix by SalmanMohammadi in https://github.com/axolotl-ai-cloud/axolotl/pull/2344
* support for passing init_lora_weights to lora_config by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/2352
* fix(doc): add missing auto_find_batch_size by NanoCode012 in https://github.com/axolotl-ai-cloud/axolotl/pull/2339
* don't install extraneous old version of pydantic in ci and make sure to run multigpu ci by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/2355
* Relicense the logprob KD loss functions as Apache 2.0 by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/2358
* Correctly reference mount paths by reissbaker in https://github.com/axolotl-ai-cloud/axolotl/pull/2347
* bump liger to 0.5.3 by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/2353
* feat: add deepseek_v3 sample packing by NanoCode012 in https://github.com/axolotl-ai-cloud/axolotl/pull/2230
* Feat(doc): Reorganize documentation, fix broken syntax, update notes by NanoCode012 in https://github.com/axolotl-ai-cloud/axolotl/pull/2348
* Fix(doc): address missing doc changes by NanoCode012 in https://github.com/axolotl-ai-cloud/axolotl/pull/2362

New Contributors
* tobmi1 made their first contribution in https://github.com/axolotl-ai-cloud/axolotl/pull/2332
* reissbaker made their first contribution in https://github.com/axolotl-ai-cloud/axolotl/pull/2347

**Full Changelog**: https://github.com/axolotl-ai-cloud/axolotl/compare/v0.7.0...v0.7.1

0.7.0

New Contributors
* NJordan72 made their first contribution in https://github.com/axolotl-ai-cloud/axolotl/pull/2219
* SalmanMohammadi made their first contribution in https://github.com/axolotl-ai-cloud/axolotl/pull/2231
* v-dicicco made their first contribution in https://github.com/axolotl-ai-cloud/axolotl/pull/2235
* jwongTensora made their first contribution in https://github.com/axolotl-ai-cloud/axolotl/pull/2257
* adi-kmt made their first contribution in https://github.com/axolotl-ai-cloud/axolotl/pull/2268
* mashdragon made their first contribution in https://github.com/axolotl-ai-cloud/axolotl/pull/2281
* erictang000 made their first contribution in https://github.com/axolotl-ai-cloud/axolotl/pull/2251
* leeparkuky made their first contribution in https://github.com/axolotl-ai-cloud/axolotl/pull/2193
* minpeter made their first contribution in https://github.com/axolotl-ai-cloud/axolotl/pull/2322

**Full Changelog**: https://github.com/axolotl-ai-cloud/axolotl/compare/v0.6.0...v0.7.0

0.6.0

What's Changed

0.5.2

What's Changed
* move deprecated kwargs from trainer to trainingargs by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/2028
* add axolotlai docker hub org to publish list by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/2031
* update actions version for node16 deprecation by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/2037
* replace references to personal docker hub to org docker hub by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/2036
* feat: add metharme chat_template by NanoCode012 in https://github.com/axolotl-ai-cloud/axolotl/pull/2033
* change deprecated Stub to App by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/2038
* fix: handle sharegpt dataset missing by NanoCode012 in https://github.com/axolotl-ai-cloud/axolotl/pull/2035
* add P2P env when multi-gpu but not the full node by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/2041
* invert the string in string check for p2p device check by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/2044
* feat: print out dataset length even if not preprocess by NanoCode012 in https://github.com/axolotl-ai-cloud/axolotl/pull/2034
* Add example YAML file for training Mistral using DPO by olivermolenschot in https://github.com/axolotl-ai-cloud/axolotl/pull/2029
* fix: inference not using chat_template by NanoCode012 in https://github.com/axolotl-ai-cloud/axolotl/pull/2019
* feat: cancel ongoing tests if new CI is triggered by NanoCode012 in https://github.com/axolotl-ai-cloud/axolotl/pull/2046
* feat: upgrade to liger 0.4.1 by NanoCode012 in https://github.com/axolotl-ai-cloud/axolotl/pull/2045
* run pypi release action on tag create w version by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/2047
* make sure to tag images in docker for tagged releases by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/2051
* retry flaky test_packing_stream_dataset test that times out on read by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/2052
* install default torch version if not already, new xformers wheels for torch 2.5.x by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/2049
* fix push to main and tag semver build for docker ci by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/2054
* Update unsloth for torch.cuda.amp deprecation by bursteratom in https://github.com/axolotl-ai-cloud/axolotl/pull/2042
* don't cancel the tests on main automatically for concurrency by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/2055
* ADOPT optimizer integration by bursteratom in https://github.com/axolotl-ai-cloud/axolotl/pull/2032
* Grokfast support by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1917
* upgrade to flash-attn 2.7.0 by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/2048
* make sure to add tags for versioned tag on cloud docker images by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/2060
* fix duplicate base build by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/2061
* fix env var extraction by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/2043
* gradient accumulation tests, embeddings w pad_token fix, smaller models by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/2059
* upgrade datasets==3.1.0 and add upstream check by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/2067
* update to be deprecated evaluation_strategy by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1682
* remove the bos token from dpo outputs by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1733
* support passing trust_remote_code to dataset loading by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/2050
* support for schedule free and e2e ci smoke test by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/2066
* Fsdp grad accum monkeypatch by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/2064
* fix: loading locally downloaded dataset by NanoCode012 in https://github.com/axolotl-ai-cloud/axolotl/pull/2056
* Update `get_unpad_data` patching for multipack by chiragjn in https://github.com/axolotl-ai-cloud/axolotl/pull/2013
* increase worker count to 8 for basic pytests by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/2075
* upgrade autoawq==0.2.7.post2 for transformers fix by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/2070
* optim e2e tests to run a bit faster by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/2069
* don't build bdist by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/2076
* static assets, readme, and badges update v1 by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/2077
* Readme updates v2 by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/2078
* bump transformers for fsdp-grad-accum fix, remove patch by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/2079
* Feat: Drop long samples and shuffle rl samples by NanoCode012 in https://github.com/axolotl-ai-cloud/axolotl/pull/2040
* add optimizer step to prevent warning in tests by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/1502
* fix brackets on docker ci builds, add option to skip e2e builds by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/2080
* remove deprecated extra metadata kwarg from pydantic Field by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/2081

0.5.1

* make sure action has permission to create release by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/2083
* set manifest and fix for source dist by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/2084
* add missing dunder-init for monkeypatches and add tests for install from sdist by winglian in https://github.com/axolotl-ai-cloud/axolotl/pull/2085

New Contributors
* olivermolenschot made their first contribution in https://github.com/axolotl-ai-cloud/axolotl/pull/2029

**Full Changelog**: https://github.com/axolotl-ai-cloud/axolotl/compare/v0.5.0...v0.5.2
