New Features (highlights)
- Streaming multipack for continued pre-training
- Mistral & Mixtral support
- Simplified Multipack for Mistral, Falcon, Qwen2, and Phi
- DPO/IPO/KTO-pairs RL-training support via trl
- Improved BatchSampler for multipack support: allows resuming from a checkpoint and shuffling data each epoch
- `bf16: auto` support
- add MLflow support
- save YAML configs to WandB
- save predictions during evals to WandB
- more tests! more smoke tests for smol model training
- NEFTune support
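
Several of the highlights above concern multipack (sample packing), which groups variable-length tokenized samples into fixed-size rows so that less of each batch is padding. The sketch below only illustrates the general idea with a first-fit-decreasing bin packer. It is not the project's sampler; the function names `pack_first_fit_decreasing` and `packing_efficiency` and the example lengths are hypothetical.

```python
from typing import List


def pack_first_fit_decreasing(lengths: List[int], sequence_len: int) -> List[List[int]]:
    """Group sample indices into bins whose total token count fits in sequence_len."""
    # Visit the longest samples first so they claim space before short ones.
    order = sorted(range(len(lengths)), key=lambda i: lengths[i], reverse=True)
    bins: List[List[int]] = []  # sample indices per packed row
    free: List[int] = []        # remaining capacity per packed row
    for i in order:
        n = lengths[i]
        if n > sequence_len:
            raise ValueError(f"sample {i} has {n} tokens, more than {sequence_len}")
        for b, capacity in enumerate(free):
            if n <= capacity:
                # First existing row with enough room takes the sample.
                bins[b].append(i)
                free[b] -= n
                break
        else:
            # No existing row fits, so open a new one.
            bins.append([i])
            free.append(sequence_len - n)
    return bins


def packing_efficiency(lengths: List[int], bins: List[List[int]], sequence_len: int) -> float:
    """Fraction of the packed rows that holds real tokens rather than padding."""
    return sum(lengths) / (len(bins) * sequence_len)


lengths = [5, 3, 4, 2, 6]  # hypothetical token counts
bins = pack_first_fit_decreasing(lengths, sequence_len=8)
print(bins)  # [[4, 3], [0, 1], [2]]
print(round(packing_efficiency(lengths, bins, 8), 3))  # 0.833
```

Without packing, these five samples would occupy five rows of length 8 (efficiency 0.5); packed, they fit in three.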

What's Changed
* document that packaging needs to be installed before flash-attn by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/559
* Fix pretraining with iterable/streaming Dataset by jphme in https://github.com/OpenAccess-AI-Collective/axolotl/pull/556
* Add training callback to send predictions to WandB table by Glavin001 in https://github.com/OpenAccess-AI-Collective/axolotl/pull/521
* fix wandb so mypy doesn't complain by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/562
* check for the existence of the default accelerate config that can create headaches by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/561
* add optimization for group-by-len by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/563
* gracefully handle length feature used for group by by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/565
* improve how we setup eval/save strategies and steps by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/547
* let hf trainer handle torch compile by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/516
* Model parallel by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/538
* fix save_steps so it doesn't get duplicated by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/567
* set auto for other params that hf trainer sets for ds. include zero1 json by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/570
* remove columns after tokenizing for pretraining by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/571
* mypy wandb ignore by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/572
* Phi examples by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/569
* e2e testing by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/574
* E2e device cuda by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/575
* E2e passing tests by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/576
* refactor scripts/finetune.py into new cli modules by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/550
* update support matrix with btlm and phi by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/579
* prevent cli functions from getting fired on import by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/581
* Fix Codellama examples by Kimiko-AI in https://github.com/OpenAccess-AI-Collective/axolotl/pull/582
* support custom field for completion from yml by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/580
* Feat(doc): Add features to doc by NanoCode012 in https://github.com/OpenAccess-AI-Collective/axolotl/pull/583
* Support Sample packing for phi arch by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/586
* don't resize embeddings if it's already large enough by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/577
* Enable full (non-sharded) model saving with SHARDED_STATE_DICT by jphme in https://github.com/OpenAccess-AI-Collective/axolotl/pull/584
* make phi training work with Loras by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/588
* optionally configure sample packing for evals by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/589
* don't add position_ids for evals when not using eval sample packing by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/591
* gather/broadcast the max value of the packing efficiency automatically by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/463
* Feat(data): Allow loading local csv and text by NanoCode012 in https://github.com/OpenAccess-AI-Collective/axolotl/pull/594
* add bf16 check by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/587
* btlm and falcon monkey patches for flash attn by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/566
* minor tweaks to simplify by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/597
* Fix for check with cfg and merge_lora by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/600
* improve handling for empty text on the tokenization step by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/502
* more sane defaults for openllama 3b used for quickstarts by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/602
* update dockerfile to not build evoformer since it fails the build by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/607
* Delete duplicate lines in models.py by bofenghuang in https://github.com/OpenAccess-AI-Collective/axolotl/pull/606
* support to disable exllama for gptq by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/604
* Update requirements.txt - Duplicated package by Psancs05 in https://github.com/OpenAccess-AI-Collective/axolotl/pull/610
* Only run tests when a change to python files is made by maximegmd in https://github.com/OpenAccess-AI-Collective/axolotl/pull/614
* Create multi-node.md by maximegmd in https://github.com/OpenAccess-AI-Collective/axolotl/pull/613
* fix distributed devices by maximegmd in https://github.com/OpenAccess-AI-Collective/axolotl/pull/612
* ignore wandb to resolve isort headaches by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/619
* skip the gpu memory checks if the device is set to 'auto' by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/609
* let MAX_JOBS use the default since we're not resource constrained on our self-hosted runners by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/427
* run eval on the first step to get a baseline by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/617
* split completion text to sequence_len by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/616
* misc fixes to add gptq tests by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/621
* chore(callback): Remove old peft saving code by NanoCode012 in https://github.com/OpenAccess-AI-Collective/axolotl/pull/510
* update README w deepspeed info by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/605
* create a model card with axolotl badge by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/624
* better handling and logging of empty sharegpt turns by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/603
* tweak: improve base builder for smaller layers by maximegmd in https://github.com/OpenAccess-AI-Collective/axolotl/pull/500
* Feat(doc): Add eval_sample_packing to doc by NanoCode012 in https://github.com/OpenAccess-AI-Collective/axolotl/pull/625
* Fix: Fail bf16 check when running on cpu during merge by NanoCode012 in https://github.com/OpenAccess-AI-Collective/axolotl/pull/631
* default model changed by mhenrichsen in https://github.com/OpenAccess-AI-Collective/axolotl/pull/629
* Added quotes to the pip install -e command in the documentation to fix an incompatibility … by Nan-Do in https://github.com/OpenAccess-AI-Collective/axolotl/pull/632
* Feat: Add support for upstream FA2 by NanoCode012 in https://github.com/OpenAccess-AI-Collective/axolotl/pull/626
* eval_table isn't quite stable enough to be in default llama configs by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/637
* attention_mask not needed for training by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/642
* update for recent transformers updates by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/636
* use fastchat conversations template by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/578
* skip some flash attn patches unless explicitly enabled by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/643
* Correct typos in datasets.py by felixonmars in https://github.com/OpenAccess-AI-Collective/axolotl/pull/639
* Fix bug in dataset loading by ethanhs in https://github.com/OpenAccess-AI-Collective/axolotl/pull/284
* Warn users to login to HuggingFace by Napuh in https://github.com/OpenAccess-AI-Collective/axolotl/pull/645
* Mistral flash attn packing by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/646
* Fix(cfg): Add validation for save_strategy and eval_strategy by NanoCode012 in https://github.com/OpenAccess-AI-Collective/axolotl/pull/633
* Feat: Add example for Mistral by NanoCode012 in https://github.com/OpenAccess-AI-Collective/axolotl/pull/644
* Add mistral/README.md by adarshxs in https://github.com/OpenAccess-AI-Collective/axolotl/pull/647
* fix for flash attn w mistral w/o sample packing by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/648
* don't strip the prompt for check since we don't strip to tokenize anymore by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/650
* add support for defined train split by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/654
* Fix bug when using pretokenized datasets by ein-ich in https://github.com/OpenAccess-AI-Collective/axolotl/pull/652
* Make dataset_processes configurable by corbt in https://github.com/OpenAccess-AI-Collective/axolotl/pull/651
* add mistral e2e tests by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/649
* removed duplicate on requirements.txt by Napuh in https://github.com/OpenAccess-AI-Collective/axolotl/pull/661
* make sure we also run CI tests when requirements.txt changes by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/663
* prepared dataset caching, other misc fixes by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/665
* remove patch fix for phi by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/664
* refactor to set eval_batch_size earlier if unset, so we can warn if mismatched by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/662
* Feat: Add config yaml to section for reprod in bug-report.yaml by NanoCode012 in https://github.com/OpenAccess-AI-Collective/axolotl/pull/667
* Feat: Allow usage of native Mistral FA when no sample_packing by NanoCode012 in https://github.com/OpenAccess-AI-Collective/axolotl/pull/669
* chore: Clean up repetitive model kwargs by NanoCode012 in https://github.com/OpenAccess-AI-Collective/axolotl/pull/670
* Fix(version): Update FA to work with Mistral SWA by NanoCode012 in https://github.com/OpenAccess-AI-Collective/axolotl/pull/673
* Fix(tokenizer): Set rstrip,lstrip,norm to False by NanoCode012 in https://github.com/OpenAccess-AI-Collective/axolotl/pull/678
* Fix: Future deprecation warning with use_auth_token by NanoCode012 in https://github.com/OpenAccess-AI-Collective/axolotl/pull/680
* Feat: Set WORKDIR to /workspace/axolotl by NanoCode012 in https://github.com/OpenAccess-AI-Collective/axolotl/pull/679
* Fix: ValueError when FA + Mistral when padding_side=right by NanoCode012 in https://github.com/OpenAccess-AI-Collective/axolotl/pull/681
* flash_attention + sample packing for stablelm 3b by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/671
* Adding qlora config for Mistral by TokenBender in https://github.com/OpenAccess-AI-Collective/axolotl/pull/675
* Fix: Higher vram usage for mistral and sample_packing by NanoCode012 in https://github.com/OpenAccess-AI-Collective/axolotl/pull/691
* fix multiline for docker by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/694
* update mistral lr, sample pack by mhenrichsen in https://github.com/OpenAccess-AI-Collective/axolotl/pull/693
* apex not needed as amp is part of pytorch by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/696
* add docker images for pytorch 2.1.0 by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/697
* fix unneeded space by mhenrichsen in https://github.com/OpenAccess-AI-Collective/axolotl/pull/699
* Update README with some explanations by seungduk-yanolja in https://github.com/OpenAccess-AI-Collective/axolotl/pull/700
* Get qlora mistral-7b fine tuning working on a single 4090 by lukemarsden in https://github.com/OpenAccess-AI-Collective/axolotl/pull/708
* fix(doc): Add note on inference w sample packing by NanoCode012 in https://github.com/OpenAccess-AI-Collective/axolotl/pull/712
* Fix: lowercase `True` values in config by atgctg in https://github.com/OpenAccess-AI-Collective/axolotl/pull/713
* fix(doc): update default doc according to arg by NanoCode012 in https://github.com/OpenAccess-AI-Collective/axolotl/pull/714
* Save Axolotl config as WandB artifact by jphme in https://github.com/OpenAccess-AI-Collective/axolotl/pull/716
* improve handling of the prepared ds path and other cfg defaults by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/701
* fix pytorch 2.1.0 build, add multipack docs by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/722
* add noisy embedding by maximegmd in https://github.com/OpenAccess-AI-Collective/axolotl/pull/721
* pin xformers >= 0.0.22 by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/724
* misc sharegpt fixes by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/723
* workaround for installing xformers w torch 2.1.0 by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/725
* tweak for xformers install w pytorch 2.1.0 by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/727
* fixes for alpaca w chatml, and don't include attention_mask w mistral for flash attention by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/728
* Clarify custom format example by casper-hansen in https://github.com/OpenAccess-AI-Collective/axolotl/pull/729
* Mistral: Sliding Window Attention with Flash Attention and Sample Packing by casper-hansen in https://github.com/OpenAccess-AI-Collective/axolotl/pull/732
* badge by mhenrichsen in https://github.com/OpenAccess-AI-Collective/axolotl/pull/739
* catch ConnectionError when checking dataset from HuggingFace by Napuh in https://github.com/OpenAccess-AI-Collective/axolotl/pull/743
* Fix(model): Linear detected and added to target module with rope linear by NanoCode012 in https://github.com/OpenAccess-AI-Collective/axolotl/pull/738
* improve: Enhance code readability of prompt_tokenizers.py by seungduk-yanolja in https://github.com/OpenAccess-AI-Collective/axolotl/pull/707
* add a latest tag for regular axolotl image, cleanup extraneous print statement by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/746
* Fix DeepSpeed Zero 3 Saving by tokestermw in https://github.com/OpenAccess-AI-Collective/axolotl/pull/709
* chore: bump transformers to v4.34.1 to fix tokenizer issue by NanoCode012 in https://github.com/OpenAccess-AI-Collective/axolotl/pull/745
* add to docs by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/703
* Implement fused modules by casper-hansen in https://github.com/OpenAccess-AI-Collective/axolotl/pull/747
* remove lora fused packing test by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/758
* Fix: eval table conflict with eval_sample_packing by NanoCode012 in https://github.com/OpenAccess-AI-Collective/axolotl/pull/769
* Fix: Cannot tokenize with bf16 and on cpu by NanoCode012 in https://github.com/OpenAccess-AI-Collective/axolotl/pull/766
* Hotfix for fused QKV not saving the trained weights of o_proj by casper-hansen in https://github.com/OpenAccess-AI-Collective/axolotl/pull/762
* convert exponential notation lr to floats by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/771
* Fix: Warn when fullfinetune without adapter by NanoCode012 in https://github.com/OpenAccess-AI-Collective/axolotl/pull/770
* simplify by removing duplicate base_model_config by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/772
* disable eval table w sample packing in examples by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/778
* refactor setup trainer so we can add more hooks by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/773
* chore: refactor truthy check and fix mypy by NanoCode012 in https://github.com/OpenAccess-AI-Collective/axolotl/pull/780
* chore(readme): Improve documentation on conversation field by NanoCode012 in https://github.com/OpenAccess-AI-Collective/axolotl/pull/782
* Threaded MultipackDistributedDataloader with prefetched samples by casper-hansen in https://github.com/OpenAccess-AI-Collective/axolotl/pull/759
* Create preprocess CLI by casper-hansen in https://github.com/OpenAccess-AI-Collective/axolotl/pull/785
* Add docker advanced instruction to README by gordicaleksa in https://github.com/OpenAccess-AI-Collective/axolotl/pull/792
* Fix Deepspeed Zero3 Config by teknium1 in https://github.com/OpenAccess-AI-Collective/axolotl/pull/791
* Update to adapt to sharegpt datasets with "assistant" rather than "gp… by MilesQLi in https://github.com/OpenAccess-AI-Collective/axolotl/pull/774
* fix eval_steps to be a sane default by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/797
* refactor neft patch to be more re-usable similar to trl's impl by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/796
* fix(config): Set eos/bos to tokenizer if different by NanoCode012 in https://github.com/OpenAccess-AI-Collective/axolotl/pull/801
* feat(doc): add dummyoptim faq fix by NanoCode012 in https://github.com/OpenAccess-AI-Collective/axolotl/pull/802
* fix(tokenizer): update log order after update by NanoCode012 in https://github.com/OpenAccess-AI-Collective/axolotl/pull/806
* fix model parallel by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/816
* fix: pin autogptq by NanoCode012 in https://github.com/OpenAccess-AI-Collective/axolotl/pull/818
* update table for rwkv4 support, fix process count for dataset by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/822
* Feat: Added Gradio support by Stillerman in https://github.com/OpenAccess-AI-Collective/axolotl/pull/812
* Dockerfile: add deepspeed-kernels dependency for deepspeed>=0.12.0 by fpreiss in https://github.com/OpenAccess-AI-Collective/axolotl/pull/827
* cleanup verbosity a bit by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/799
* make sure to cleanup tmp output_dir for e2e tests by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/831
* multipack w batch sampler by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/795
* don't compile deepspeed or bitsandbytes from source by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/837
* Pin optimum package by brthor in https://github.com/OpenAccess-AI-Collective/axolotl/pull/838
* cleanup the old multipack dataloader by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/841
* include the suffix modified string in ascii art by fpreiss in https://github.com/OpenAccess-AI-Collective/axolotl/pull/852
* feat(doc): add more info on train_on_split by NanoCode012 in https://github.com/OpenAccess-AI-Collective/axolotl/pull/855
* chore(doc): Separate section on runpod by NanoCode012 in https://github.com/OpenAccess-AI-Collective/axolotl/pull/860
* various bugfixes by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/856
* adds llama and mistral dropout support by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/858
* multipack len should use max, not min by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/863
* Docs: add instructions to 1-click launching on public clouds by concretevitamin in https://github.com/OpenAccess-AI-Collective/axolotl/pull/862
* Update data.py for signature generation by MilesQLi in https://github.com/OpenAccess-AI-Collective/axolotl/pull/851
* lint fix that didn't get caught by linter by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/866
* make docker command more robust by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/861
* add e2e tests for checking functionality of resume from checkpoint by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/865
* allow overriding of model_config parameters from the YML by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/853
* Feat: Add dataset loading from S3, GCS by NanoCode012 in https://github.com/OpenAccess-AI-Collective/axolotl/pull/765
* try 2: pin hf transformers and accelerate to latest release, don't reinstall pytorch by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/867
* don't train if eval split is too small by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/873
* Phi update 202311 by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/876
* Install from git url by msaroufim in https://github.com/OpenAccess-AI-Collective/axolotl/pull/874
* fix: revert local dir dataset load by NanoCode012 in https://github.com/OpenAccess-AI-Collective/axolotl/pull/878
* chore(doc): Add info on changing role in sharegpt by NanoCode012 in https://github.com/OpenAccess-AI-Collective/axolotl/pull/886
* Feat: Add warmup_ratio by NanoCode012 in https://github.com/OpenAccess-AI-Collective/axolotl/pull/893
* fix: warning should not show if eval_batch_size not provided by NanoCode012 in https://github.com/OpenAccess-AI-Collective/axolotl/pull/896
* Feat: Add Qwen by NanoCode012 in https://github.com/OpenAccess-AI-Collective/axolotl/pull/894
* update datasets version to cut down the warnings due to pyarrow arg change by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/897
* fix: remove FA for qwen examples by NanoCode012 in https://github.com/OpenAccess-AI-Collective/axolotl/pull/900
* Determine FSDP/deepspeed settings on device select. by kallewoof in https://github.com/OpenAccess-AI-Collective/axolotl/pull/883
* ensure merged model matches the training dtype by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/902
* fix for qwen w lora by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/906
* Remove lr scheduler in DeepSpeed config to avoid conflict by Haoxiang-Wang in https://github.com/OpenAccess-AI-Collective/axolotl/pull/909
* feature: loss watchdog for terminating training runs that are failing by kallewoof in https://github.com/OpenAccess-AI-Collective/axolotl/pull/899
* Feat(wandb): Refactor to be more flexible by NanoCode012 in https://github.com/OpenAccess-AI-Collective/axolotl/pull/767
* Support device_map=sequential & max_memory config parameters by brthor in https://github.com/OpenAccess-AI-Collective/axolotl/pull/903
* feat: add check for quantized model by NanoCode012 in https://github.com/OpenAccess-AI-Collective/axolotl/pull/913
* Pin flash-attn to 2.3.3 by casper-hansen in https://github.com/OpenAccess-AI-Collective/axolotl/pull/919
* fix(tokenizer): handle fast tokenizer properly for bos/eos by NanoCode012 in https://github.com/OpenAccess-AI-Collective/axolotl/pull/914
* support for mamba by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/915
* fixing prompt template of chatml by removal of linebreak by timothylimyl in https://github.com/OpenAccess-AI-Collective/axolotl/pull/922
* Mixtral multipack by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/928
* update to latest transformers for mixtral support by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/929
* Mixtral: More correct MoE, lower loss by casper-hansen in https://github.com/OpenAccess-AI-Collective/axolotl/pull/932
* Update requirements.txt (fschat==0.2.34) by tokestermw in https://github.com/OpenAccess-AI-Collective/axolotl/pull/940
* Mixtral official by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/942
* Respect sequence_len in config for `type: llama2_chat` by hamelsmu in https://github.com/OpenAccess-AI-Collective/axolotl/pull/926
* new evals_per_epoch and saves_per_epoch to make things cleaner by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/944
* More hints on what to do with CUDA Out of memory errors by jooray in https://github.com/OpenAccess-AI-Collective/axolotl/pull/925
* fix: remove excessive newlines in system prompt(s) for alpaca by kallewoof in https://github.com/OpenAccess-AI-Collective/axolotl/pull/936
* Flash attn hotfix by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/951
* Fix Deepspeed loading by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/950
* fix: switch to using the HuggingFace Transformers NEFT implementation by kallewoof in https://github.com/OpenAccess-AI-Collective/axolotl/pull/941
* Add docs by hamelsmu in https://github.com/OpenAccess-AI-Collective/axolotl/pull/947
* Fix prompt assembly for llama by hamelsmu in https://github.com/OpenAccess-AI-Collective/axolotl/pull/952
* update transformers to fix checkpoint saving by dumpmemory in https://github.com/OpenAccess-AI-Collective/axolotl/pull/963
* update to latest nccl in docker image by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/965
* fix for build for nccl in dockerfile by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/970
* fix: add lr scheduler kwargs to Trainer by NanoCode012 in https://github.com/OpenAccess-AI-Collective/axolotl/pull/972
* Update README.md by eltociear in https://github.com/OpenAccess-AI-Collective/axolotl/pull/966
* Dockerfile torch fix by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/987
* fix mistral prompt assembly by hamelsmu in https://github.com/OpenAccess-AI-Collective/axolotl/pull/982
* Feat: Warns to add to modules_to_save when adding tokens or switching special_tokens by NanoCode012 in https://github.com/OpenAccess-AI-Collective/axolotl/pull/787
* Add tests to Docker by hamelsmu in https://github.com/OpenAccess-AI-Collective/axolotl/pull/993
* change val size by mhenrichsen in https://github.com/OpenAccess-AI-Collective/axolotl/pull/992
* chore: Update transformers to latest by NanoCode012 in https://github.com/OpenAccess-AI-Collective/axolotl/pull/986
* support for cuda 12.1 by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/989
* set output_router_logits for mixtral config by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/995
* Add an example config for finetuning a 34B model on a 24GB GPU by evangriffiths in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1000
* FEAT: add tagging support to axolotl by younesbelkada in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1004
* Set eval_sample_packing to false in mistral config.yaml by kmsydney in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1003
* add config to model card by hamelsmu in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1005
* remove landmark attn and xpos rope implementations by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1010
* [Docs] Nit: clarify what inference is by hamelsmu in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1012
* [Docs] Nit: Remind people to auth to wandb if they are going to use it by hamelsmu in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1013
* feat: remove need to add load_in* during merge by NanoCode012 in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1017
* feat: expose bnb kwargs by NanoCode012 in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1018
* add ultrachat prompt strategies by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/996
* [WandB] Push axolotl config to top level wandb files by hamelsmu in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1014
* Adds chat templates by mhenrichsen in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1022
* Fix: bf16 support for inference by taziksh in https://github.com/OpenAccess-AI-Collective/axolotl/pull/981
* use recommended setting for use_reentrant w gradient checkpointing by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1021
* added tiny llama examples for lora and qlora by tdolan21 in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1027
* chore(readme): update instruction to set config to load from cache by NanoCode012 in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1030
* [Docs] delete unused cfg value `lora_out_dir` by hamelsmu in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1029
* fix: lint by NanoCode012 in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1037
* chore(config): clean up old log for Qwen by NanoCode012 in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1034
* bump transformers and update attention class map name by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1023
* Added chatglm3 conversation type for training models like TinyLlama by xaviviro in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1036
* fix HF model card upload for PEFT models by hamelsmu in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1043
* Clean Up LoRA Merge by hamelsmu in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1044
* feature: better device mapping for large models by kallewoof in https://github.com/OpenAccess-AI-Collective/axolotl/pull/918
* feat: always push checkpoint to hub if set by NanoCode012 in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1049
* Update tests-docker.yml by hamelsmu in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1052
* streaming multipack for pretraining dataset by jinwonkim93 in https://github.com/OpenAccess-AI-Collective/axolotl/pull/959
* Simplify Docker Unit Test CI by hamelsmu in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1055
* Phi2 rewrite by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1058
* Efficiently get the length of the tokenized docs by RicardoDominguez in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1063
* Sponsors by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1065
* Update FUNDING.yml for Kofi link by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1067
* fix: torch_dtype mistral default to fp32 by NanoCode012 in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1050
* Cosine learning rate schedule - minimum learning rate by RicardoDominguez in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1062
* fix double eos token for chatml by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1054
* Add: mlflow for experiment tracking by JohanWork in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1059
* update peft to 0.7.0 by mtenenholtz in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1073
* paired kto support by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1069
* Separate AutoGPTQ dep to `pip install -e .[auto-gptq]` by casper-hansen in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1077
* attempt to also run e2e tests that needs gpus by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1070
* Update FUNDING.yml with bitcoin by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1079
* swap the data collator for evals if not using sample packing by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1076
* be more robust about checking embedding modules for lora finetunes by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1074
* fix: `train_on_inputs: true` ignored for sharegpt by NanoCode012 in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1045
* update sharegpt conversations when chatml chat template is set by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1075
* additional logging to get maximum token length of a sequence in the dataset by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1066
* pin accelerate for deepspeed fix by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1080
* fix: warn user to install mamba_ssm package by NanoCode012 in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1019
* use tags again for test image, only run docker e2e after pre-commit checks by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1081
* optimize calculation of cu_seqlens from position_ids by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1084
* add python 3.11 to the matrix for unit tests by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1085
* Remove fused-dense-lib from requirements.txt by casper-hansen in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1087
* misc fixes from 943 by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1086
* add gptneox embeddings, fix phi2 inputs, also fix the casting by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1083
* Add Debugging Guide by hamelsmu in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1089
* Fix debugging.md by hamelsmu in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1091
* feat: enable trl's autounwrap by NanoCode012 in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1060
* Fix broken pypi.yml by msaroufim in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1099
* Update README.md by hamelsmu in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1103
* Add section for debugging with Docker by hamelsmu in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1104
* Add link on README to Docker Debugging by hamelsmu in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1107
* keep gate in fp32 for loras by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1105
* Fix debugging video by hamelsmu in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1111
* Disable caching on `--disable_caching` in CLI by casper-hansen in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1110
* Reverse caching PR by casper-hansen in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1115
* Enable or disable bf16 support based on availability by simhallq in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1116
* update PR template so we can capture twitter or discord handles by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1121
* pin model_revision for phi2 by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1123
* fix(readme): clarify custom user prompt [no-ci] by NanoCode012 in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1124
* Add `layers_to_transform` for `lora_config` by xzuyn in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1118
* Agnostic cloud gpu docker image and Jupyter lab by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1097
* Preprocess dataset size fix by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1131
* fix(preprocess): Make sure dataset not loaded from cache when using preprocess cli by NanoCode012 in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1136
* fix bf16 check when preprocessing data by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1140
* Add shifted sparse attention by joecummings in https://github.com/OpenAccess-AI-Collective/axolotl/pull/973
* Multipack simplify for Mixtral by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1142
* Fix link for Minotaur model by joecummings in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1146
* Dockerfile cloud ports by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1148
* fix check for env var by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1151
* feat(dataset): add config to keep processed dataset in memory by NanoCode012 in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1152
* Deprecate max packed sequence len by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1141
* make sure the model config loader respects the model_revision too by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1160
* Qwen2 by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1166
* jupyter lab fixes by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1139
* set fp16 to false if bf16, update bf16: auto in example YAMLs by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1122
* Add mlflow callback for pushing config to mlflow artifacts by JohanWork in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1125
* improve vram use with gradient checkpointing by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1167
* Vram fix attempt by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1164
* add commit message option to skip docker image builds in ci by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1168
* Falcon embeddings by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1149
* support for explicit test_dataset definition for evals by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/786
* Add desc to map/filter by casper-hansen in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1162
* Feat(test): Add tests for alpaca chatml prompt tokenizer by JohanWork in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1088
* DPO cleanup by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1126
* Update README.md by singhay in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1169
* Fine-Tuning Mistral-7b for Real-World Chatbot Applications Using Axolotl (Lora used) by Tilemachoc in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1155
* don't fail if can't cast weights due to offload when merging by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1172
* update docs by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1176
* Phi2 multipack by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1173
* DPO fixes v2 by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1174
* Docs: RLHF Update after cleanup by AlekseyKorshuk in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1178
* Add support for offline mode with HF_HUB_OFFLINE envvar by JamesHWade in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1182
* Fix do_merge_lora raises an Exception in transformers v4.37.0 by tisorlawan in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1184
* report min length of tokenized data by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1186
* more dpo fixes for dataset loading and docs by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1185
* upgrade deepspeed to 0.13.1 for mixtral fixes by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1189
* Standardize system prompt format for AlpacaPrompter (instruct case) by sadaisystems in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1190
* Mixtral fixes 20240124 by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1192
* prepare for release v0.4.0 by winglian in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1175
New Contributors
* Kimiko-AI made their first contribution in https://github.com/OpenAccess-AI-Collective/axolotl/pull/582
* bofenghuang made their first contribution in https://github.com/OpenAccess-AI-Collective/axolotl/pull/606
* Psancs05 made their first contribution in https://github.com/OpenAccess-AI-Collective/axolotl/pull/610
* Nan-Do made their first contribution in https://github.com/OpenAccess-AI-Collective/axolotl/pull/632
* felixonmars made their first contribution in https://github.com/OpenAccess-AI-Collective/axolotl/pull/639
* Napuh made their first contribution in https://github.com/OpenAccess-AI-Collective/axolotl/pull/645
* adarshxs made their first contribution in https://github.com/OpenAccess-AI-Collective/axolotl/pull/647
* ein-ich made their first contribution in https://github.com/OpenAccess-AI-Collective/axolotl/pull/652
* corbt made their first contribution in https://github.com/OpenAccess-AI-Collective/axolotl/pull/651
* TokenBender made their first contribution in https://github.com/OpenAccess-AI-Collective/axolotl/pull/675
* seungduk-yanolja made their first contribution in https://github.com/OpenAccess-AI-Collective/axolotl/pull/700
* lukemarsden made their first contribution in https://github.com/OpenAccess-AI-Collective/axolotl/pull/708
* atgctg made their first contribution in https://github.com/OpenAccess-AI-Collective/axolotl/pull/713
* casper-hansen made their first contribution in https://github.com/OpenAccess-AI-Collective/axolotl/pull/729
* tokestermw made their first contribution in https://github.com/OpenAccess-AI-Collective/axolotl/pull/709
* gordicaleksa made their first contribution in https://github.com/OpenAccess-AI-Collective/axolotl/pull/792
* MilesQLi made their first contribution in https://github.com/OpenAccess-AI-Collective/axolotl/pull/774
* Stillerman made their first contribution in https://github.com/OpenAccess-AI-Collective/axolotl/pull/812
* fpreiss made their first contribution in https://github.com/OpenAccess-AI-Collective/axolotl/pull/827
* brthor made their first contribution in https://github.com/OpenAccess-AI-Collective/axolotl/pull/838
* concretevitamin made their first contribution in https://github.com/OpenAccess-AI-Collective/axolotl/pull/862
* msaroufim made their first contribution in https://github.com/OpenAccess-AI-Collective/axolotl/pull/874
* kallewoof made their first contribution in https://github.com/OpenAccess-AI-Collective/axolotl/pull/883
* Haoxiang-Wang made their first contribution in https://github.com/OpenAccess-AI-Collective/axolotl/pull/909
* timothylimyl made their first contribution in https://github.com/OpenAccess-AI-Collective/axolotl/pull/922
* hamelsmu made their first contribution in https://github.com/OpenAccess-AI-Collective/axolotl/pull/926
* jooray made their first contribution in https://github.com/OpenAccess-AI-Collective/axolotl/pull/925
* dumpmemory made their first contribution in https://github.com/OpenAccess-AI-Collective/axolotl/pull/963
* eltociear made their first contribution in https://github.com/OpenAccess-AI-Collective/axolotl/pull/966
* evangriffiths made their first contribution in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1000
* younesbelkada made their first contribution in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1004
* kmsydney made their first contribution in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1003
* taziksh made their first contribution in https://github.com/OpenAccess-AI-Collective/axolotl/pull/981
* tdolan21 made their first contribution in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1027
* xaviviro made their first contribution in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1036
* jinwonkim93 made their first contribution in https://github.com/OpenAccess-AI-Collective/axolotl/pull/959
* RicardoDominguez made their first contribution in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1063
* JohanWork made their first contribution in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1059
* mtenenholtz made their first contribution in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1073
* simhallq made their first contribution in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1116
* xzuyn made their first contribution in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1118
* joecummings made their first contribution in https://github.com/OpenAccess-AI-Collective/axolotl/pull/973
* singhay made their first contribution in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1169
* Tilemachoc made their first contribution in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1155
* AlekseyKorshuk made their first contribution in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1178
* JamesHWade made their first contribution in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1182
* tisorlawan made their first contribution in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1184
* sadaisystems made their first contribution in https://github.com/OpenAccess-AI-Collective/axolotl/pull/1190
**Full Changelog**: https://github.com/OpenAccess-AI-Collective/axolotl/compare/v0.3.0...v0.4.0