What's Changed
* Flash Attention V2 w/ arbitrary attention bias by drisspg in https://github.com/drisspg/transformer_nuggets/pull/1
* Update torch.cuda.memory API calls for memory profiling by janeyx99 in https://github.com/drisspg/transformer_nuggets/pull/4
* updated by drisspg in https://github.com/drisspg/transformer_nuggets/pull/6
* Simple Fp8 delayed scaling kernel by drisspg in https://github.com/drisspg/transformer_nuggets/pull/7
* use ufmt on prs by drisspg in https://github.com/drisspg/transformer_nuggets/pull/8
* Add Llama Training scripts by drisspg in https://github.com/drisspg/transformer_nuggets/pull/10
* Pre commit by drisspg in https://github.com/drisspg/transformer_nuggets/pull/11
* add_nan_inf_detect_mode by drisspg in https://github.com/drisspg/transformer_nuggets/pull/12
* enable qlora finetuning on single GPU by weifengpy in https://github.com/drisspg/transformer_nuggets/pull/13
* added qlora + fsdp by weifengpy in https://github.com/drisspg/transformer_nuggets/pull/14
* all the flake8s by drisspg in https://github.com/drisspg/transformer_nuggets/pull/16
* fix_tests by drisspg in https://github.com/drisspg/transformer_nuggets/pull/17
* Make Nf4 a NF4 Tensor subclass by drisspg in https://github.com/drisspg/transformer_nuggets/pull/18
* enable per-parameter-sharding FSDP + qlora by weifengpy in https://github.com/drisspg/transformer_nuggets/pull/15
* Add op table for torch dispatch by drisspg in https://github.com/drisspg/transformer_nuggets/pull/22
* fix qlora mlp bug and add script for getting memory traces by drisspg in https://github.com/drisspg/transformer_nuggets/pull/23
* Add ShapeLog mode to utilities by drisspg in https://github.com/drisspg/transformer_nuggets/pull/25
* Remove dtype restriction and test by drisspg in https://github.com/drisspg/transformer_nuggets/pull/26
* Block mask by drisspg in https://github.com/drisspg/transformer_nuggets/pull/3
* Dynamic scaling triton kernel by drisspg in https://github.com/drisspg/transformer_nuggets/pull/28
* Allow for score mod and change of base perf trick by drisspg in https://github.com/drisspg/transformer_nuggets/pull/29
* Updates to ruff by drisspg in https://github.com/drisspg/transformer_nuggets/pull/32
* import nanif to init by drisspg in https://github.com/drisspg/transformer_nuggets/pull/33
* add docstring to profiler by drisspg in https://github.com/drisspg/transformer_nuggets/pull/34
* Add some utils for working with flex by drisspg in https://github.com/drisspg/transformer_nuggets/pull/35

New Contributors

* janeyx99 made their first contribution in https://github.com/drisspg/transformer_nuggets/pull/4
* weifengpy made their first contribution in https://github.com/drisspg/transformer_nuggets/pull/13

**Full Changelog**: https://github.com/drisspg/transformer_nuggets/commits/v0.0.1