What's Changed
* misc: remove legacy logic to support quantization for other types by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/350
* upgrade pytorch to 2.5.1 by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/351
* added cuda 12.6 build image by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/353
* fix cmake version issue for manylinux image by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/354
* kernel: added attention kernel for sm80 (Happy new year!) by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/355
* ci: fix package test workflow by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/357
* kernel: refactor attention kernel for readability by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/358
* dev: config dev container with proper extensions by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/359
* kernel: added attention bench for profiling before optimization by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/360
* kernel: added logits soft cap support for attention by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/362
* tools: added attention traits viewer by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/363
* kernel: added swizzle for shared memory to avoid bank conflict by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/364
* kernel: added causal, alibi, sliding window mask for attention by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/365
* kernel: refactor attention kernel and add more unittests by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/366
* kernel: added M/N OOB handling for attention by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/367
* tools: update svg build to generate small file by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/368
* kernel: added attention params and tile for different input types by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/369
* kernel: added mqa and gqa support for attention by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/370
* kernel: added var len and paged kv cache support for attention by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/371
* kernel: added varlen and pagedkv unittests for attention by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/372
* kernel: added attention kernel launch by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/373
* kernel: added build script to generate kernel instantiations for attention by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/374
* kernel: change attention input shape from [head, seq, dim] to [seq, head, dim] by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/375
* kernel: added head_dim=96 support for attention by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/376
* kernel: optimize attention kernel performance by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/377
* upgrade cutlass to 3.7.0 by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/379
* kernel: handle kv block range for attention kernel by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/382
* kernel: use cp_async_zfill instead of cute::clear for oob handling by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/383
* kernel: separate oob iterations for better performance by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/384
* refactor: remove batch_prefill interface by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/385
* refactor: stop build flash_infer kernel by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/386
* feat: integrate in-house scale attention and use it by default by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/380
* kernel: only zfill k once to improve perf for attention by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/387
* refactor: skip flash_attn build by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/388
* refactor: clean up kv cache set/get apis and improve slot id calculation perf by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/389
**Full Changelog**: https://github.com/vectorch-ai/ScaleLLM/compare/v0.2.2...v0.2.3