What's Changed
* kernel: port softcap support for flash attention by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/298
* test: added unit tests for attention sliding window by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/299
* model: added gemma2 with softcap and sliding window support by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/300
* kernel: support kernel tests in Python via pybind by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/301
* test: added unit tests for marlin fp16xint4 gemm by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/302
* fix: move eos out of stop token list to honor ignore_eos option by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/305
* refactor: move models to upper folder by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/306
* kernel: port gptq marlin kernel and fp8 marlin kernel by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/307
* rust: upgrade Rust libs to the latest version by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/309
* refactor: remove the logic for loading individual weights from shared partitions by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/311
* feat: added fused column parallel linear by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/313
* feat: added gptq marlin qlinear layer by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/312
* kernel: port awq repack kernel by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/314
**Full Changelog**: https://github.com/vectorch-ai/ScaleLLM/compare/v0.1.9...v0.2.0
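Similarly, the sliding-window support (#299, #300) restricts the causal attention mask so each token attends only to the most recent `window` positions. A hedged sketch of that mask logic (a boolean-matrix toy, not the actual kernel implementation):

```python
def sliding_window_mask(seq_len, window):
    # Causal sliding-window mask: token i may attend to token j
    # only when j <= i and j > i - window, i.e. 0 <= i - j < window.
    return [[0 <= i - j < window for j in range(seq_len)]
            for i in range(seq_len)]
```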