ScaleLLM

Latest version: v0.2.4


0.2.3

What's Changed
* misc: remove legacy logic to support quantization for other types. by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/350
* upgrade pytorch to 2.5.1 by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/351
* added cuda 12.6 build image by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/353
* fix cmake version issue for manylinux image by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/354
* kernel: added attention kernel for sm80 (Happy new year!) by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/355
* ci: fix package test workflow by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/357
* kernel: refactor attention kernel for readability by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/358
* dev: config dev container with proper extensions by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/359
* kernel: added attention bench for profiling before optimization by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/360
* kernel: added logits soft cap support for attention by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/362
* tools: added attention traits viewer by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/363
* kernel: added swizzle for shared memory to avoid bank conflict by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/364
* kernel: added causal, alibi, sliding window mask for attention by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/365 (see the sketch after this list)
* kernel: refactor attention kernel and add more unittests by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/366
* kernel: added M/N OOB handling for attention by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/367
* tools: update svg build to generate small file by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/368
* kernel: Added attention params and tile for different input types. by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/369
* kernel: added mqa and gqa support for attention by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/370
* kernel: added var len and paged kv cache support for attention by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/371
* kernel: added varlen and pagedkv unittests for attention by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/372
* kernel: added attention kernel launch by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/373
* kernel: added build script to generate kernel instantiations for attention by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/374
* kernel: change attention input shape from [head, seq, dim] to [seq, head, dim] by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/375
* kernel: added head_dim=96 support for attention by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/376
* kernel: optimize attention kernel performance by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/377
* upgrade cutlass to 3.7.0 by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/379
* kernel: handle kv block range for attention kernel by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/382
* kernel: use cp_async_zfill instead of cute::clear for oob handling by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/383
* kernel: separate oob iterations for better performance. by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/384
* refactor: remove batch_prefill interface by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/385
* refactor: stop building flash_infer kernel by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/386
* feat: integrate in-house scale attention and use it by default by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/380
* kernel: only zfill k once to improve perf for attention by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/387
* refactor: skip flash_attn build by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/388
* refactor: clean up kv cache set/get apis and improve slot id calculation perf by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/389
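
Several kernel entries above (#362's logits soft cap and #365's causal, alibi, and sliding-window masks) define the semantics of the new in-house attention kernel. Below is a minimal PyTorch reference for those semantics only, not the CUDA kernel itself; the parameter names (`soft_cap`, `sliding_window`, `alibi_slopes`) are illustrative assumptions:

```python
# Illustrative reference for the masking/soft-cap semantics described
# above; the actual ScaleLLM kernel is CUDA/CUTLASS.
import torch

def ref_attention(q, k, v, *, causal=True, sliding_window=-1,
                  alibi_slopes=None, soft_cap=0.0):
    """q: [heads, q_len, dim]; k, v: [heads, kv_len, dim]."""
    scores = q @ k.transpose(-1, -2) / q.size(-1) ** 0.5  # [h, q, kv]

    if soft_cap > 0:
        # Logits soft cap (as in Gemma-2): squash into (-soft_cap, soft_cap).
        scores = soft_cap * torch.tanh(scores / soft_cap)

    q_len, kv_len = scores.shape[-2:]
    # Align query positions to the end of the kv sequence (decode-friendly).
    q_pos = torch.arange(kv_len - q_len, kv_len).view(-1, 1)   # [q, 1]
    kv_pos = torch.arange(kv_len).view(1, -1)                  # [1, kv]

    mask = torch.zeros(q_len, kv_len)
    if causal:
        mask.masked_fill_(kv_pos > q_pos, float("-inf"))
    if sliding_window > 0:
        mask.masked_fill_(kv_pos <= q_pos - sliding_window, float("-inf"))
    scores = scores + mask

    if alibi_slopes is not None:
        # ALiBi: per-head linear bias that grows with key distance.
        scores = scores + alibi_slopes.view(-1, 1, 1) * (kv_pos - q_pos)

    return torch.softmax(scores, dim=-1) @ v

q = torch.randn(8, 4, 64)
k = torch.randn(8, 32, 64)
v = torch.randn(8, 32, 64)
out = ref_attention(q, k, v, sliding_window=16, soft_cap=30.0)
assert out.shape == (8, 4, 64)
```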


**Full Changelog**: https://github.com/vectorch-ai/ScaleLLM/compare/v0.2.2...v0.2.3

0.2.2

What's Changed
* kernel: added flash infer attention impl by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/327
* refactor: flatten block tables to 1d tensor by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/328 (see the sketch after this list)
* kernel: added script to generate instantiation for flashinfer kernels by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/329
* refactor: move flash attn and flash infer into attention folder by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/330
* kernel: port flash infer handler + wrapper logics by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/331
* ut: added unittests for flash infer kernels by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/332
* refactor: replaced last_page_len with kv_indptr for flash infer kernel by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/333
* feat: added pass-in alibi slopes support for flash infer kernel by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/334
* refactor: move paged kv related logic into paged_kv_t by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/335
* ut: added fp8 kv unittests for flash infer kernel by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/336
* ci: added pip cache to avoid redownloading by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/337
* upgrade pytorch to 2.4.1 by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/341
* ci: run package test in docker by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/345
* ci: build cuda 12.4 for scalellm cpp images by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/346
* Upgrade pytorch to 2.5.0 by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/347
* ut: add more tests for different warp layout by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/340
* misc: attention kernel refactoring by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/339
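
PR #328 above flattens the per-sequence block tables into a single 1-D tensor, the same CSR-style layout that #333's kv_indptr indexes. A minimal sketch of that layout and the slot calculation it enables; the names, block size, and layout details here are assumptions for illustration, not ScaleLLM's actual API:

```python
import numpy as np

BLOCK_SIZE = 16  # tokens per KV-cache block (assumed for the sketch)

# All sequences' block lists in one flat array plus CSR-style offsets:
# the blocks of sequence i are block_ids[kv_indptr[i]:kv_indptr[i + 1]].
block_ids = np.array([7, 3, 9,   # sequence 0 owns blocks 7, 3, 9
                      2, 5])     # sequence 1 owns blocks 2, 5
kv_indptr = np.array([0, 3, 5])

def slot_id(seq: int, pos: int) -> int:
    """Physical KV-cache slot of token `pos` of sequence `seq`."""
    blocks = block_ids[kv_indptr[seq]:kv_indptr[seq + 1]]
    return int(blocks[pos // BLOCK_SIZE]) * BLOCK_SIZE + pos % BLOCK_SIZE

# Token 20 of sequence 0 sits at offset 4 of its second block (id 3):
assert slot_id(0, 20) == 3 * BLOCK_SIZE + 4
```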


**Full Changelog**: https://github.com/vectorch-ai/ScaleLLM/compare/v0.2.1...v0.2.2

0.2.1

What's Changed
* feat: added awq marlin qlinear by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/315
* build: speed up compilation for marlin kernels by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/316
* test: added unittests for marlin kernels by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/317
* refactor: clean up build warnings and refactor marlin kernels by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/318
* fix: clean up build warnings: "LOG" redefined by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/319
* cmake: make includes private and disable jinja2cpp build by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/320
* ci: allow build without requiring a physical gpu device by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/321
* fix: put item into asyncio.Queue in a thread-safe way by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/324 (see the sketch after this list)
* refactor: added static switch for marlin kernel dispatch by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/325
* feat: fix and use marlin kernel for awq by default by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/326
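
The fix in #324 is worth a note: asyncio.Queue is not thread-safe, so an item produced on a worker thread must be handed to the event loop's own thread. A minimal sketch of the standard pattern, not ScaleLLM's actual code:

```python
import asyncio
import threading

async def main():
    loop = asyncio.get_running_loop()
    queue = asyncio.Queue()

    def producer():
        # Runs on a plain thread (think: a callback from the C++ engine).
        for token in ("Hello", "world", None):  # None marks end of stream
            # Calling queue.put_nowait(token) directly here can race;
            # hand the call over to the event loop's thread instead.
            loop.call_soon_threadsafe(queue.put_nowait, token)

    threading.Thread(target=producer).start()
    while (item := await queue.get()) is not None:
        print(item)

asyncio.run(main())
```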


**Full Changelog**: https://github.com/vectorch-ai/ScaleLLM/compare/v0.2.0...v0.2.1

0.2.0

What's Changed
* kernel: port softcap support for flash attention by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/298
* test: added unittests for attention sliding window by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/299
* model: added gemma2 with softcap and sliding window support by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/300
* kernel: support kernel test in python via pybind by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/301
* test: added unittests for marlin fp16xint4 gemm by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/302
* fix: move eos out of stop token list to honor ignore_eos option by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/305
* refactor: move models to upper folder by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/306
* kernel: port gptq marlin kernel and fp8 marlin kernel by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/307
* rust: upgrade rust libs to latest version by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/309
* refactor: remove the logic loading individual weight from shared partitions by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/311
* feat: added fused column parallel linear by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/313 (see the sketch after this list)
* feat: added gptq marlin qlinear layer by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/312
* kernel: port awq repack kernel by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/314
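
The fused column-parallel linear from #313 merges several projections that share an input (such as Q, K, V) into a single GEMM, then splits the output along the column dimension that tensor parallelism shards. A minimal single-GPU sketch of the idea; the class name and dimensions are made up:

```python
import torch
import torch.nn as nn

hidden, q_out, kv_out = 512, 512, 128  # kv_out < q_out, as with GQA

class FusedQKV(nn.Module):  # hypothetical name, for illustration only
    def __init__(self):
        super().__init__()
        # One weight holding W_q, W_k, W_v stacked along the output
        # ("column") dimension that tensor parallelism would shard.
        self.proj = nn.Linear(hidden, q_out + 2 * kv_out, bias=False)

    def forward(self, x):
        qkv = self.proj(x)  # one GEMM instead of three smaller ones
        return qkv.split([q_out, kv_out, kv_out], dim=-1)

q, k, v = FusedQKV()(torch.randn(4, hidden))
assert q.shape == (4, q_out) and k.shape == (4, kv_out)
```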


**Full Changelog**: https://github.com/vectorch-ai/ScaleLLM/compare/v0.1.9...v0.2.0

0.1.9

What's Changed
* ci: cancel all previous runs if a new one is triggered by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/283
* pypi: fix invalid classifier by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/284
* refactor: remove exllama kernels by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/285
* kernel: added marlin dense and sparse kernels by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/287
* debug: added environment collection script. by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/288
* kernel: added triton kernel build support by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/289
* feat: added THUDM/glm-4* support by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/292
* fix: handle unfinished utf8 bytes for tiktoken tokenizer by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/293 (see the sketch after this list)
* triton: fix build error and add example with unittest by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/294
* model: added qwen2 support by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/295
* feat: added sliding window support for QWen2 by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/296
* ci: fix pytest version to avoid flakiness by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/297
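
On #293: streaming detokenization can emit a chunk that ends partway through a multi-byte UTF-8 character. A minimal sketch of the general buffering technique using Python's incremental decoder; the tiktoken-specific fix in the PR may differ:

```python
import codecs

# The incremental decoder buffers incomplete byte sequences instead of
# raising, which is exactly what a streaming detokenizer needs.
decoder = codecs.getincrementaldecoder("utf-8")()

chunks = [b"na\xc3", b"\xafve"]  # "naïve" split mid-character
out = ""
for chunk in chunks:
    out += decoder.decode(chunk)  # holds back the dangling 0xC3 byte
out += decoder.decode(b"", final=True)
assert out == "naïve"
```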


**Full Changelog**: https://github.com/vectorch-ai/ScaleLLM/compare/v0.1.8...v0.1.9

0.1.8

What's Changed
* ci: increase ccache max size from 5GB(default) to 25GB by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/279
* upgrade torch to 2.4.0 by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/280
* default use cuda 12.1 for wheel package by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/281
* ci: fix cuda version for wheel build workflow by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/282


**Full Changelog**: https://github.com/vectorch-ai/ScaleLLM/compare/v0.1.7...v0.1.8
