Aphrodite-engine

Latest version: v0.6.5

Safety actively analyzes 723177 Python packages for vulnerabilities to keep your Python projects secure.

Page 2 of 6

0.6.2

What's Changed
* feat: FP8 quantization support for AMD ROCm by AlpinDale in https://github.com/PygmalionAI/aphrodite-engine/pull/729
* feat: add experts_int8 support by AlpinDale in https://github.com/PygmalionAI/aphrodite-engine/pull/730
* chore: move update_flash_attn_metadata to attn backend by AlpinDale in https://github.com/PygmalionAI/aphrodite-engine/pull/731
* chore: register lora functions as torch ops by AlpinDale in https://github.com/PygmalionAI/aphrodite-engine/pull/732
* feat: dynamo support for ScalarType by AlpinDale in https://github.com/PygmalionAI/aphrodite-engine/pull/733
* fix: types in AQLM and GGUF for dynamo support by AlpinDale in https://github.com/PygmalionAI/aphrodite-engine/pull/736
* fix: `custom_ar` check by AlpinDale in https://github.com/PygmalionAI/aphrodite-engine/pull/737
* fix: clear engine ref in RPC server by AlpinDale in https://github.com/PygmalionAI/aphrodite-engine/pull/738
* fix: use nvml to get consistent device names by AlpinDale in https://github.com/PygmalionAI/aphrodite-engine/pull/739
* feat: add Exaone model support by shing100 in https://github.com/PygmalionAI/aphrodite-engine/pull/743
* fix: minor bug fixes & clean-ups by AlpinDale in https://github.com/PygmalionAI/aphrodite-engine/pull/744
* chore: refactor `MultiModalConfig` initialization and profiling by AlpinDale in https://github.com/PygmalionAI/aphrodite-engine/pull/745
* chore: various TPU fixes and optimizations by AlpinDale in https://github.com/PygmalionAI/aphrodite-engine/pull/746
* fix: metrics endpoint with RPC server by AlpinDale in https://github.com/PygmalionAI/aphrodite-engine/pull/747
* chore: refactor llama3 rope by AlpinDale in https://github.com/PygmalionAI/aphrodite-engine/pull/748
* feat: add XTC Sampling by AlpinDale in https://github.com/PygmalionAI/aphrodite-engine/pull/740
* ci: fix dep install using pnpm by ahme-dev in https://github.com/PygmalionAI/aphrodite-engine/pull/749
* ci: fix docs deployment by ahme-dev in https://github.com/PygmalionAI/aphrodite-engine/pull/750
* chore: re-enable custom token bans by AlpinDale in https://github.com/PygmalionAI/aphrodite-engine/pull/751
* feat: bring back dynatemp by AlpinDale in https://github.com/PygmalionAI/aphrodite-engine/pull/754
* feat: quant_llm support by AlpinDale in https://github.com/PygmalionAI/aphrodite-engine/pull/755
* fix: add pandas to requirements by AlpinDale in https://github.com/PygmalionAI/aphrodite-engine/pull/756
* docs: update readme and quant docs by AlpinDale in https://github.com/PygmalionAI/aphrodite-engine/pull/757
* ci: bump version to 0.6.2 by AlpinDale in https://github.com/PygmalionAI/aphrodite-engine/pull/758

New Contributors
* shing100 made their first contribution in https://github.com/PygmalionAI/aphrodite-engine/pull/743
* ahme-dev made their first contribution in https://github.com/PygmalionAI/aphrodite-engine/pull/749

**Full Changelog**: https://github.com/PygmalionAI/aphrodite-engine/compare/v0.6.1.post1...v0.6.2

0.6.1.post1

What's Changed
* chore: register custom torch ops for flash-attn and flashinfer by AlpinDale in https://github.com/PygmalionAI/aphrodite-engine/pull/724
* feat: launch API server with uvloop by AlpinDale in https://github.com/PygmalionAI/aphrodite-engine/pull/725
* chore: fix return statement in `Detokenizer.decode_sequence_inplace` by AlpinDale in https://github.com/PygmalionAI/aphrodite-engine/pull/727
* Fix tensor parallelism, libcudart path for some versions of pytorch by miku448 in https://github.com/PygmalionAI/aphrodite-engine/pull/726
* ci: bump to 0.6.1.post1 by AlpinDale in https://github.com/PygmalionAI/aphrodite-engine/pull/728

**Full Changelog**: https://github.com/PygmalionAI/aphrodite-engine/compare/v0.6.1...v0.6.1.post1

0.6.1

What's Changed
* ci: exclude cu118 from build and add py_limited_api by AlpinDale in https://github.com/PygmalionAI/aphrodite-engine/pull/639
* fix: better async request cancellation by AlpinDale in https://github.com/PygmalionAI/aphrodite-engine/pull/641
* fix: gracefully handle missing chat template by AlpinDale in https://github.com/PygmalionAI/aphrodite-engine/pull/642
* chore: deduplicate nvlink check to cuda platform by AlpinDale in https://github.com/PygmalionAI/aphrodite-engine/pull/643
* fix: hardcoded float16 in embedding mode check by AlpinDale in https://github.com/PygmalionAI/aphrodite-engine/pull/645
* quadratic sampling: separate diff from logits to filter out NaNs. by 50h100a in https://github.com/PygmalionAI/aphrodite-engine/pull/644
* fix: RSLoRA support by AlpinDale in https://github.com/PygmalionAI/aphrodite-engine/pull/647
* feat: introduce `BaseAphroditeParameter` by AlpinDale in https://github.com/PygmalionAI/aphrodite-engine/pull/646
* fix: move zeromq rpc frontend to IPC instead of TCP by AlpinDale in https://github.com/PygmalionAI/aphrodite-engine/pull/652
* fix: input processor in internvl2 by AlpinDale in https://github.com/PygmalionAI/aphrodite-engine/pull/653
* fix: multiprocessing timeout by AlpinDale in https://github.com/PygmalionAI/aphrodite-engine/pull/654
* fix: GPTQ/AWQ on Colab by AlpinDale in https://github.com/PygmalionAI/aphrodite-engine/pull/655
* fix: make `merge_async_iterators.is_cancelled()` optional by AlpinDale in https://github.com/PygmalionAI/aphrodite-engine/pull/656
* fix: flashinfer outputs by AlpinDale in https://github.com/PygmalionAI/aphrodite-engine/pull/657
* fix: max_num_batched_tokens should not be limited for lora by AlpinDale in https://github.com/PygmalionAI/aphrodite-engine/pull/658
* fix: lora with pipeline parallel by AlpinDale in https://github.com/PygmalionAI/aphrodite-engine/pull/659
* fix: kill api server when pinging dead engine by AlpinDale in https://github.com/PygmalionAI/aphrodite-engine/pull/660
* fix: `get_num_blocks_touched` logic by AlpinDale in https://github.com/PygmalionAI/aphrodite-engine/pull/661
* chore: update the env.py script and the bug report template by AlpinDale in https://github.com/PygmalionAI/aphrodite-engine/pull/662
* feat: add INT8 W8A16 quant for TPU by AlpinDale in https://github.com/PygmalionAI/aphrodite-engine/pull/663
* feat: allow serving encoder-decoder models in the API server by AlpinDale in https://github.com/PygmalionAI/aphrodite-engine/pull/664
* fix: deps with TPU dockerfile by AlpinDale in https://github.com/PygmalionAI/aphrodite-engine/pull/665
* optimization: reduce end-to-end overhead from python obj allocation by AlpinDale in https://github.com/PygmalionAI/aphrodite-engine/pull/666
* fix: minor adjustments to scheduler and block manager by AlpinDale in https://github.com/PygmalionAI/aphrodite-engine/pull/667
* feat: enable using fp8 kv and prefix caching with chunked prefill by AlpinDale in https://github.com/PygmalionAI/aphrodite-engine/pull/668
* fix: mlpspeculator with padded vocab by AlpinDale in https://github.com/PygmalionAI/aphrodite-engine/pull/669
* feat: option to apply temperature scaling last by AlpinDale in https://github.com/PygmalionAI/aphrodite-engine/pull/670
* chore: decouple `should_modify_greedy_probs_inplace` by AlpinDale in https://github.com/PygmalionAI/aphrodite-engine/pull/671
* chore: better stream termination in async engine by AlpinDale in https://github.com/PygmalionAI/aphrodite-engine/pull/672
* chore: mamba cache single buffer by AlpinDale in https://github.com/PygmalionAI/aphrodite-engine/pull/673
* feat: mamba model support by AlpinDale in https://github.com/PygmalionAI/aphrodite-engine/pull/674
* fix: reinit procedure in `ModelInputForGPUBuilder` by AlpinDale in https://github.com/PygmalionAI/aphrodite-engine/pull/675
* feat: embeddings support for batched OAI endpoint by AlpinDale in https://github.com/PygmalionAI/aphrodite-engine/pull/676
* fix: fp8 checkpoints with fused linear modules by AlpinDale in https://github.com/PygmalionAI/aphrodite-engine/pull/677
* feat: add numpy implementation of `compute_slot_mapping` by AlpinDale in https://github.com/PygmalionAI/aphrodite-engine/pull/678
* fix: chunked prefill with v2 block manager by AlpinDale in https://github.com/PygmalionAI/aphrodite-engine/pull/679
* fix: phi3v batch inference with different aspect ratio images by AlpinDale in https://github.com/PygmalionAI/aphrodite-engine/pull/680
* chore: use mark_dynamic to reduce TPU compile times by AlpinDale in https://github.com/PygmalionAI/aphrodite-engine/pull/681
* chore: bump lmfe to v0.10.6 and include triton for tpu and xpu dockefiles by AlpinDale in https://github.com/PygmalionAI/aphrodite-engine/pull/682
* refactor: base worker input refactor for multi-step by AlpinDale in https://github.com/PygmalionAI/aphrodite-engine/pull/683
* build: add empty device by AlpinDale in https://github.com/PygmalionAI/aphrodite-engine/pull/684
* chore: update flashinfer to v0.1.3 by AlpinDale in https://github.com/PygmalionAI/aphrodite-engine/pull/685
* feat: allow image embeddings for VLM input by AlpinDale in https://github.com/PygmalionAI/aphrodite-engine/pull/686
* feat: add progress bar for loading individual weight modules by AlpinDale in https://github.com/PygmalionAI/aphrodite-engine/pull/640
* chore: use public ECR for neuron image by AlpinDale in https://github.com/PygmalionAI/aphrodite-engine/pull/687
* fix: logit softcapping in flash-attn by AlpinDale in https://github.com/PygmalionAI/aphrodite-engine/pull/688
* chore: use scalar type to dispatch to different `gptq_marlin` kernels by AlpinDale in https://github.com/PygmalionAI/aphrodite-engine/pull/689
* fix: allow passing float for GiB arguments by AlpinDale in https://github.com/PygmalionAI/aphrodite-engine/pull/690
* build: bump cmake to 3.26 by AlpinDale in https://github.com/PygmalionAI/aphrodite-engine/pull/691
* fix: shut down ray dag workers cleanly by AlpinDale in https://github.com/PygmalionAI/aphrodite-engine/pull/692
* feat: add lora loading/unloading api endpoint by AlpinDale in https://github.com/PygmalionAI/aphrodite-engine/pull/693
* feat: add load/unload endpoints for soft-prompts by AlpinDale in https://github.com/PygmalionAI/aphrodite-engine/pull/694
* fix: loading chameleon model with TP>1 by AlpinDale in https://github.com/PygmalionAI/aphrodite-engine/pull/695
* fix: consolidated `is_tpu()` and suppress tpu import warning by AlpinDale in https://github.com/PygmalionAI/aphrodite-engine/pull/696
* fix: manually install triton for other devices to prevent outlines errors by AlpinDale in https://github.com/PygmalionAI/aphrodite-engine/pull/697
* feat: support for Audio modality by AlpinDale in https://github.com/PygmalionAI/aphrodite-engine/pull/698
* chore: migrate gptq_marlin to AphroditeParameters by AlpinDale in https://github.com/PygmalionAI/aphrodite-engine/pull/699
* chore: update fused MoE weight loading by AlpinDale in https://github.com/PygmalionAI/aphrodite-engine/pull/700
* feat: add Solar model support by AlpinDale in https://github.com/PygmalionAI/aphrodite-engine/pull/701
* feat: migrate awq and awq_marlin to AphroditeParameter by AlpinDale in https://github.com/PygmalionAI/aphrodite-engine/pull/702
* chore: spawn engine process from api server process by AlpinDale in https://github.com/PygmalionAI/aphrodite-engine/pull/703
* chore: use the `compressed-tensors` library to avoid code reuse by AlpinDale in https://github.com/PygmalionAI/aphrodite-engine/pull/704
* feat: add aphrodite plugin system by AlpinDale in https://github.com/PygmalionAI/aphrodite-engine/pull/705
* Revert "chore: use the `compressed-tensors` library to avoid code reuse (704)" by AlpinDale in https://github.com/PygmalionAI/aphrodite-engine/pull/706
* feat: add support for multi-host TPU by AlpinDale in https://github.com/PygmalionAI/aphrodite-engine/pull/707
* fix: import ray under a guard by AlpinDale in https://github.com/PygmalionAI/aphrodite-engine/pull/708
* fix: empty sampler output when temperature is too low by AlpinDale in https://github.com/PygmalionAI/aphrodite-engine/pull/709
* fix: disable embeddings API for chat models by AlpinDale in https://github.com/PygmalionAI/aphrodite-engine/pull/710
* feat: implement mistral tokenizer mode by AlpinDale in https://github.com/PygmalionAI/aphrodite-engine/pull/711
* feat: support profiling with multiple multi-modal inputs per prompt by AlpinDale in https://github.com/PygmalionAI/aphrodite-engine/pull/712
* chore: multi-step args and sequence modifications by AlpinDale in https://github.com/PygmalionAI/aphrodite-engine/pull/713
* chore: set per-rank XLA cache for TPU by AlpinDale in https://github.com/PygmalionAI/aphrodite-engine/pull/714
* chore: add support for up to 2048 block size by AlpinDale in https://github.com/PygmalionAI/aphrodite-engine/pull/715
* fix: install protobuf for cpu by AlpinDale in https://github.com/PygmalionAI/aphrodite-engine/pull/716
* fix: weight loading for scalars by AlpinDale in https://github.com/PygmalionAI/aphrodite-engine/pull/718
* chore: quant config for speculative draft models by AlpinDale in https://github.com/PygmalionAI/aphrodite-engine/pull/719
* feat: enable prompt logprobs in OpenAI API by AlpinDale in https://github.com/PygmalionAI/aphrodite-engine/pull/720
* chore: update grafana template by AlpinDale in https://github.com/PygmalionAI/aphrodite-engine/pull/721
* ci: bump aphrodite to 0.6.1 by AlpinDale in https://github.com/PygmalionAI/aphrodite-engine/pull/722

**Full Changelog**: https://github.com/PygmalionAI/aphrodite-engine/compare/v0.6.0.post1...v0.6.1

0.6.0.post1

What's Changed
* feat: add siglip encoder for llava family by AlpinDale in https://github.com/PygmalionAI/aphrodite-engine/pull/626
* readme: fix model name typo by Trapper4888 in https://github.com/PygmalionAI/aphrodite-engine/pull/627
* feat: multi-image input for minicpmv by AlpinDale in https://github.com/PygmalionAI/aphrodite-engine/pull/628
* feat: Add support for GPU device selection in SpecDecodeBaseSampler by AlpinDale in https://github.com/PygmalionAI/aphrodite-engine/pull/629
* feat: per-tensor token epilogue kernels by AlpinDale in https://github.com/PygmalionAI/aphrodite-engine/pull/630
* chore: optimize evictor v2 performance by AlpinDale in https://github.com/PygmalionAI/aphrodite-engine/pull/631
* feat: initial encoder-decoder support with BART model by AlpinDale in https://github.com/PygmalionAI/aphrodite-engine/pull/633
* fix: default api port and attention selector by AlpinDale in https://github.com/PygmalionAI/aphrodite-engine/pull/634
* fix: clean up incorrect log in worker by AlpinDale in https://github.com/PygmalionAI/aphrodite-engine/pull/636
* bump to v0.6.0.post1 by AlpinDale in https://github.com/PygmalionAI/aphrodite-engine/pull/635

New Contributors
* Trapper4888 made their first contribution in https://github.com/PygmalionAI/aphrodite-engine/pull/627

**Full Changelog**: https://github.com/PygmalionAI/aphrodite-engine/compare/v0.6.0...v0.6.0.post1

0.6.0

New Contributors
* matthusby made their first contribution in https://github.com/PygmalionAI/aphrodite-engine/pull/538

**Full Changelog**: https://github.com/PygmalionAI/aphrodite-engine/compare/v0.5.3...v0.6.0

0.5.3

What's Changed
A new release, one that took too long again. We have some cool new features, however.

- **ExllamaV2 tensor parallel**: You can now run ExllamaV2 quantized models on multiple GPUs. This should be the fastest multi-gpu experience with exllamav2 models.
- **Support for Command-R+**
- **Support for DBRX**
- **Support for Llama-3**
- **Support for Qwen 2 MoE**
- **`min_tokens` sampling param**: You can now set a minimum amount of tokens to generate.
- **Fused MoE for AWQ and GPTQ quants**: AWQ and GPTQ kernels have been updated with optimized fused MoE code. They should be significantly faster now.
- **CMake build system**: Slightly faster, much cleaner builds.
- **CPU support**: You can now run aphrodite on CPU only systems! Needs an AVX512-compatible CPU for now.
- **Speculative Decoding**: Speculative Decoding is finally here! You can either use a draft model, or use prompt lookup decoding with an ngram model (built-in).
- **Chunked Prefill**: Before this, Aphrodite would process prompts in chunks equal to the model's context length. Now, you can enable this option (via `--enable-chunked-prefill`) to process in chunks of 768 by default, massively increasing the amount of context you can fit. Does not currently work with context shift or FP8 KV cache.
- **Context Shift reworked**: Context shift finally works now. Enable it with `--context-shift` and Aphrodite will cache processed prompts and re-use them.
- **FP8 E4M3 KV Cache**: This is for ROCm only. Support will be extended to NVIDIA soon. E4M3 has higher quality compared to E5M2, but doesn't lead to any throughput increase.
- **Auto-truncation in API**: The API server can now optionally left-truncate your prompts. Simply pass `truncate_prompt_tokens=1024` to truncate any prompt larger than 1024 tokens.
- **Support for Llava vision models**: Currently 1.5 is supported. With the next release, we should have 1.6 along with a proper GPT4-V compatible API.
- **LM Format Enforcer**: You can now use LMFE for guided generations.
- **EETQ Quantization**: EETQ support has been added - a SOTA 8bit quantization method.
- **Arbitrary GGUF model support**: We were limited to only Llama models for GGUF, now any GGUF is supported. You will need to convert the model beforehand for them, however.
- **Aphrodite CLI app**: You no longer have to type `python -m aphrodite...`. Simply type `aphrodite run meta-llama/Meta-Llama-3-8B` to get started. Pass extra flags as normal.
- **Sharded GGUF support**: You can now load sharded GGUF models. Pre-conversion needed.
- **NVIDIA P100/GP100 support**: Support has been restored.

Thanks to all the new contributors!

**Full Changelog**: https://github.com/PygmalionAI/aphrodite-engine/compare/v0.5.2...v0.5.3

Page 2 of 6

Releases

Has known vulnerabilities

Previous Next

Aphrodite-engine

Page 2 of 6

0.6.2

0.6.1.post1

0.6.1

0.6.0.post1

0.6.0

0.5.3

Page 2 of 6

Links

Releases