Automatically generated release for tag v0.2.0.
🚀 New Feature Highlights
- **Distributed KV Cache**: Implemented support for orchestrating a KV cache distributed across multiple nodes, enhancing performance. (583)
- **Cost-Driven Heterogeneous Serving**: Improved scheduling and inference strategies for mixed-GPU environments to optimize cost and resource utilization. (371, 430, 509, 554, 598)
- **Optimizer-Based Autoscaling**: Leverages offline profiles of the inference server to calculate the number of replicas. (430, 500, 508, 692)
- **Prefix Cache Aware Routing**: Added support for routing decisions based on prefix cache hits, improving inference efficiency. (641, 657)
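The idea behind optimizer-based autoscaling and cost-driven heterogeneous serving, turning offline profiles into a replica count, can be pictured with a small sketch. It assumes a hypothetical `GpuProfile` that records, for each GPU type, the request rate one replica sustains within the latency SLO and its hourly price, and it finds the cheapest replica mix by exhaustive search. The field names, numbers, and search strategy are illustrative assumptions, not AIBrix's GPU optimizer or its API.

```python
import itertools
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class GpuProfile:
    """Hypothetical offline profile for one GPU type (not an AIBrix schema)."""
    name: str
    max_rps_per_replica: float  # assumed: req/s one replica sustains within the latency SLO
    cost_per_hour: float        # assumed: hourly price of one replica on this GPU type


def replicas_needed(demand_rps: float, profile: GpuProfile) -> int:
    """Smallest replica count of a single GPU type whose capacity covers the demand."""
    if demand_rps <= 0:
        return 0
    return math.ceil(demand_rps / profile.max_rps_per_replica)


def cheapest_mix(demand_rps: float, profiles: list[GpuProfile]) -> tuple[dict[str, int], float]:
    """Exhaustively search replica mixes and return the cheapest one covering demand_rps.

    With positive prices, an optimal mix never uses more replicas of a type than
    that type would need on its own, so those counts bound the search space.
    Exhaustive search is only practical for a handful of GPU types.
    """
    bounds = [replicas_needed(demand_rps, p) for p in profiles]
    best_counts, best_cost = None, math.inf
    for counts in itertools.product(*(range(b + 1) for b in bounds)):
        capacity = sum(n * p.max_rps_per_replica for n, p in zip(counts, profiles))
        if capacity < demand_rps:
            continue  # this mix cannot absorb the load
        cost = sum(n * p.cost_per_hour for n, p in zip(counts, profiles))
        if cost < best_cost:
            best_counts, best_cost = counts, cost
    if best_counts is None:
        raise ValueError("no GPU profile can serve the requested demand")
    return {p.name: n for p, n in zip(profiles, best_counts)}, best_cost
```

With hypothetical profiles of 10 req/s at $4/h and 3 req/s at $1.5/h, a demand of 13 req/s is covered most cheaply by one replica of each type ($5.5/h), beating two fast replicas ($8/h) or five slow ones ($7.5/h). Replica granularity is what makes a mixed fleet cheaper here, even though the faster GPU is more cost-efficient per request.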
📊 Feature Enhancements
- **LoRA Scheduling Enhancements**: Introduced multiple scheduling strategies, including bin packing, least latency, least throughput, and random. (544)
- **Gateway Enhancements**:
  - Improved request handling efficiency by enabling streaming in the Envoy gateway. (377)
  - Improved handling of invalid model names and cache inconsistencies. (542)
  - Introduced fallback strategies to ensure robust request allocation. (445)
  - Fixed cache store retrieval in the least-kv-cache router, reducing unnecessary overhead. (639)
  - Fixed a missing Prometheus config that prevented gateway startup. (441)
- **PodAutoscaler Scaling Improvements**: Improved scaling logic to handle edge cases, such as deployments with zero pods, more robustly. (508, 515)
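Of the LoRA scheduling strategies listed above, bin packing is the simplest to picture. The sketch below assumes a hypothetical `Pod` with a fixed number of adapter slots and uses best-fit placement: an adapter goes to a pod that already serves it, otherwise to the pod with the fewest free slots that can still take it. The types, slot limit, and tie-breaking rule are assumptions for illustration, not AIBrix's scheduler interface.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Pod:
    """Hypothetical view of one serving pod (not an AIBrix type)."""
    name: str
    max_adapters: int                             # assumed per-pod LoRA slot limit
    adapters: set[str] = field(default_factory=set)

    def free_slots(self) -> int:
        return self.max_adapters - len(self.adapters)


def bin_pack(adapter: str, pods: list[Pod]) -> Optional[Pod]:
    """Best-fit placement: put the adapter on the fullest pod that still has room."""
    # Reuse a pod that already serves this adapter instead of loading it twice.
    for pod in pods:
        if adapter in pod.adapters:
            return pod
    candidates = [p for p in pods if p.free_slots() > 0]
    if not candidates:
        return None  # no capacity left; the caller must scale out or reject
    # Fewest free slots first; break ties by name so placement is deterministic.
    target = min(candidates, key=lambda p: (p.free_slots(), p.name))
    target.adapters.add(adapter)
    return target
```

Packing adapters this way keeps them concentrated on as few pods as possible, leaving the rest empty; a least-latency or least-throughput strategy would instead spread adapters according to load.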
🛠 Infrastructure & CI/CD Upgrades
- **Parallelized Build Tasks**: Improved CI efficiency by running build tasks in parallel. (398)
- **CrashLoopBackOff Detection in CI**: Added detection of pods stuck in CrashLoopBackOff during testing workflows. (444)
- **Improved GitHub Actions Cost Efficiency**: Optimized workflow triggers and removed the nightly tag from release branches. (411, 422)
- **Integration Tests for Core Components**: Added integration tests for autoscalers, routing policies, and deployment configurations. (616, 620)
What's Changed
* Add envoy gateway streaming support by varungup90 in https://github.com/aibrix/aibrix/pull/377
* Add client traffic policy to increase per connection buffer size from 32kb to 256kb by varungup90 in https://github.com/aibrix/aibrix/pull/395
* Misc: add support to metricsSources property of podautoscaler by zhangjyr in https://github.com/aibrix/aibrix/pull/371
* [Misc] Update runtime server startup command in v0.1.0 by brosoul in https://github.com/aibrix/aibrix/pull/396
* [CI] improve the ci efficiency by parallelizing the build tasks by nwangfw in https://github.com/aibrix/aibrix/pull/398
* Fix the ticker interval by removing unnecessary ms by Jeffwan in https://github.com/aibrix/aibrix/pull/415
* [Misc] Disable specific endpoints logs by Jeffwan in https://github.com/aibrix/aibrix/pull/418
* [CI] Github Action trigger condition optimized for cost saving by nwangfw in https://github.com/aibrix/aibrix/pull/411
* [Misc] Fix the mocked app role permission issue by Jeffwan in https://github.com/aibrix/aibrix/pull/416
* [CI] Nightly tag removed for release branch by nwangfw in https://github.com/aibrix/aibrix/pull/422
* Enable setting PodAutoscaler configuration via YAML labels by kr11 in https://github.com/aibrix/aibrix/pull/409
* Update manifest to adopt v0.1.1 images by Jeffwan in https://github.com/aibrix/aibrix/pull/429
* [Bug]: duplicated http in rest metrics fetcher (408) by zhangjyr in https://github.com/aibrix/aibrix/pull/421
* [MISC]: Improve Request Trace Granularity with Version Control by zhangjyr in https://github.com/aibrix/aibrix/pull/431
* Support histogram metrics from engine in cache by Jeffwan in https://github.com/aibrix/aibrix/pull/424
* Support fetching metrics from remote Prometheus server by Jeffwan in https://github.com/aibrix/aibrix/pull/433
* [CI] Add python wheel to release artifact by Jeffwan in https://github.com/aibrix/aibrix/pull/434
* Fix update cache pod issue and refactor updatePod handler by Jeffwan in https://github.com/aibrix/aibrix/pull/439
* Extract common metrics structure to types and utils by Jeffwan in https://github.com/aibrix/aibrix/pull/438
* Fix gateway startup issue due to missing prometheus config by Jeffwan in https://github.com/aibrix/aibrix/pull/441
* [feat]: GPU Optimizer and Simulator development app by zhangjyr in https://github.com/aibrix/aibrix/pull/430
* Add selectrandom fallback in routing and only scraping healthy pods by Jeffwan in https://github.com/aibrix/aibrix/pull/445
* AIBrix Workload Generator / Scenario Simulator by happyandslow in https://github.com/aibrix/aibrix/pull/428
* CrashLoopBackOff status detection in CI by nwangfw in https://github.com/aibrix/aibrix/pull/444
* Support installing individual controllers from giant controller-manager by nwangfw in https://github.com/aibrix/aibrix/pull/442
* Refactor Scaler: Resolve Issues with Metric Parameter Updates in Multiple KPAs by kr11 in https://github.com/aibrix/aibrix/pull/437
* Support metrics multi labels for different models by brosoul in https://github.com/aibrix/aibrix/pull/450
* Add health check api interface for runtime by Jeffwan in https://github.com/aibrix/aibrix/pull/451
* Fix the service name override issue in rolebindings by Jeffwan in https://github.com/aibrix/aibrix/pull/453
* Reorganize docs/development and docs/tutorial structure by Jeffwan in https://github.com/aibrix/aibrix/pull/455
* Move tools to separate folders and update mocked app README.md by Jeffwan in https://github.com/aibrix/aibrix/pull/457
* Fix multi models metric result in PromQL by brosoul in https://github.com/aibrix/aibrix/pull/458
* Support Azure LLM trace in workload generator by happyandslow in https://github.com/aibrix/aibrix/pull/462
* Fix autoscaler scalingstrategy switching logic by nwangfw in https://github.com/aibrix/aibrix/pull/475
* Fix missing handle of PromQL scope is PodMetricScope by brosoul in https://github.com/aibrix/aibrix/pull/479
* [Misc] Consolidate app and simulator by zhangjyr in https://github.com/aibrix/aibrix/pull/477
* [Bug] Avoid including sensitive info in Dockerfile ENV by zhangjyr in https://github.com/aibrix/aibrix/pull/487
* Refactor generator to generate time-based traces by happyandslow in https://github.com/aibrix/aibrix/pull/478
* [CI] Update deploy workload script in installation test by nwangfw in https://github.com/aibrix/aibrix/pull/499
* [Bug] handle metricKey creation with MetricsSources by nwangfw in https://github.com/aibrix/aibrix/pull/498
* Adding Client for Workload Generator Workload File by happyandslow in https://github.com/aibrix/aibrix/pull/501
* [Feat] Integrate deployment configurations and fix autoscaler/gpu optimizer connectivity by zhangjyr in https://github.com/aibrix/aibrix/pull/500
* Fix some simulator format issue and add some TODOs by Jeffwan in https://github.com/aibrix/aibrix/pull/505
* [Bug] Fix the way how podautoscaler handle 0 pods. by zhangjyr in https://github.com/aibrix/aibrix/pull/508
* [Misc] Improve gpu optimizer debugging on podautoscaler. by zhangjyr in https://github.com/aibrix/aibrix/pull/509
* Optimize kustomize overlay for volcano engine deployment by Jeffwan in https://github.com/aibrix/aibrix/pull/512
* [perf] Refact tos downloader in Runtime by brosoul in https://github.com/aibrix/aibrix/pull/510
* Refactor metric source for customized protocol, port and path by kr11 in https://github.com/aibrix/aibrix/pull/511
* [Bug] Fixed the yaml of deployments in heterogenous GPU settings to make KPA scaling work as expected. by zhangjyr in https://github.com/aibrix/aibrix/pull/513
* [Misc] Heterogeneous GPU Optimizer Logging Clean Up by nwangfw in https://github.com/aibrix/aibrix/pull/514
* Fix KPA bug, and an elaborate KPA test case by kr11 in https://github.com/aibrix/aibrix/pull/515
* Cut v0.2.0-rc.1 release by Jeffwan in https://github.com/aibrix/aibrix/pull/516
* [Bug] Accumulated bug fix on controller manager, mock app configuration, and gpu optimizer. by zhangjyr in https://github.com/aibrix/aibrix/pull/522
* [Misc] Reduced runtime's container image size by nwangfw in https://github.com/aibrix/aibrix/pull/518
* clean memory scaler object when pa crd is deleted by kr11 in https://github.com/aibrix/aibrix/pull/520
* Configure autoscaler http client to skip certificate check by Jeffwan in https://github.com/aibrix/aibrix/pull/530
* [Doc] Update aibrix documentation by Jeffwan in https://github.com/aibrix/aibrix/pull/533
* Refactor the gateway-plugin and metadata service manifests by Jeffwan in https://github.com/aibrix/aibrix/pull/531
* Fix the GITHUB_WORKSPACE artifact sharing issue in release workflow by Jeffwan in https://github.com/aibrix/aibrix/pull/532
* [Misc] Polish the benchmark scripts by Jeffwan in https://github.com/aibrix/aibrix/pull/525
* Fix APA bugs in creation, add test and demo yaml by kr11 in https://github.com/aibrix/aibrix/pull/536
* Add VKE IPv4 Testing Cluster Config by nwangfw in https://github.com/aibrix/aibrix/pull/537
* Support for request length internal trace by happyandslow in https://github.com/aibrix/aibrix/pull/538
* [Feat] Add download status into runtime downloader by brosoul in https://github.com/aibrix/aibrix/pull/539
* [Feat] Add runtime model management api by brosoul in https://github.com/aibrix/aibrix/pull/540
* [gateway] handle the wrong model name and cache inconsistency case by Jeffwan in https://github.com/aibrix/aibrix/pull/542
* [Docs] fix: update the parameters instruction in readme by scarlet25151 in https://github.com/aibrix/aibrix/pull/548
* add lora schedulers - bin pack, least latency, least throughput, random by Aspirin96 in https://github.com/aibrix/aibrix/pull/544
* add request routers - least kv cache, least expected latency by Aspirin96 in https://github.com/aibrix/aibrix/pull/543
* [Docs] heterogenous gpu docs added by nwangfw in https://github.com/aibrix/aibrix/pull/545
* Fix race condition in cache by varungup90 in https://github.com/aibrix/aibrix/pull/550
* Fix pod internal cache delete handling by varungup90 in https://github.com/aibrix/aibrix/pull/552
* Handle terminating pod for request routing by varungup90 in https://github.com/aibrix/aibrix/pull/549
* Support absolute path as lora adapter artifact path by Jeffwan in https://github.com/aibrix/aibrix/pull/556
* Deadlock fix for cache by varungup90 in https://github.com/aibrix/aibrix/pull/557
* Mock app log fix for missing metrics warning by varungup90 in https://github.com/aibrix/aibrix/pull/564
* Add vllm graceful termination configuration by nwangfw in https://github.com/aibrix/aibrix/pull/568
* Enhance dynamic lora adapter support for auth enabled scenario by Jeffwan in https://github.com/aibrix/aibrix/pull/571
* Update pyproject.toml to support python 3.12 by Jeffwan in https://github.com/aibrix/aibrix/pull/579
* [Docs ]Update ai runtime management api and downloader docs by Jeffwan in https://github.com/aibrix/aibrix/pull/577
* Check the HPA ownerReference in request enqueue by Jeffwan in https://github.com/aibrix/aibrix/pull/582
* Add request length for traces by happyandslow in https://github.com/aibrix/aibrix/pull/569
* Support model registration flow using aibrix runtime api by Jeffwan in https://github.com/aibrix/aibrix/pull/580
* Gateway plugin report total incoming requests and pending requests by zhangjyr in https://github.com/aibrix/aibrix/pull/554
* Support distributed kv cache orchestration by Jeffwan in https://github.com/aibrix/aibrix/pull/583
* Grant workflow action permission to write packages by Jeffwan in https://github.com/aibrix/aibrix/pull/586
* Update routers to use GetPodModelMetric api and misc cleanup in metri… by varungup90 in https://github.com/aibrix/aibrix/pull/590
* Update upload/download artifact github actions version to v4 by varungup90 in https://github.com/aibrix/aibrix/pull/591
* Update version in aibrix/python to 0.2.0-rc.2 by varungup90 in https://github.com/aibrix/aibrix/pull/594
* Update image names in sync-image script by varungup90 in https://github.com/aibrix/aibrix/pull/595
* Update dependency chart for release pipeline by varungup90 in https://github.com/aibrix/aibrix/pull/597
* Patch release for older vllm engine lora support in gateway plugins by varungup90 in https://github.com/aibrix/aibrix/pull/599
* Update component names in staging deployment and readme for new relea… by varungup90 in https://github.com/aibrix/aibrix/pull/605
* Fix the PodAutoscaler kind typo by Jeffwan in https://github.com/aibrix/aibrix/pull/610
* Improve condition update and fix multiple endpoint ips issue by Jeffwan in https://github.com/aibrix/aibrix/pull/609
* Check if model name is present in response from inference engine by varungup90 in https://github.com/aibrix/aibrix/pull/611
* Update log level for few messages in PodAutoscaler by varungup90 in https://github.com/aibrix/aibrix/pull/612
* [enhancement] GPU optimizer accumulated fix by zhangjyr in https://github.com/aibrix/aibrix/pull/598
* Update manifest to use v0.2.0-rc.2 tag by Jeffwan in https://github.com/aibrix/aibrix/pull/614
* Add framework to setup integration test by varungup90 in https://github.com/aibrix/aibrix/pull/616
* [docs] Update lora model adapter docs by Jeffwan in https://github.com/aibrix/aibrix/pull/618
* [docs] Update AI Engine Runtime and Fleet docs by Jeffwan in https://github.com/aibrix/aibrix/pull/619
* [Doc] update quickstart tutorial and add example sending requests via gatew… by nwangfw in https://github.com/aibrix/aibrix/pull/621
* [Doc] feature description for distributed kv cache by DwyaneShi in https://github.com/aibrix/aibrix/pull/623
* WIP: Add docs gateway plugin by varungup90 in https://github.com/aibrix/aibrix/pull/624
* [Docs] Update GPU Optimizer documentation by zhangjyr in https://github.com/aibrix/aibrix/pull/622
* Add integration test to CI workflow by varungup90 in https://github.com/aibrix/aibrix/pull/620
* [Docs] Updated autoscaling doc by gangmuk in https://github.com/aibrix/aibrix/pull/625
* Filter active pods before metrics calculation by Jeffwan in https://github.com/aibrix/aibrix/pull/629
* Fix some issues in the docs and polish contents by Jeffwan in https://github.com/aibrix/aibrix/pull/630
* Ignore Jupyter notebooks for GitHub Linguist by Jeffwan in https://github.com/aibrix/aibrix/pull/632
* [Docs] Improving the heterogenous-GPU feature doc by nwangfw in https://github.com/aibrix/aibrix/pull/634
* [Doc] Fixed autoscaling doc by gangmuk in https://github.com/aibrix/aibrix/pull/635
* Fix out of space error in running integration test github workflow by varungup90 in https://github.com/aibrix/aibrix/pull/628
* Use AIBRIX_POD_METRIC_REFRESH_INTERVAL_MS=50 in base configs by Jeffwan in https://github.com/aibrix/aibrix/pull/640
* Fix the least-kv-cache cache store retrieval by Jeffwan in https://github.com/aibrix/aibrix/pull/639
* Add prefix cache aware routing by varungup90 in https://github.com/aibrix/aibrix/pull/641
* [misc] Polish gateway code with better structure by Jeffwan in https://github.com/aibrix/aibrix/pull/645
* Create AIBrix Single-Node Deployment on Lambda scripts by Jeffwan in https://github.com/aibrix/aibrix/pull/659
* End-to-end benchmark pipeline for autoscalers and routing policies by gangmuk in https://github.com/aibrix/aibrix/pull/650
* Clean up scripts under hack folder by Jeffwan in https://github.com/aibrix/aibrix/pull/660
* Add a research section, update architecture and lambda guidance by Jeffwan in https://github.com/aibrix/aibrix/pull/663
* Leverage literalinclude to keep only one code copy and move autoscaler configs to annotations by Jeffwan in https://github.com/aibrix/aibrix/pull/665
* Updated scripts and fixed issues in benchmark/autoscaling by gangmuk in https://github.com/aibrix/aibrix/pull/662
* Benchmark Generator Refactoring by happyandslow in https://github.com/aibrix/aibrix/pull/655
* Add interface for prefix cache indexer by varungup90 in https://github.com/aibrix/aibrix/pull/657
* Fix missing file to generator refactoring by happyandslow in https://github.com/aibrix/aibrix/pull/670
* [Bug] GPU optimizer bug fix and document fix by zhangjyr in https://github.com/aibrix/aibrix/pull/656
* Change error response to json and improve e2e stability by Jeffwan in https://github.com/aibrix/aibrix/pull/669
* Use response buffer to address stream request issue by Jeffwan in https://github.com/aibrix/aibrix/pull/679
* [docs] Polish feature examples and user guidances by Jeffwan in https://github.com/aibrix/aibrix/pull/686
* Update version and tags to v0.2.0 by Jeffwan in https://github.com/aibrix/aibrix/pull/687
* fix api scheme by kerthcet in https://github.com/aibrix/aibrix/pull/674
* [docs] Polish distributed inference and kv cache examples by Jeffwan in https://github.com/aibrix/aibrix/pull/691
* Improve lora autoscaling and kvcache examples by Jeffwan in https://github.com/aibrix/aibrix/pull/697
* [Docs] Add optimizer-based autoscaling doc and examples by nwangfw in https://github.com/aibrix/aibrix/pull/692
* Add cpu/memory resources for control plane components by varungup90 in https://github.com/aibrix/aibrix/pull/702
* Update log config for sample deployments by varungup90 in https://github.com/aibrix/aibrix/pull/704
* [Docs] Add feature description of dist kv cache in README by DwyaneShi in https://github.com/aibrix/aibrix/pull/705
* [Docs] Update README.md by Jeffwan in https://github.com/aibrix/aibrix/pull/706
* Add feature description for heterogeneous gpu inference feature by nwangfw in https://github.com/aibrix/aibrix/pull/707
* Bump py version to 0.2.0.post1 by Jeffwan in https://github.com/aibrix/aibrix/pull/708
* Fix wrong path for generated html by kerthcet in https://github.com/aibrix/aibrix/pull/709
New Contributors
* scarlet25151 made their first contribution in https://github.com/aibrix/aibrix/pull/548
* Aspirin96 made their first contribution in https://github.com/aibrix/aibrix/pull/544
* DwyaneShi made their first contribution in https://github.com/aibrix/aibrix/pull/623
* gangmuk made their first contribution in https://github.com/aibrix/aibrix/pull/625
* kerthcet made their first contribution in https://github.com/aibrix/aibrix/pull/674
**Full Changelog**: https://github.com/aibrix/aibrix/compare/v0.1.0...v0.2.0