Latest version: v0.0.1
To speed up LLM inference and enhance models' perception of key information, this package compresses the prompt and KV-cache, achieving up to 20x compression with minimal performance loss.
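This description matches the tagline of Microsoft's LLMLingua prompt-compression project, so a usage sketch along those lines may be helpful. Everything below (the `llmlingua` import, `PromptCompressor`, `compress_prompt`, and the `target_token` budget) assumes the package mirrors LLMLingua's documented API; verify against this package's own documentation before relying on it.

```python
# Minimal sketch, assuming the package exposes LLMLingua's documented API.
from llmlingua import PromptCompressor

# Loads a small language model used to score tokens and drop
# low-information spans from the prompt.
compressor = PromptCompressor()

# A long prompt: instructions, retrieved context, few-shot examples, etc.
prompt = "..."

# Compress toward a token budget; the compressed prompt is then sent to the
# target LLM in place of the original, cutting inference cost and latency.
result = compressor.compress_prompt(
    prompt,
    instruction="",
    question="",
    target_token=200,  # assumed budget, chosen for illustration
)

print(result["compressed_prompt"])
```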
No known vulnerabilities found