chatglm-cpp

Latest version: v0.4.2

0.2.5

* Optimize context computation (GEMM) for the Metal backend.
* Support a repetition penalty option for generation (see the sketch after this list).
* Update the Dockerfile for CPU & CUDA backends with full functionality, hosted on GHCR.
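
A minimal sketch of how the repetition penalty option might be used from the Python binding. The `repetition_penalty` keyword, the local model path, and the v0.2-era list-of-strings chat API are assumptions for illustration, not taken from the release notes:

```python
import chatglm_cpp

# Load a quantized GGML model (path is a placeholder).
pipeline = chatglm_cpp.Pipeline("./chatglm2-ggml.bin")

# Values > 1.0 penalize tokens that already appeared, discouraging
# repetitive loops; the keyword name is an assumption here.
output = pipeline.chat(
    ["Write a short poem about the sea."],
    repetition_penalty=1.2,
)
print(output)
```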

0.2.4

* Python binding enhancement: support loading and converting directly from original Hugging Face models, so intermediate GGML model files are no longer necessary (see the sketch after this list).
* Small fix for the CLI demo on Windows.
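
A hedged sketch of the load-and-convert flow: pass a Hugging Face model id in place of a pre-converted GGML file. The `dtype` keyword (quantization type) and the exact repo id are assumptions for illustration:

```python
import chatglm_cpp

# Passing a Hugging Face repo id triggers an on-the-fly download and
# conversion, so no intermediate GGML file has to be created by hand.
# The `dtype` keyword is an assumption here.
pipeline = chatglm_cpp.Pipeline("THUDM/chatglm2-6b", dtype="q4_0")

print(pipeline.chat(["Hello! What can you do?"]))
```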

0.2.3

* Windows support: enable AVX/AVX2 for better performance, fix stdout encoding issues, and support the Python binding on Windows.
* API server: support LangChain integration and an OpenAI-compatible API server (see the sketch after this list).
* New model: support CodeGeeX2 inference in both the native C++ runtime and the Python binding.
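
Because the server speaks the OpenAI chat-completions protocol, a standard OpenAI client can point at it. The base URL, port, and model name below are placeholders, assuming the server is already running locally:

```python
from openai import OpenAI

# Point the official OpenAI client at the local server.
# Base URL and model name are assumptions; adjust to your deployment.
client = OpenAI(base_url="http://127.0.0.1:8000/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="chatglm2-6b",
    messages=[{"role": "user", "content": "Summarize what an API server does."}],
)
print(response.choices[0].message.content)
```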

0.2.2

* Support the MPS (Metal Performance Shaders) backend for ChatGLM2 on Apple silicon devices.
* Support Volta, Turing and Ampere CUDA architectures.

0.2.1

* 3x speedup for the CUDA implementation.
* Increase scratch size to accommodate up to a 2k context.

0.2.0

First release:
* Accelerated CPU inference for ChatGLM-6B and ChatGLM2-6B, enabling real-time chatting on a MacBook.
* Support int4/int5/int8 quantization, KV caching, efficient sampling, parallel computing, and streaming generation (see the sketch after this list).
* Python binding, web demo, and more possibilities.
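
A sketch of streaming generation through the Python binding. The `stream=True` flag and the model path are assumptions about the binding's API, used here to illustrate token-by-token output:

```python
import chatglm_cpp

pipeline = chatglm_cpp.Pipeline("./chatglm-ggml.bin")  # placeholder path

# With streaming enabled, chunks arrive as they are generated instead of
# as one final string; `stream=True` is an assumed keyword for illustration.
for chunk in pipeline.chat(["Tell me a fun fact."], stream=True):
    print(chunk, end="", flush=True)
print()
```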
