chatglm-cpp

Latest version: v0.4.2

0.4.1

* Support GLM4V, the first vision-language model in the GLM series
* Fix NaN/Inf logits by rescheduling the attention scaling (see the sketch below)
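
The NaN/Inf fix concerns where the 1/sqrt(d) attention scale is applied. Below is a minimal NumPy sketch of the general idea, independent of the actual kernels: multiplying the query by the scale before the QK^T matmul keeps half-precision intermediates in range, whereas scaling only the finished product may already have overflowed.

```python
# Illustrative only: shows why applying the attention scale before the QK^T
# matmul avoids fp16 overflow; all shapes and magnitudes here are made up.
import numpy as np

d = 128
rng = np.random.default_rng(0)
q = (rng.standard_normal((1, d)) * 100).astype(np.float16)   # large activations
k = (rng.standard_normal((16, d)) * 100).astype(np.float16)

scale = np.float16(1.0 / np.sqrt(d))

scores_late = (q @ k.T) * scale    # raw dot products can exceed fp16 max (~65504) -> inf
scores_early = (q * scale) @ k.T   # scaled query keeps every intermediate in range

print("scale after matmul, all finite:", np.isfinite(scores_late).all())
print("scale before matmul, all finite:", np.isfinite(scores_early).all())
```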

0.4.0

* Dynamic memory allocation on demand to fully utilize device memory; no preset scratch or memory sizes are needed anymore.
* Drop Baichuan/InternLM support, since both have been integrated into llama.cpp.
* API changes:
  * CMake CUDA option: `-DGGML_CUBLAS` changed to `-DGGML_CUDA`
  * CMake CUDA architecture: `-DCUDA_ARCHITECTURES` changed to `-DCMAKE_CUDA_ARCHITECTURES`
  * `num_threads` was removed from `GenerationConfig`: an optimal thread count is now selected automatically (see the sketch below)
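
For callers, the 0.4.0 changes look roughly like the sketch below. The build command shows the renamed CMake flags; the Python side assumes the `chatglm_cpp` bindings' `Pipeline`/`ChatMessage` API, whose exact keyword arguments may differ between versions, and the model path is a placeholder.

```python
# Hedged sketch of post-0.4.0 usage; treat the exact chat() keyword arguments as
# assumptions, and the model path as a placeholder.
#
# Build with the renamed CMake flags (replacing -DGGML_CUBLAS / -DCUDA_ARCHITECTURES):
#   cmake -B build -DGGML_CUDA=ON -DCMAKE_CUDA_ARCHITECTURES=80 && cmake --build build -j
import chatglm_cpp

pipeline = chatglm_cpp.Pipeline("models/chatglm4-ggml.bin")

messages = [chatglm_cpp.ChatMessage(role="user", content="Hello!")]

# num_threads is gone from the generation options; the thread count is picked automatically.
reply = pipeline.chat(
    messages,
    max_new_tokens=256,
    temperature=0.8,
    top_p=0.8,
)
print(reply.content)
```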

0.3.4

* Fix regex negative lookahead for code input tokenization
* Fix the OpenAI API server by using `apply_chat_template` to count prompt tokens (see the sketch below)
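
A hedged sketch of the token-counting approach the server fix refers to, using Hugging Face transformers' `apply_chat_template`; the model name is only an example, and older custom tokenizers may not ship a chat template.

```python
# Token counting via transformers' apply_chat_template, the approach the
# API-server fix switches to. The model name is illustrative only.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Write a haiku about llamas."},
]

# With tokenize=True the template is rendered and encoded in one step, so the
# length of the returned ids is the prompt token count reported as usage.
input_ids = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True)
print("prompt_tokens:", len(input_ids))
```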

0.3.3

* Support the ChatGLM4 conversation mode

0.3.2

* Support P-Tuning v2 finetuned models for the ChatGLM family
* Fix convert.py for LoRA models & chatglm3-6b-128k
* Fix the RoPE theta config for 32k/128k sequence lengths (see the sketch below)
* Better CUDA CMake script that respects the nvcc version
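
The RoPE theta item concerns the base used to derive the rotary frequencies. The sketch below shows the standard inverse-frequency formula and why long-context variants use a larger base; the numbers are illustrative, not the exact values of any GLM checkpoint.

```python
# Illustrative RoPE frequency computation; values are generic, not the exact
# config of any particular checkpoint.
import numpy as np

def rope_inv_freq(dim: int, theta: float) -> np.ndarray:
    # Standard RoPE: inv_freq[i] = 1 / theta**(2i/dim) for i in [0, dim/2)
    return 1.0 / theta ** (np.arange(0, dim, 2) / dim)

head_dim = 128
base_default = rope_inv_freq(head_dim, theta=10_000.0)   # common default base
base_long = rope_inv_freq(head_dim, theta=500_000.0)     # larger base -> slower rotation

# At a far position, the slowest-rotating dimension has turned through far fewer
# radians with the larger base, so distant tokens stay distinguishable.
pos = 32_000
print("angle at pos 32k, default base:", pos * base_default[-1])
print("angle at pos 32k, larger base: ", pos * base_long[-1])
```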

0.3.1

* Support function calling in the OpenAI API server
* Faster repetition penalty sampling (see the sketch below)
* Support the `max_new_tokens` generation option
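
For context on the repetition-penalty item, the sketch below shows the common logit-adjustment formulation (divide positive logits of already-seen tokens by the penalty, multiply negative ones). It is meant as intuition only, not as chatglm-cpp's actual kernel.

```python
# Minimal sketch of the common repetition-penalty formulation (CTRL /
# llama.cpp-style samplers); shown for intuition, not as the exact kernel here.
import numpy as np

def apply_repetition_penalty(logits: np.ndarray, seen_token_ids, penalty: float) -> np.ndarray:
    out = logits.copy()
    for tok in set(seen_token_ids):
        # Dividing a positive logit (or multiplying a negative one) by the penalty
        # makes an already-seen token less likely to be sampled again.
        out[tok] = out[tok] / penalty if out[tok] > 0 else out[tok] * penalty
    return out

logits = np.array([2.0, -1.0, 0.5, 3.0])
print(apply_repetition_penalty(logits, seen_token_ids=[0, 3], penalty=1.2))
# -> [1.6667, -1.0, 0.5, 2.5]
```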
