Ollama

0.2.4

What's Changed
* Fixed issue where the `context`, `load_duration`, and `total_duration` fields would not be set in the `/api/generate` endpoint.
* Ollama will no longer error when loading models larger than system memory, provided enough disk space is available

New Contributors
* Jaarson made their first contribution in https://github.com/ollama/ollama/pull/5675

**Full Changelog**: https://github.com/ollama/ollama/compare/v0.2.3...v0.2.4

0.2.3

What's Changed
* Fixed issue where the system prompt would not be applied


**Full Changelog**: https://github.com/ollama/ollama/compare/v0.2.2...v0.2.3

0.2.2

What's Changed
* Fixed errors that occurred when using Ollama with Nvidia V100 GPUs
* `glm4` models will no longer fail to load due to out-of-memory errors
* Fixed error that would occur when running `deepseek-v2` and `deepseek-coder-v2` models
* Fixed a series of out of memory issues when using Nvidia GPUs
* Fixed a series of errors that would occur when using multiple Radeon GPUs
* Fixed rare missing DLL issue on Windows
* Fixed rare crash if ROCm was manually installed on Windows

**Full Changelog**: https://github.com/ollama/ollama/compare/v0.2.1...v0.2.2

0.2.1

What's Changed
* Fixed issue where setting `OLLAMA_NUM_PARALLEL` would cause models to be reloaded after each request

**Full Changelog**: https://github.com/ollama/ollama/compare/v0.2.0...v0.2.1

0.2.0

Parallel requests

Ollama can now serve multiple requests at the same time, using only a small amount of additional memory per request. This enables use cases such as:

- Handling multiple chat sessions at the same time
- Hosting a code completion LLM for your internal team
- Processing different parts of a document simultaneously
- Running several agents at the same time
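The use cases above all reduce to fanning several requests out to the server at once. A minimal sketch of that pattern, assuming the default `/api/generate` endpoint on `localhost:11434`; the actual HTTP call is stubbed out so the example runs without a server:

```python
import json
from concurrent.futures import ThreadPoolExecutor

OLLAMA_URL = "http://localhost:11434/api/generate"  # default Ollama endpoint

def build_request(prompt, model="llama3"):
    """Build the JSON body for a /api/generate call."""
    return json.dumps({"model": model, "prompt": prompt, "stream": False})

def send(payload):
    # In a real script this would POST `payload` to OLLAMA_URL
    # (e.g. with urllib.request or requests); stubbed here so the
    # sketch is self-contained.
    return {"done": True, "request": json.loads(payload)}

# Process different parts of a document simultaneously.
prompts = ["Summarize section 1", "Summarize section 2", "Summarize section 3"]
with ThreadPoolExecutor(max_workers=3) as pool:
    responses = list(pool.map(send, (build_request(p) for p in prompts)))

print(len(responses))  # prints 3
```

With parallel serving enabled, the requests share one loaded copy of the model instead of queuing behind each other.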

https://github.com/ollama/ollama/assets/251292/9772a5f1-c072-41db-be6c-dd3c621aa2fd

Multiple models

Ollama now supports loading different models at the same time, dramatically improving use cases such as:

- Retrieval Augmented Generation (RAG): both the embedding and text completion models can be loaded into memory simultaneously.
- Agents: multiple different agents can now run simultaneously
- Running large and small models side-by-side

Models are automatically loaded and unloaded based on requests and how much GPU memory is available.
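As an illustration of the RAG use case above, a loop can alternate between an embedding model and a text completion model without either being evicted. A sketch with stubbed calls (`embed` and `generate` stand in for POSTs to the `/api/embeddings` and `/api/generate` endpoints; the stub return values are placeholders, not real model output):

```python
def embed(text, model="all-minilm"):
    # Real call: POST {"model": model, "prompt": text} to /api/embeddings
    return [float(len(text))]  # stub vector for illustration

def generate(prompt, model="llama3"):
    # Real call: POST {"model": model, "prompt": prompt} to /api/generate
    return f"[{model}] answer for: {prompt}"

docs = ["Ollama supports parallel requests.", "Models load on demand."]
vectors = [embed(d) for d in docs]   # embedding model stays resident
context = docs[0]                    # pretend nearest-neighbor lookup
answer = generate(f"Using: {context}\nQ: What changed in 0.2.0?")
print(answer)
```

Because both models stay in memory, each retrieval-then-answer round trip avoids a model reload.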

To see which models are loaded, run `ollama ps`:


```
% ollama ps
NAME                 ID              SIZE      PROCESSOR    UNTIL
gemma:2b             030ee63283b5    2.8 GB    100% GPU     4 minutes from now
all-minilm:latest    1b226e2802db    530 MB    100% GPU     4 minutes from now
llama3:latest        365c0bd3c000    6.7 GB    100% GPU     4 minutes from now
```

For more information on concurrency, see the [FAQ](https://github.com/ollama/ollama/blob/main/docs/faq.md#how-does-ollama-handle-concurrent-requests).
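Recent Ollama versions also expose the loaded models over the API (a `/api/ps` endpoint mirroring `ollama ps`). A sketch that parses a sample response shaped like the CLI columns above; the field names here are assumptions for illustration, not a verified API contract:

```python
import json

# Sample response modeled on the `ollama ps` output; sizes in bytes.
sample = json.loads("""
{"models": [
  {"name": "gemma:2b",          "size": 2800000000},
  {"name": "all-minilm:latest", "size":  530000000},
  {"name": "llama3:latest",     "size": 6700000000}
]}
""")

# Report each resident model and its approximate memory footprint.
for m in sample["models"]:
    print(f"{m['name']:20s} {m['size'] / 1e9:.1f} GB")
```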

New models
* [GLM-4](https://ollama.com/library/glm4): A strong multi-lingual general language model with competitive performance to Llama 3.
* [CodeGeeX4](https://ollama.com/library/codegeex4): A versatile model for AI software development scenarios, including code completion.
* [Gemma 2](https://ollama.com/library/gemma2): Improved output quality, and base text generation models are now available

What's Changed
* Improved Gemma 2
* Fixed issue where model would generate invalid tokens after hitting context window
* Fixed inference output issues with `gemma2:27b`
* Re-downloading the model may be required: `ollama pull gemma2` or `ollama pull gemma2:27b`
* Ollama will now show a better error if a model architecture isn't supported
* Improved handling of quotes and spaces in Modelfile `FROM` lines
* Ollama will now return an error if the system does not have enough memory to run a model on Linux

New Contributors
* Muku784 made their first contribution in https://github.com/ollama/ollama/pull/5382
* abitrolly made their first contribution in https://github.com/ollama/ollama/pull/4821

**Full Changelog**: https://github.com/ollama/ollama/compare/v0.1.48...v0.2.0

0.1.48

![gemma 2](https://github.com/ollama/ollama/assets/3325447/7fd14968-be5a-4b5b-9daa-5d00722414f1)

What's Changed
* Fixed issue where Gemma 2 would continue generating output after reaching its context limit
* Fixed out of memory and core dump errors when running Gemma 2
* `/show info` will now show additional model information in `ollama run`
* Fixed issue where `ollama show` would result in an error on certain vision models


**Full Changelog**: https://github.com/ollama/ollama/compare/v0.1.47...v0.1.48
