Ollama

0.1.24

OpenAI Compatibility

![openai](https://github.com/ollama/ollama/assets/251292/da36abcd-c929-4806-b957-5adf41ac641a)


This release adds initial compatibility support for the OpenAI [Chat Completions API](https://platform.openai.com/docs/api-reference/chat).

* [Documentation](https://github.com/ollama/ollama/blob/main/docs/openai.md)
* [Examples](https://ollama.ai/blog/openai-compatibility)

Usage with cURL


```shell
curl http://localhost:11434/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "llama2",
        "messages": [
            {
                "role": "system",
                "content": "You are a helpful assistant."
            },
            {
                "role": "user",
                "content": "Hello!"
            }
        ]
    }'
```
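Because the endpoint mirrors OpenAI's Chat Completions schema, the same request can be built from Python with only the standard library. A minimal sketch, assuming an Ollama server running at the default `localhost:11434` with `llama2` pulled:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/v1/chat/completions"  # default local server

def build_chat_request(model, messages):
    """Build an OpenAI-style chat completion request for Ollama's /v1 endpoint."""
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )

req = build_chat_request("llama2", [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
])

# To actually send it (requires a running server):
# with urllib.request.urlopen(req) as resp:
#     reply = json.load(resp)["choices"][0]["message"]["content"]
```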


New Models
* [Qwen 1.5](https://ollama.ai/library/qwen): Qwen 1.5 is a new family of large language models by Alibaba Cloud spanning from 0.5B to 72B parameters.

What's Changed
* Fixed issue where requests to `/api/chat` would hang when providing empty `user` messages repeatedly
* Fixed issue on macOS where Ollama would return a missing library error after being open for a long period of time

New Contributors
* easp made their first contribution in https://github.com/ollama/ollama/pull/2340
* mraiser made their first contribution in https://github.com/ollama/ollama/pull/1849

**Full Changelog**: https://github.com/ollama/ollama/compare/v0.1.23...v0.1.24

0.1.23

![vision](https://github.com/ollama/ollama/assets/251292/cb513ae7-a9c2-4a2b-891b-fd79bf997f86)

New vision models

The [LLaVA](https://ollama.ai/library/llava) model family on Ollama has been updated to [version 1.6](https://llava-vl.github.io/blog/2024-01-30-llava-1-6/), and now includes a new `34b` version:
- `ollama run llava` A new 7B LLaVA model based on Mistral
- `ollama run llava:13b` 13B LLaVA model
- `ollama run llava:34b` 34B LLaVA model – one of the most powerful open-source vision models available

These new models share several improvements:

- **More permissive licenses:** LLaVA 1.6 models are distributed via the Apache 2.0 license or the LLaMA 2 Community License.
- **Higher image resolution:** support for up to 4x more pixels, allowing the model to grasp more details.
- **Improved text recognition and reasoning capabilities:** these models are trained on additional document, chart and diagram data sets.

`keep_alive` parameter: control how long models stay loaded

When making API requests, the new `keep_alive` parameter can be used to control how long a model stays loaded in memory:

```shell
curl http://localhost:11434/api/generate -d '{
  "model": "mistral",
  "prompt": "Why is the sky blue?",
  "keep_alive": "30s"
}'
```


* If set to a positive duration (e.g. `20m`, `1hr` or `30`), the model will stay loaded for the provided duration
* If set to a negative duration (e.g. `-1`), the model will stay loaded indefinitely
* If set to `0`, the model will be unloaded immediately once finished
* If not set, the model will stay loaded for 5 minutes by default
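The rules above can be modeled with a small helper. This is an illustrative sketch of the semantics (handling only `s`/`m`/`h` suffixes), not Ollama's actual implementation:

```python
def keep_alive_seconds(value):
    """Interpret a keep_alive value as seconds (illustrative sketch, not Ollama's code).

    None      -> default of 5 minutes
    negative  -> stay loaded indefinitely (returned as infinity)
    0         -> unload immediately
    positive  -> that many seconds; strings may carry s/m/h suffixes
    """
    if value is None:
        return 5 * 60.0
    if isinstance(value, str):
        units = {"s": 1, "m": 60, "h": 3600}
        if value and value[-1] in units:
            value = float(value[:-1]) * units[value[-1]]
        else:
            value = float(value)  # bare numbers are treated as seconds
    if value < 0:
        return float("inf")
    return float(value)
```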

Support for more Nvidia GPUs

* GeForce GTX `TITAN X` `980 Ti` `980` `970` `960` `950` `750 Ti` `750`
* GeForce GTX `980M` `970M` `965M` `960M` `950M` `860M` `850M`
* GeForce `940M` `930M` `910M` `840M` `830M`
* Quadro `M6000` `M5500M` `M5000` `M2200` `M1200` `M620` `M520`
* Tesla `M60` `M40`
* NVS `810`

What's Changed
* New `keep_alive` API parameter to control how long models stay loaded
* Image paths can now be provided to `ollama run` when running multimodal models
* Fixed issue where downloading models via `ollama pull` would slow down to 99%
* Fixed error when running Ollama with Nvidia GPUs and CPUs without AVX instructions
* Support for additional Nvidia GPUs (compute capability 5)
* Fixed issue where system prompt would be repeated in subsequent messages
* `ollama serve` will now print prompt when `OLLAMA_DEBUG=1` is set
* Fixed issue where exceeding context size would cause erroneous responses in `ollama run` and the `/api/chat` API
* `ollama run` will now allow sending messages without images to multimodal models

New Contributors
* jaglinux made their first contribution in https://github.com/ollama/ollama/pull/2224
* textspur made their first contribution in https://github.com/ollama/ollama/pull/2252
* rjmacarthy made their first contribution in https://github.com/ollama/ollama/pull/1950
* hugo53 made their first contribution in https://github.com/ollama/ollama/pull/1957
* RussellCanfield made their first contribution in https://github.com/ollama/ollama/pull/2313

**Full Changelog**: https://github.com/ollama/ollama/compare/v0.1.22...v0.1.23

0.1.22

New models
* [Stable LM 2](https://ollama.ai/library/stablelm2): A state-of-the-art 1.6B small language model.

What's Changed
* Fixed issue with Nvidia GPU detection that would cause Ollama to error instead of falling back to CPU
* Fixed issue where AMD integrated GPUs caused an error

**Full Changelog**: https://github.com/ollama/ollama/compare/v0.1.21...v0.1.22

0.1.21

New models
* [Qwen](https://ollama.ai/library/qwen): Qwen is a series of large language models by Alibaba Cloud spanning from 1.8B to 72B parameters.
* [DuckDB-NSQL](https://ollama.ai/library/duckdb-nsql): A text-to-SQL LLM for DuckDB
* [Stable Code](https://ollama.ai/library/stable-code): A new code completion model on par with Code Llama 7B and similar models.
* [Nous Hermes 2 Mixtral](https://ollama.ai/library/nous-hermes2-mixtral): The Nous Hermes 2 model from Nous Research, now trained over Mixtral.

Saving and loading models and messages

Models can now be saved and loaded with `/save <model>` and `/load <model>` when using `ollama run`. This will save or load conversations and any model changes with `/set parameter`, `/set system` and more as a new model with the provided name.

`MESSAGE` modelfile command

Messages can now be specified in a `Modelfile` ahead of time using the `MESSAGE` command:


An example `Modelfile`:

```modelfile
FROM llama2
SYSTEM You are a friendly assistant that only answers with 'yes' or 'no'
MESSAGE user Is Toronto in Canada?
MESSAGE assistant yes
MESSAGE user Is Sacramento in Canada?
MESSAGE assistant no
MESSAGE user Is Ontario in Canada?
MESSAGE assistant yes
```


After creating this model, running it will restore the message history. This is useful for techniques such as [Chain-Of-Thought](https://www.promptingguide.ai/techniques/cot) prompting.


```shell
ollama create -f Modelfile yesno
ollama run yesno
>>> Is Toronto in Canada?
yes

>>> Is Sacramento in Canada?
no

>>> Is Ontario in Canada?
yes

>>> Is Havana in Canada?
no
```
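Each `MESSAGE` command maps directly onto an entry in the `messages` list used by `/api/chat`. A hypothetical parser sketch for illustration (not Ollama's own code):

```python
def messages_from_modelfile(text):
    """Collect MESSAGE commands from Modelfile text into /api/chat-style messages
    (hypothetical parser for illustration, not Ollama's own)."""
    messages = []
    for line in text.splitlines():
        parts = line.strip().split(maxsplit=2)
        if len(parts) == 3 and parts[0].upper() == "MESSAGE":
            role, content = parts[1], parts[2]
            messages.append({"role": role, "content": content})
    return messages

modelfile = """FROM llama2
SYSTEM You are a friendly assistant that only answers with 'yes' or 'no'
MESSAGE user Is Toronto in Canada?
MESSAGE assistant yes
"""
history = messages_from_modelfile(modelfile)
```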


Python and JavaScript libraries

The first versions of the [Python](https://github.com/ollama/ollama-python) and [JavaScript](https://github.com/ollama/ollama-js) libraries for Ollama are [now available](https://ollama.ai/blog/python-javascript-libraries).

Intel & AMD CPU improvements

Ollama now supports CPUs without AVX. This means Ollama will now run on older CPUs and in environments (such as virtual machines, Rosetta, and GitHub Actions) that don't provide support for AVX instructions. On newer CPUs that support AVX2, Ollama gets a small performance boost, running models about 10% faster.

What's Changed
* Support for a much broader set of CPUs, including CPUs without AVX instruction set support
* If a GPU detection error is hit when attempting to run a model, Ollama will fallback to CPU
* Fixed issue where generating responses with the same prompt would hang after around 20 requests
* New `MESSAGE` Modelfile command to set the conversation history when building a model
* Ollama will now use AVX2 for faster performance if available
* Improved detection of Nvidia GPUs, especially in WSL
* Fixed issue where models with LoRA layers may not load
* Fixed incorrect error that would occur when retrying network connections in `ollama pull` and `ollama push`
* Fixed issue where `/show parameter` would round decimal numbers
* Fixed issue where upon hitting the context window limit, requests would hang

New Contributors
* fpreiss made their first contribution in https://github.com/jmorganca/ollama/pull/1921
* eavanvalkenburg made their first contribution in https://github.com/jmorganca/ollama/pull/1931
* 0atman made their first contribution in https://github.com/jmorganca/ollama/pull/1924
* sachinsachdeva made their first contribution in https://github.com/jmorganca/ollama/pull/2021
* Arrendy made their first contribution in https://github.com/jmorganca/ollama/pull/2016
* purificant made their first contribution in https://github.com/jmorganca/ollama/pull/1958
* lainedfles made their first contribution in https://github.com/jmorganca/ollama/pull/1999

**Full Changelog**: https://github.com/jmorganca/ollama/compare/v0.1.20...v0.1.21

0.1.20

New models
* [MegaDolphin](https://ollama.ai/library/megadolphin): A new 120B version of the Dolphin model.
* [OpenChat](https://ollama.ai/library/openchat): Updated to the latest version `3.5-0106`.
* [Dolphin Mistral](https://ollama.ai/library/dolphin-mistral): Updated to the latest DPO Laser version, which achieves higher scores with more robust outputs.

What's Changed
* Fixed additional cases where Ollama would fail with `out of memory` CUDA errors
* Multi-GPU machines will now correctly allocate memory across all GPUs
* Fixed issue where Nvidia GPUs would not be detected by Ollama

**Full Changelog**: https://github.com/jmorganca/ollama/compare/v0.1.19...v0.1.20

0.1.19

This release focuses on performance and fixes a number of issues and crashes related to memory allocation.

New Models
* [LLaMa-Pro](https://ollama.ai/library/llama-pro): An 8B expansion of LLaMA by Tencent that specializes in language, programming, and mathematics.

What's Changed
* Fixed "out of memory" errors when running models such as `llama2`, `mixtral` or `llama2:13b` with limited GPU memory
* Fixed CUDA errors when running on older GPUs that aren't yet supported
* Increasing context size with `num_ctx` will now work (up to a model's supported context window).

To use a 32K context window with Mistral in `ollama run`:

```
/set parameter num_ctx 32768
```

Or via the API:

```shell
curl http://localhost:11434/api/generate -d '{
  "model": "mistral",
  "prompt": "Why is the sky blue?",
  "options": {"num_ctx": 32768}
}'
```
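As noted, `num_ctx` is honored only up to the model's supported context window. That clamping behavior can be sketched with a hypothetical helper (not Ollama's implementation):

```python
def clamped_options(num_ctx, model_max_ctx):
    """Build a request options dict, capping num_ctx at the model's supported
    context window (illustrates the behavior described above)."""
    return {"num_ctx": min(num_ctx, model_max_ctx)}

# Mistral's supported window is 32K (32768 tokens):
opts = clamped_options(32768, 32768)
```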


* Larger models such as `mixtral` can now be run on Macs with less memory
* Fixed an issue where pressing up or down arrow keys would cause the wrong prompt to show in `ollama run`
* Fixed performance issues on Intel Macs
* Fixed an error that would occur with old Nvidia GPUs
* `OLLAMA_ORIGINS` now supports browser extension URLs
* Ollama will now offload more processing to the GPU where possible

New Contributors
* sublimator made their first contribution in https://github.com/jmorganca/ollama/pull/1797
* gbaptista made their first contribution in https://github.com/jmorganca/ollama/pull/1830

**Full Changelog**: https://github.com/jmorganca/ollama/compare/v0.1.18...v0.1.19
