Dstack

Latest version: v0.19.1

12.1

We've upgraded the default Docker image's CUDA drivers to 12.1 (for better compatibility with modern libraries).

Mixtral 8x7B

Lastly, and most importantly, we've added the [example](https://dstack.ai/examples/mixtral) showing how to deploy Mixtral 8x7B as a service.

3.12

```yaml
ide: vscode

resources:
  gpu: gaudi2:8 # 8 × Gaudi 2
```

> [!NOTE]
> To use SSH fleets with Intel Gaudi, ensure that the [Gaudi software and drivers](https://docs.habana.ai/en/latest/Installation_Guide/Driver_Installation.html#driver-installation) are installed on each host. This should include the drivers, `hl-smi`, and Habana Container Runtime.
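
For illustration, a minimal SSH fleet configuration for Gaudi hosts might look like the sketch below. It follows the same `ssh_config` layout as the on-prem fleet example in these notes. The fleet name, user, key path, and host IPs are hypothetical placeholders, and the hosts are assumed to already have the Gaudi drivers, `hl-smi`, and Habana Container Runtime installed.

```yaml
type: fleet
# Hypothetical fleet name, for illustration only
name: gaudi-fleet

ssh_config:
  # Hypothetical SSH user and key path
  user: ubuntu
  identity_file: ~/.ssh/id_rsa
  # Hypothetical host IPs; each host is assumed to have the Gaudi software stack installed
  hosts:
    - 192.168.100.10
    - 192.168.100.11
```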

Volumes

Stop duration and force detachment

In some cases, a volume may get stuck in the `detaching` state. When this happens, the run is marked as stopped, but the instance remains in an inconsistent state, preventing its deletion or reuse. Additionally, the volume cannot be used with other runs.

To address this, `dstack` now keeps the run in the `terminating` state until the volume is fully detached. By default, `dstack` waits 5 minutes before forcing a detach. You can override this with `stop_duration`, either by setting a different duration or by setting it to `off` to wait indefinitely.
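
As a sketch, a task that mounts a volume and overrides the default stop duration could look like this. The task name, volume name, mount path, and command are hypothetical. Only `stop_duration` and its `off` value come from this release.

```yaml
type: task
# Hypothetical task name
name: train

# Hypothetical volume name and mount path, for illustration only
volumes:
  - name: my-volume
    path: /volume_data

commands:
  - python train.py

# Wait up to 10 minutes for volumes to detach before forcing detachment.
# Setting this to `off` waits indefinitely (no forced detach).
stop_duration: 10m
```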

> [!NOTE]
> Force detaching a volume may corrupt the file system and should only be used as a last resort. If volumes frequently require force detachment, contact your cloud provider’s support to identify the root cause.

Bug-fixes

This update also resolves an issue where `dstack` mistakenly marked a volume as `attached` even though it was actually detached.

UI

Fleets

The UI has been updated to simplify fleet and instance management. The `Fleets` page now allows users to terminate fleets and displays both active and terminated fleets. The new `Instances` page shows active and terminated instances across all fleets.

![](https://github.com/user-attachments/assets/56685e73-3784-49e0-996b-01e046e17a3b)

What's changed

* Add Intel Gaudi support for SSH fleets by un-def in https://github.com/dstackai/dstack/pull/2216
* Support models with non-standard `finish_reason` by jvstme in https://github.com/dstackai/dstack/pull/2229
* [Internal]: Ensure all files end with a newline by jvstme in https://github.com/dstackai/dstack/pull/2227
* [chore]: Refactor gateway modules by jvstme in https://github.com/dstackai/dstack/pull/2226
* [chore]: Move connection pool to proxy deps by jvstme in https://github.com/dstackai/dstack/pull/2235
* [chore]: Update migration `ffa99edd1988` by jvstme in https://github.com/dstackai/dstack/pull/2217
* [chore]: Update/remove dstack-proxy TODOs by jvstme in https://github.com/dstackai/dstack/pull/2239
* [UI] New UI for fleets and instances by olgenn in https://github.com/dstackai/dstack/pull/2236
* Improve UX when no offers found by jvstme in https://github.com/dstackai/dstack/pull/2240
* Implement volumes force detach by r4victor in https://github.com/dstackai/dstack/pull/2242

**Full changelog**: https://github.com/dstackai/dstack/compare/0.18.37...0.18.38

3.11

```yaml
env:
  - MODEL=NousResearch/Llama-2-7b-chat-hf
commands:
  - pip install vllm
  - python -m vllm.entrypoints.openai.api_server --model $MODEL --port 8000
port: 8000

resources:
  gpu: 24GB

model:
  format: openai
  type: chat
  name: NousResearch/Llama-2-7b-chat-hf
```

What's changed

* Configuration resources & ranges by Egor-S in https://github.com/dstackai/dstack/pull/844
* Range.__str__ always returns a string by Egor-S in https://github.com/dstackai/dstack/pull/845
* Add infinity example by deep-diver in https://github.com/dstackai/dstack/pull/847
* error in documentation: use --url instead of --server by promsoft in https://github.com/dstackai/dstack/pull/852
* Support authorization on the gateway by Egor-S in https://github.com/dstackai/dstack/pull/851
* Implement Kubernetes backend by r4victor in https://github.com/dstackai/dstack/pull/853
* Add gpu support for kubernetes by r4victor in https://github.com/dstackai/dstack/pull/856
* Resources parse and store by Egor-S in https://github.com/dstackai/dstack/pull/857
* Use python3.11 in generate-json-schema by r4victor in https://github.com/dstackai/dstack/pull/859
* Implement OpenAI to OpenAI adapter for gateway by Egor-S in https://github.com/dstackai/dstack/pull/860

New contributors

* deep-diver made their first contribution in https://github.com/dstackai/dstack/pull/847
* promsoft made their first contribution in https://github.com/dstackai/dstack/pull/852

**Full Changelog**: https://github.com/dstackai/dstack/compare/0.14.0...0.15.0

3.10

```yaml
# This line ensures `nvcc` is included in the base Docker image
nvcc: true

commands:
  - pip install -r requirements.txt
  - python train.py

resources:
  gpu: 24GB
```

Environment variables for on-prem fleets

When you create an on-prem fleet, it's now possible to pre-configure environment variables. These variables will be used when installing the `dstack-shim` service on hosts and running workloads.

For example, these environment variables can be used to configure `dstack` to use a proxy:

```yaml
type: fleet
name: my-fleet

placement: cluster

env:
  - HTTP_PROXY=http://proxy.example.com:80
  - HTTPS_PROXY=http://proxy.example.com:80
  - NO_PROXY=localhost,127.0.0.1

ssh_config:
  user: ubuntu
  identity_file: ~/.ssh/id_rsa
  hosts:
    - 3.255.177.51
    - 3.255.177.52
```

Examples

New examples include:
* [Llama 3.1](https://dstack.ai/docs/examples/llms/llama31/) recipes for inference and fine-tuning
* [Spark](https://github.com/dstackai/dstack/tree/master/examples/misc/spark) cluster setup
* [Ray](https://github.com/dstackai/dstack/tree/master/examples/misc/ray) cluster setup

Other

* [Bugfix] Fix filtering offers by disk size by jvstme in https://github.com/dstackai/dstack/pull/1517
* [Bugfix] Run containers as root for all images by r4victor in https://github.com/dstackai/dstack/pull/1499
* [Docs] Document GCP permissions for volumes by r4victor in https://github.com/dstackai/dstack/pull/1501
* [Docs] Another batch of docs improvements 1497 by peterschmidt85 in https://github.com/dstackai/dstack/pull/1498
* [Bugfix] Fix creating TensorDock instances by jvstme in https://github.com/dstackai/dstack/pull/1506
* [Bugfix] Launch TensorDock instances with correct disk size by jvstme in https://github.com/dstackai/dstack/pull/1508
* [Bugfix] Set timeouts to TensorDock API requests by jvstme in https://github.com/dstackai/dstack/pull/1509
* [Docs] Update TensorDock setup instructions by jvstme in https://github.com/dstackai/dstack/pull/1512
* [Internal] Implement API endpoint for listing volumes across projects by r4victor in https://github.com/dstackai/dstack/pull/1519
* [Internal] Include Volume.deleted in the API by r4victor in https://github.com/dstackai/dstack/pull/1520
* [Docs] Update the Axolotl example 1493 by peterschmidt85 in https://github.com/dstackai/dstack/pull/1494
* [Internal] Print docker image pulling errors to `shim.log` by jvstme in https://github.com/dstackai/dstack/pull/1503
* [Feature] Add `env` setting to fleet config for on-prem fleets by un-def in https://github.com/dstackai/dstack/pull/1505

**Full changelog**: https://github.com/dstackai/dstack/compare/0.18.8...0.18.9

2.0

```yaml
max_duration: 1d
```

New examples :fire::fire:

Thanks to a contribution from deep-diver, we have two new examples:

* [Alignment Handbook](https://github.com/dstackai/dstack/blob/master/examples/fine-tuning/alignment-handbook/README.md)
* [Axolotl](https://github.com/dstackai/dstack/blob/master/examples/fine-tuning/axolotl/README.md)

Other

- Configuring VPCs using their IDs (via `vpc_ids` in `server/config.yml`)
- Support for global profiles (via `~/.dstack/profiles.yml`)
- Updated the default environment variables (`DSTACK_RUN_NAME`, `DSTACK_GPUS_NUM`, `DSTACK_NODES_NUM`, `DSTACK_NODE_RANK`, and `DSTACK_MASTER_NODE_IP`)
- It’s now possible to use NVIDIA `A10` GPUs on Azure
- More granular permissions for Azure
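
A rough sketch of a global profile follows. It assumes the same `profiles` list format used by per-repo `.dstack/profiles.yml` files. The profile name and values are hypothetical, and `default: true` is meant to mark the profile that applies when no other profile is specified.

```yaml
# ~/.dstack/profiles.yml
profiles:
  # Hypothetical profile name and values, for illustration only
  - name: default-limits
    spot_policy: auto
    max_duration: 1d
    default: true
```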

What's changed

* Fix server freeze on terminate instance by jvstme in https://github.com/dstackai/dstack/pull/1132
* Support profile params in run configurations by r4victor in https://github.com/dstackai/dstack/pull/1131
* Support global `.dstack/profiles.yml` by r4victor in https://github.com/dstackai/dstack/pull/1134
* Fix `No such profile: None` when missing `.dstack/profiles.yml` by r4victor in https://github.com/dstackai/dstack/pull/1135
* Make Azure permissions more granular by r4victor in https://github.com/dstackai/dstack/pull/1139
* Validate min disk size by r4victor in https://github.com/dstackai/dstack/pull/1146
* Fix unexpected error if system Python version is unknown by r4victor in https://github.com/dstackai/dstack/pull/1147
* Add request timeouts to prevent code freezes by jvstme in https://github.com/dstackai/dstack/pull/1140
* Refactor backends to wait for instance IP address outside `run_job/create_instance` by r4victor in https://github.com/dstackai/dstack/pull/1149
* Fix provisioning Azure instances with A10 GPU by jvstme in https://github.com/dstackai/dstack/pull/1150
* [Internal] Move `packer` -> `scripts/packer` by jvstme in https://github.com/dstackai/dstack/pull/1153
* Added the ability of adding own instances by TheBits in https://github.com/dstackai/dstack/pull/1115
* An issue with the `executor_error` check being falsely positive by TheBits in https://github.com/dstackai/dstack/pull/1160
* Make user project quota configurable by r4victor in https://github.com/dstackai/dstack/pull/1161
* Configure CORS headers on gateway by r4victor in https://github.com/dstackai/dstack/pull/1166
* Allow to configure AWS `vpc_ids` by r4victor in https://github.com/dstackai/dstack/pull/1170
* [Internal] Show dstack version in Sentry issues by jvstme in https://github.com/dstackai/dstack/pull/1167
* Fix `KeyError: 'IpPermissions'` when using AWS by jvstme in https://github.com/dstackai/dstack/pull/1169
* Create public ssh key if it does not exist in `dstack pool add-ssh` by TheBits in https://github.com/dstackai/dstack/pull/1173
* Fixed the environment file upload by TheBits in https://github.com/dstackai/dstack/pull/1175
* Updated shim status processing by TheBits in https://github.com/dstackai/dstack/pull/1174
* Fix bugs in `dstack pool add-ssh` by TheBits in https://github.com/dstackai/dstack/pull/1178
* Fix Cudo Create VM response error by Bihan in https://github.com/dstackai/dstack/pull/1179
* Implement API for configuring backends via yaml by r4victor in https://github.com/dstackai/dstack/pull/1181
* Allow running gated models with `HUGGING_FACE_HUB_TOKEN` by r4victor in https://github.com/dstackai/dstack/pull/1184
* Pass all dstack runner envs as `DSTACK_*` by r4victor in https://github.com/dstackai/dstack/pull/1185
* Improve the retries in the get_host_info and get_shim_healthcheck by TheBits in https://github.com/dstackai/dstack/pull/1183
* Example/h4alignment handbook by deep-diver in https://github.com/dstackai/dstack/pull/1180
* The deploy is launched in ThreadPoolExecutor by TheBits in https://github.com/dstackai/dstack/pull/1186

**Full Changelog**: https://github.com/dstackai/dstack/compare/0.18.0...0.18.1rc2

0.19.0

Simplified backend integration

To provide the best multi-cloud experience and GPU availability, `dstack` integrates with many cloud GPU providers, including AWS, Azure, GCP, RunPod, Lambda, Vultr, and [others](https://dstack.ai/partners/). Since we'd like to see even more GPU providers supported by `dstack`, this release includes a major internal refactoring aimed at simplifying the process of adding new integrations. See the [Backend integration](https://github.com/dstackai/dstack/blob/master/contributing/BACKENDS.md) guide for more details. Join [our Discord](https://discord.com/invite/u8SmfwPpMd) if you have any questions about the integration process.

MPI workloads and NCCL tests

`dstack` now configures internode SSH connectivity for [distributed tasks](https://dstack.ai/docs/concepts/tasks/#distributed-tasks). You can log in to any node from any node via SSH with a simple `ssh <node_ip>` command. The out-of-the-box SSH connectivity also allows running `mpirun`. See the [NCCL Tests](https://dstack.ai/examples/misc/nccl-tests/) example.
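
A minimal two-node task that exercises this connectivity might look like the following sketch. It relies on the `DSTACK_NODE_RANK` and `DSTACK_MASTER_NODE_IP` environment variables listed in these release notes. The task name and commands are hypothetical, and the sketch assumes the master node is already reachable when the worker runs `ssh`.

```yaml
type: task
# Hypothetical task name
name: ssh-check

# Run the task on two nodes
nodes: 2

commands:
  - |
    if [ "$DSTACK_NODE_RANK" -eq 0 ]; then
      # Master node: just report its hostname
      echo "Master node: $(hostname)"
    else
      # Worker node: reach the master over the internode SSH configured by dstack
      ssh "$DSTACK_MASTER_NODE_IP" hostname
    fi
```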

Cost and usage metrics

In addition to DCGM metrics, `dstack` now exports a set of Prometheus metrics for cost and usage tracking. Here's how they may look in a Grafana dashboard:

![image](https://github.com/user-attachments/assets/22245567-71a3-4913-8b08-af9016135157)

See the [documentation](https://dstack.ai/docs/guides/metrics/) for a full list of metrics and labels.
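
If you scrape the metrics with your own Prometheus instance, a scrape job could look roughly like this. It assumes the `dstack` server is reachable at `localhost:3000` and serves metrics at `/metrics`. Check the metrics guide for the actual endpoint and any settings required to enable it.

```yaml
# prometheus.yml (fragment)
scrape_configs:
  - job_name: dstack
    # Assumption: metrics are served at /metrics on the dstack server
    metrics_path: /metrics
    static_configs:
      # Assumption: dstack server address and port
      - targets: ["localhost:3000"]
```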

Cursor IDE support

`dstack` can now launch [Cursor](https://www.cursor.com/) dev environments. Just specify `ide: cursor` in the run configuration:

```yaml
type: dev-environment
ide: cursor
```

Deprecations

* The Python API methods `get_plan()`, `exec_plan()`, and `submit()` are deprecated in favor of `get_run_plan()`, `apply_plan()`, and `apply_configuration()`. The deprecated methods had clumsy signatures with many top-level parameters. The new signatures align better with the CLI and HTTP API.

Breaking changes

The 0.19.0 release drops several previously deprecated or undocumented features. There are no other significant breaking changes. The 0.19.0 server continues to support 0.18.x CLI versions. But the 0.19.0 CLI does not work with older 0.18.x servers, so you should update the server first or the server and the clients simultaneously.

* Drop the `dstack run` CLI command.
* Drop the `--attach` mode for the `dstack logs` CLI command.
* Drop Pools functionality:
  * The `dstack pool` CLI commands.
  * `/api/project/{project_name}/runs/get_offers`, `/api/project/{project_name}/runs/create_instance`, `/api/pools/list_instances`, `/api/project/{project_name}/pool/*` API endpoints.
  * `pool_name` and `instance_name` parameters in profiles and run configurations.
* Remove `retry_policy` from profiles.
* Remove `termination_idle_time` and `termination_policy` from profiles and fleet configurations.
* Drop `RUN_NAME` and `REPO_ID` run environment variables.
* Drop the `/api/backends/config_values` endpoint used for interactive configuration.
* The API accepts and returns `azure_config["regions"]` instead of `azure_config["locations"]` (unified with `server/config.yml`).

What's Changed
* Fix gateways with a previously used IP address by jvstme in https://github.com/dstackai/dstack/pull/2388
* Simplify backend configurators and models by r4victor in https://github.com/dstackai/dstack/pull/2389
* Store BackendType as string instead of enum in the DB by r4victor in https://github.com/dstackai/dstack/pull/2393
* Introduce ComputeWith classes to detect compute features by r4victor in https://github.com/dstackai/dstack/pull/2392
* Move backend/compute configs from config.py to models.py by r4victor in https://github.com/dstackai/dstack/pull/2395
* Provide default run_job implementation for VM backends by r4victor in https://github.com/dstackai/dstack/pull/2396
* Configure inter-node SSH on multi-node tasks by un-def in https://github.com/dstackai/dstack/pull/2394
* [Blog] Using SSH fleets with TensorWave's private AMD cloud by peterschmidt85 in https://github.com/dstackai/dstack/pull/2391
* Add script to generate boilerplate code for new backend by r4victor in https://github.com/dstackai/dstack/pull/2397
* Add `datacenter-gpu-manager-4-proprietary` to CUDA images by un-def in https://github.com/dstackai/dstack/pull/2399
* Drop pools by r4victor in https://github.com/dstackai/dstack/pull/2401
* Transition high-level Python runs API to new methods by r4victor in https://github.com/dstackai/dstack/pull/2403
* Drop dstack run by r4victor in https://github.com/dstackai/dstack/pull/2404
* Drop dstack logs --attach by r4victor in https://github.com/dstackai/dstack/pull/2405
* Remove retry_policy from profiles by r4victor in https://github.com/dstackai/dstack/pull/2406
* Remove termination_idle_time and termination_policy by r4victor in https://github.com/dstackai/dstack/pull/2407
* Clean up models backward compatibility code by r4victor in https://github.com/dstackai/dstack/pull/2408
* Restore removed models fields for compatibility with 0.18 clients by r4victor in https://github.com/dstackai/dstack/pull/2414
* Clean up legacy repo fields by jvstme in https://github.com/dstackai/dstack/pull/2411
* Switch AWS gateways from t2.micro to t3.micro by r4victor in https://github.com/dstackai/dstack/pull/2416
* Remove old client excludes by r4victor in https://github.com/dstackai/dstack/pull/2417
* Use new JobTerminationReason values by r4victor in https://github.com/dstackai/dstack/pull/2418
* Drop RUN_NAME and REPO_ID env vars by r4victor in https://github.com/dstackai/dstack/pull/2419
* Drop irrelevant Nebius backend implementation by jvstme in https://github.com/dstackai/dstack/pull/2421
* [Feature]: Support the cursor IDE 2412 by peterschmidt85 in https://github.com/dstackai/dstack/pull/2413
* Simplify implementation of new backends 2372 by olgenn in https://github.com/dstackai/dstack/pull/2423
* Support multiple domains with Entra login by r4victor in https://github.com/dstackai/dstack/pull/2424
* Support setting project members by email by r4victor in https://github.com/dstackai/dstack/pull/2429
* Fix json schema reference and invalid properties errors by r4victor in https://github.com/dstackai/dstack/pull/2433
* [Blog]: DeepSeek R1 inference performance: MI300X vs. H200 by peterschmidt85 in https://github.com/dstackai/dstack/pull/2425
* Add new metrics by un-def in https://github.com/dstackai/dstack/pull/2434
* Add instance and job cost/usage Prometheus metrics by un-def in https://github.com/dstackai/dstack/pull/2432
* [Docker] Add dstackai/efa image by un-def in https://github.com/dstackai/dstack/pull/2422
* Restore fleet termination_policy for 0.18 backward compatibility by r4victor in https://github.com/dstackai/dstack/pull/2436
* [Bug]: Search over users doesn't work by olgenn in https://github.com/dstackai/dstack/pull/2439
* [Feature]: Support activating/deactivating users via the UI by olgenn in https://github.com/dstackai/dstack/pull/2440
* [Feature]: Display Assigned Gateway Information on Run Pages by olgenn in https://github.com/dstackai/dstack/pull/2438
* [Docs]: Update the `Metrics` guide by peterschmidt85 in https://github.com/dstackai/dstack/pull/2441
* [Examples] Update nccl-tests by un-def in https://github.com/dstackai/dstack/pull/2415

**Full Changelog**: https://github.com/dstackai/dstack/compare/0.18.44...0.19.0
