SkyPilot

Latest version: v0.8.0

0.4.1

Not secure
This is a patch release to ship bug fixes faster to our users! This release includes many feature updates and bug fixes, including the new provisioner for AWS, fixes for OOM and credential issues in long-running spot jobs, and some additional improvements.

0.4

0.4.0

Not secure
We are excited to release SkyPilot v0.4.0, which brings a host of new features and improvements, including Kubernetes support, native container support, the ability to open ports, and more.

Release Highlights

New Features
* **[Kubernetes support](https://skypilot.readthedocs.io/en/v0.4.0/reference/kubernetes/index.html)**: SkyPilot tasks and clusters can now run on Kubernetes clusters, including on-prem and cloud-hosted deployments (GKE, EKS).
  * If you have a working kubeconfig, simply run `sky check` and `sky launch --cloud kubernetes` to run your task on Kubernetes.
  * If desired, tasks can also fail over to the cloud when the Kubernetes cluster does not have enough resources. The same SkyPilot YAMLs and CLI work seamlessly across Kubernetes and clouds.
* **[Opening ports on clusters](https://skypilot.readthedocs.io/en/v0.4.0/examples/ports.html)**: Open ports on your clusters with the `ports` field. These ports are publicly accessible and can be used for hosting LLM inference endpoints, Jupyter notebooks, web servers, TensorBoard, and other services.
* **[Native container support](https://skypilot.readthedocs.io/en/v0.4.0/examples/docker-containers.html#using-docker-containers-as-runtime-environment)**: If your task uses Docker containers, SkyPilot's `setup` and `run` commands can now be executed directly inside that container. This lets you wrap your environment in a container and run it on any cloud with SkyPilot.
* **[Reservation support](https://skypilot.readthedocs.io/en/v0.4.0/reference/config.html)**: This release adds support for [GCP reservations](https://cloud.google.com/compute/docs/instances/reservations-overview). SkyPilot will now prioritize using your reservations on the cloud to save costs and get higher availability.
* **New Managed Spot Features**
  * **[Spot pipeline support](https://github.com/skypilot-org/skypilot/blob/releases/0.4.0/examples/spot_pipeline/multi_jobs.yaml)**: automatically execute a pipeline of sequential tasks.
  * **[Spot dashboard](https://skypilot.readthedocs.io/en/v0.4.0/examples/spot-jobs.html#dashboard)**: track all your spot jobs in your browser.
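The Kubernetes-to-cloud failover described above can be pictured as walking an ordered list of candidate locations until one has room for the task. The sketch below is a hypothetical illustration, not SkyPilot's optimizer or provisioner: the `Candidate` type, its `free_gpus` field, and the `pick_target` helper are invented for this example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    """Hypothetical description of one place a task could run."""
    name: str        # e.g. "kubernetes" or a cloud name
    free_gpus: int   # capacity currently available (made-up field)


def pick_target(candidates: List[Candidate], gpus_needed: int) -> Optional[str]:
    """Return the first candidate, in preference order, that can fit the task.

    Candidates are assumed to be pre-sorted by preference, e.g. an on-prem
    Kubernetes cluster first and public clouds afterwards.
    """
    for cand in candidates:
        if cand.free_gpus >= gpus_needed:
            return cand.name
    # No candidate has enough capacity; the caller decides whether to retry.
    return None


candidates = [
    Candidate("kubernetes", free_gpus=2),
    Candidate("gcp", free_gpus=8),
    Candidate("aws", free_gpus=8),
]
print(pick_target(candidates, gpus_needed=1))  # kubernetes
print(pick_target(candidates, gpus_needed=4))  # gcp
```

Keeping the preference order outside the function means the same loop serves both "Kubernetes only" and "Kubernetes, then clouds" setups.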

New LLM Recipes
* **vLLM** on any cloud - [blog](https://blog.skypilot.co/serving-llm-24x-faster-on-the-cloud-with-vllm-and-skypilot/), [example](https://github.com/skypilot-org/skypilot/tree/master/llm/vllm).
* **Llama 2** - Train Vicuna on Llama-2 and serve chatbots - [blog](https://blog.skypilot.co/finetuning-llama2-operational-guide/), [fine-tuning example](https://github.com/skypilot-org/skypilot/tree/releases/0.4.0/llm/vicuna-llama-2), [self-hosted chatbot](https://github.com/skypilot-org/skypilot/tree/releases/0.4.0/llm/llama-2).
* **LocalGPT**: chat with your PDFs - [example](https://github.com/skypilot-org/skypilot/tree/releases/0.4.0/llm/localgpt).
* **Falcon-40B** fine-tuning guide - [example](https://github.com/skypilot-org/skypilot/tree/releases/0.4.0/llm/falcon).

More Clouds
SkyPilot now supports 8 clouds, including community-contributed support for two new clouds:
* [Oracle Cloud Infrastructure (OCI)](https://skypilot.readthedocs.io/en/v0.4.0/getting-started/installation.html#oracle-cloud-infrastructure-oci)
* [Samsung Cloud Platform (SCP)](https://skypilot.readthedocs.io/en/v0.4.0/getting-started/installation.html#samsung-cloud-platform-scp)

SkyPilot now also supports IBM COS buckets (#1966).

Core and UX Improvements
* **Faster failover**: 30x faster failover with a new quota optimization that checks whether quota is available before launching a cluster (supported on AWS and GCP).
* **Easily get VM IPs**: The new `--ip` flag for `sky status` returns the public IP address of the cluster (e.g., `sky status --ip mycluster`). Use it to access services such as LLM inference endpoints, Jupyter notebooks, and more.
* **Improved scriptability**: SkyPilot YAMLs and CLI are more scriptable than ever: `file_mounts` can be dynamically defined with environment variables ([docs](https://skypilot.readthedocs.io/en/v0.4.0/running-jobs/environment-variables.html#using-in-file-mounts), [example](https://github.com/skypilot-org/skypilot/blob/releases/0.4.0/examples/using_file_mounts_with_env_vars.yaml)), and environment variables can be set through a dotenv file with the new `--env-file` flag (#2296).
* **Core optimizations**: Multi-node clusters stop 4x faster (#2199), `sky status` updates for stopped clusters are 10x faster (#2288), and the job queue is more memory efficient (#1636).
* **Nightly releases**: We now release nightly versions of SkyPilot. To get the cutting edge of SkyPilot without installing from source, run `pip install skypilot-nightly` (#1446).
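The quota optimization amounts to dropping regions whose remaining quota cannot cover the request before any launch is attempted, rather than discovering the shortfall from a failed provisioning call. A minimal sketch, assuming a hypothetical `quotas` mapping of region name to remaining vCPU quota; SkyPilot's real check queries each cloud's quota APIs and is not reproduced here, and `regions_to_try` is an invented helper.

```python
from typing import Dict, List


def regions_to_try(preferred: List[str], quotas: Dict[str, int],
                   vcpus_needed: int) -> List[str]:
    """Filter a region failover list down to regions with enough quota.

    `quotas` maps region name to remaining vCPU quota (hypothetical input).
    Regions missing from `quotas` are kept, because an unknown quota should
    not block an attempt.
    """
    result = []
    for region in preferred:
        remaining = quotas.get(region)
        if remaining is not None and remaining < vcpus_needed:
            continue  # skip: a launch here would fail on quota anyway
        result.append(region)
    return result


preferred = ["us-east-1", "us-west-2", "eu-west-1"]
quotas = {"us-east-1": 0, "us-west-2": 64}
print(regions_to_try(preferred, quotas, vcpus_needed=8))
# ['us-west-2', 'eu-west-1']
```

Keeping regions with unknown quota errs on the side of attempting a launch, so the filter can only remove attempts that were certain to fail.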

Deprecation
* [SkyPilot On-prem](https://skypilot.readthedocs.io/en/v0.3.3/reference/local/index.html) is now deprecated and Kubernetes will be the recommended mode of running SkyPilot on on-prem clusters.


Below is a detailed list of changes.

Managed Spot

New Features
* Spot pipeline support: automatically handles a pipeline of spot jobs (#1982)
* Spot dashboard is now available with `sky spot dashboard`: you can now see all your spot jobs in a GUI (#2103, #2136)
* Spot callback: users can now run custom code when a spot job's status changes (#2106, #2364)
* Resource configuration of the spot controller can now be customized ([docs](https://skypilot.readthedocs.io/en/v0.4.0/reference/config.html), #2040)
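A spot pipeline runs its tasks one after another. The sketch below models only that sequencing in plain Python and is not SkyPilot's spot controller: each task is a hypothetical callable that returns `True` on success, and the sketch assumes the pipeline halts at the first failed stage so later stages never run on missing inputs.

```python
from typing import Callable, List, Tuple

Task = Tuple[str, Callable[[], bool]]  # (name, hypothetical task body)


def run_pipeline(tasks: List[Task]) -> List[str]:
    """Run tasks in order and return the names of those that succeeded.

    Execution stops at the first failing task.
    """
    completed = []
    for name, body in tasks:
        if not body():
            break
        completed.append(name)
    return completed


pipeline = [
    ("preprocess", lambda: True),
    ("train", lambda: True),
    ("eval", lambda: False),   # simulated failure
    ("export", lambda: True),  # never reached
]
print(run_pipeline(pipeline))  # ['preprocess', 'train']
```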

Enhancements
* SkyPilot now shows the spot job's resources and estimated cost before confirmation (#2524)
* Switch to the eager failover recovery policy for better spot job lifetimes (#2234)
* Reduce logging when launching the spot controller (#2056)
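One reading of an "eager" failover policy is that, after a preemption, recovery first tries regions other than the one that just reclaimed the instance, since spot capacity there is plausibly still scarce, and returns to it only as a last resort. The sketch below encodes that assumption; it is not SkyPilot's recovery strategy code, and `recovery_order` is hypothetical.

```python
from typing import List


def recovery_order(regions: List[str], preempted: str) -> List[str]:
    """Order regions for relaunch after a preemption.

    Regions other than the preempted one keep their original relative order
    and are tried first; the preempted region is moved to the end.
    """
    others = [r for r in regions if r != preempted]
    if preempted in regions:
        return others + [preempted]
    return others


print(recovery_order(["us-east-1", "us-west-2", "eu-west-1"], preempted="us-east-1"))
# ['us-west-2', 'eu-west-1', 'us-east-1']
```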

Fixes
* PENDING spot jobs are now shown in the spot queue before they start (#2044)
* Robustness fixes (#2102, #2153, #2119, #2004, #2330, #1998)


CLI & YAML interfaces

New Features
* Users can now use environment variables to dynamically define `file_mounts` ([docs](https://skypilot.readthedocs.io/en/v0.4.0/running-jobs/environment-variables.html#using-in-file-mounts), #2146)
* `sky status` can now show the head IP of the cluster with the `-a` or `--ip` flags (#2305, #2563)
* `sky down/stop/start` now default to the only existing cluster when no cluster is specified, and `sky cancel` without a cluster cancels the latest task (#2325)
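Environment-variable substitution in `file_mounts` and the dotenv file accepted by `--env-file` fit together naturally: read `KEY=VALUE` pairs, then expand `${KEY}` references in the mount paths. This is a simplified sketch, not SkyPilot's parser: it handles only plain `KEY=VALUE` lines and `#` comment lines (no quoting or `export` prefixes), and the `load_env_file`/`expand_mounts` helpers and the bucket name are hypothetical.

```python
import string
from typing import Dict


def load_env_file(text: str) -> Dict[str, str]:
    """Parse dotenv-style text: KEY=VALUE per line, '#' starts a comment line."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env


def expand_mounts(file_mounts: Dict[str, str], env: Dict[str, str]) -> Dict[str, str]:
    """Substitute ${VAR} references in both mount targets and sources.

    Template.substitute raises KeyError for an undefined variable, so a typo
    fails loudly instead of producing a path like '/data/${RUN_ID}'.
    """
    return {
        string.Template(dst).substitute(env): string.Template(src).substitute(env)
        for dst, src in file_mounts.items()
    }


env = load_env_file("""
# values for this run
BUCKET=my-bucket
RUN_ID=exp-42
""")
mounts = {"/data/${RUN_ID}": "s3://${BUCKET}/runs/${RUN_ID}"}
print(expand_mounts(mounts, env))
# {'/data/exp-42': 's3://my-bucket/runs/exp-42'}
```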

Enhancements
* `sky check` output is now friendlier, with more hints for disabled clouds (#2002, #2017, #2196, #2114, #2221, #2377)
* `sky down` progress bar now reflects clusters that failed to terminate (#1595, #2005)
* We now fail early if rsync is not installed locally (#2168)
* Better messages and hints for CLI (#2027, #2028, #2077, #2083, #2085)

Fixes
* Fixed the order of VMs in optimizer table when `--cpus` is provided (#2037)
* Better handling when `sky launch` is interrupted (#2206, #2252)

Backend

New Features
* Users can now open ports for their clusters with the `ports` field ([docs](https://skypilot.readthedocs.io/en/v0.4.0/examples/ports.html), #2210, #2477)
* Docker support in `image_id`: tasks can now be run inside Docker containers ([docs](https://skypilot.readthedocs.io/en/v0.4.0/examples/docker-containers.html#using-docker-containers-as-runtime-environment), #1910)
* Users can now clone a cluster from an existing cluster's disk with the `--clone-disk-from` flag (#2098)
* Users can now launch their own Ray cluster on a SkyPilot cluster (#2020)

Enhancements
* 30x faster failover for AWS and GCP when quotas are not available (#1953, #2187, #2313)
* Faster `sky launch` by caching the cluster IP address (#2400)
* Job queue is now more resource efficient, with significantly lower memory consumption on the remote cluster (#1636)
* Cluster names no longer map directly to cloud cluster names. Instead, they are mapped to a unique cluster name on the cloud. This helps with isolation across users sharing cloud accounts. (#2403)
* More efficient and robust stopping/termination for AWS (#2121)
* `sky status --refresh` for STOPPED clusters is 10x faster (#2079)
* Empty YAML fields are now allowed (#1890)
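Mapping a user-facing cluster name to a unique cloud-side name can be done by appending a short per-user suffix. The sketch assumes the suffix is a truncated hash of a user identifier; the scheme SkyPilot actually uses is not described in these notes, and `cloud_cluster_name` is a hypothetical helper.

```python
import hashlib


def cloud_cluster_name(cluster_name: str, user_id: str, suffix_len: int = 8) -> str:
    """Derive a cloud-side name for a user-facing cluster name.

    The suffix is a truncated SHA-256 hash of the user ID, so it is stable
    across calls but (with overwhelming probability) differs between users.
    """
    suffix = hashlib.sha256(user_id.encode("utf-8")).hexdigest()[:suffix_len]
    return f"{cluster_name}-{suffix}"


# Two users who both name their cluster "dev" get distinct cloud-side names.
alice = cloud_cluster_name("dev", "alice@example.com")
bob = cloud_cluster_name("dev", "bob@example.com")
print(alice, bob)
```

A hash suffix is deterministic, so the same user always finds the same cloud resources again; a real implementation would also have to respect each cloud's naming limits on length and allowed characters.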

Fixes
* Manually started/stopped clusters are now better handled (#2130, #2203, #2389)
* Fix edge case where existing clusters were terminated when resources are not available (#2170)
* Fixes for `disk_tier` UX (#2156, #2215)
* Robustness fixes (#2033, #2061, #2009, #2491, #2290, #1259, #2074, #2023, #2042)


Storage

New Features
* IBM COS is now supported (#1966)
* `sky spot launch` now excludes files listed in `.gitignore` (#2018)
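Excluding files listed in `.gitignore` before upload can be approximated with shell-style pattern matching. This sketch is deliberately simplified and is not SkyPilot's implementation: full gitignore semantics (negation, anchoring, nested ignore files) are not handled, and `filter_ignored` is a hypothetical helper built on the standard library's `fnmatch`.

```python
import fnmatch
from typing import Iterable, List


def filter_ignored(paths: Iterable[str], gitignore_text: str) -> List[str]:
    """Drop paths matching any simple .gitignore pattern.

    Supports only glob patterns and trailing-slash directory patterns; '!'
    negation lines and comments are skipped in this sketch.
    """
    patterns = [
        p.strip() for p in gitignore_text.splitlines()
        if p.strip() and not p.strip().startswith(("#", "!"))
    ]
    kept = []
    for path in paths:
        parts = path.split("/")
        ignored = False
        for pat in patterns:
            if pat.endswith("/"):
                # Directory pattern: ignore anything under a matching directory.
                if any(fnmatch.fnmatch(d, pat.rstrip("/")) for d in parts[:-1]):
                    ignored = True
            elif fnmatch.fnmatch(path, pat) or fnmatch.fnmatch(parts[-1], pat):
                ignored = True
        if not ignored:
            kept.append(path)
    return kept


files = ["train.py", "model.ckpt", "logs/run1.txt", "src/utils.py", "src/cache.pyc"]
print(filter_ignored(files, "*.ckpt\nlogs/\n*.pyc\n"))
# ['train.py', 'src/utils.py']
```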

Enhancements
* Storage deletion is now parallelized for speed (#2058)
* UX improvements for `sky storage` CLI (#2063, #2177)
* GCS bucket mounting now uses gcsfuse v1.0.1 (#2470)
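Parallelizing deletion essentially means issuing many independent delete requests concurrently instead of one at a time. A minimal sketch with the standard library's `ThreadPoolExecutor`, where `delete_object` is a hypothetical stand-in for a cloud storage delete call; SkyPilot's actual implementation may differ.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def delete_object(key: str) -> str:
    """Hypothetical stand-in for a network-bound cloud delete request."""
    # A real implementation would call the storage provider's SDK here.
    return key


def delete_all(keys: List[str], max_workers: int = 8) -> List[str]:
    """Delete keys concurrently; results come back in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(delete_object, keys))


deleted = delete_all([f"bucket/obj-{i}" for i in range(20)])
print(len(deleted))  # 20
```

Threads suit this workload because each request mostly waits on the network. Iterating the results of `pool.map` also re-raises any exception from a worker, so a failed delete is not silently dropped.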

Fixes
* Fix transient failures when uploading to GCS from macOS due to a multiprocessing bug (#2125)
* Robustness fixes (#2049, #2117, #2165, #2259, #2326, #2250)


Dependencies

* Avoid buggy grpcio versions (#2055)
* Pydantic is pinned to `<2.0` (#2157)
* PyYAML is pinned to `>3.13, != 5.4.*` to avoid issues with Cython 3 (#2256, #2514)
* Ray `<= 2.6.3` is supported on local machines (#2401)
* `pycryptodome`, `oauth2client` are no longer required (#2515)
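The PyYAML constraint `>3.13, != 5.4.*` combines a lower bound with the exclusion of an entire release series; the comma means every clause must hold. A stdlib-only sketch of how that specifier evaluates, assuming plain numeric versions (pip and the `packaging` library follow the full PEP 440 rules, including pre- and post-releases, which this ignores). `allowed_pyyaml` is a hypothetical helper.

```python
from typing import Tuple


def parse(version: str) -> Tuple[int, ...]:
    """Turn '5.4.1' into (5, 4, 1), dropping trailing zeros so '3.13.0' == '3.13'."""
    parts = [int(p) for p in version.split(".")]
    while len(parts) > 1 and parts[-1] == 0:
        parts.pop()
    return tuple(parts)


def allowed_pyyaml(version: str) -> bool:
    """Evaluate '>3.13, != 5.4.*' for a plain numeric version string."""
    v = parse(version)
    greater_than_3_13 = v > (3, 13)
    in_5_4_series = v[:2] == (5, 4)
    return greater_than_3_13 and not in_5_4_series


for candidate in ["3.13", "5.3.1", "5.4", "5.4.1", "6.0.1"]:
    print(candidate, allowed_pyyaml(candidate))
```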


Clouds

AWS
* H100 GPUs are now supported (#2323)
* New [docs](https://skypilot.readthedocs.io/en/v0.4.0/cloud-setup/cloud-auth.html) for AWS cloud administrators about advanced login options (SSO and account switching) (#1888)
* Insufficient permission is now handled gracefully (#2415, #2456)
* Fixed a bug where an existing AWS cluster would end up in INIT state after changing identity (#2442)
* Fix fetching AZs when the describe-zones permission is not granted in all regions (#2463)


GCP
* NVIDIA L4 GPUs are now supported (#2212)
* [Machine Images](https://cloud.google.com/compute/docs/machine-images) are now supported (#2280)
* GCP reservations are now supported (#2352)
* SkyPilot optimizer is 4x faster for GCP instances (#2410)
* GCP pricing is now dynamically fetched and is more robust (#2118, #2076, #2131)
* Default image has been updated to Debian 11 (#2279)
* New [docs](https://skypilot.readthedocs.io/en/v0.4.0/cloud-setup/cloud-permissions/gcp.html) for administrators on the minimal permissions a GCP account needs to use SkyPilot (#2100, #2112)
* Robustness fixes (#2135, #2199, #2124, #1879, #2116)
* TPU support is now more robust (#2310, #2471, #2350, #2540)

Azure
* westus3 region is now supported (#2149)
* Fix status refresh for Azure (#2120)
* Fix Azure disk tier interruption for optimize progress (#2111)
* Azure catalog fetching is more robust (#2115, #2553)


Lambda
* Add H100 support for Lambda Cloud (#2010, #2323)
* API rate limit is now handled with backoff and retry (#2265)
* Errors are now more detailed (#2371)
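Handling an API rate limit "with backoff and retry" usually means waiting progressively longer between attempts. A minimal sketch of exponential backoff with jitter; `RateLimitError` and `call_api` are hypothetical, and the delays are illustrative rather than the values SkyPilot uses for Lambda Cloud.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical error raised when the API reports too many requests."""


def with_backoff(call: Callable[[], T], max_attempts: int = 5,
                 base_delay: float = 0.5, max_delay: float = 8.0) -> T:
    """Call `call`, retrying on RateLimitError with exponential backoff.

    The wait roughly doubles after each rate-limited attempt (capped at
    max_delay); after max_attempts failures the last error is re-raised.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    failures = 0
    while True:
        try:
            return call()
        except RateLimitError:
            failures += 1
            if failures >= max_attempts:
                raise  # out of attempts: let the caller see the error
            delay = min(max_delay, base_delay * 2 ** (failures - 1))
            # Random jitter keeps many clients from retrying in lockstep.
            time.sleep(delay * random.uniform(0.5, 1.0))


# Simulated endpoint that is rate-limited on its first two calls.
attempts = {"n": 0}


def call_api() -> str:
    attempts["n"] += 1
    if attempts["n"] <= 2:
        raise RateLimitError()
    return "ok"


print(with_backoff(call_api, base_delay=0.01))  # ok
```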

Oracle Cloud Infrastructure (OCI)
* OCI is now supported (#1909, #2047, #2057, #2068, #2034, #2070, #2069, #2062, #2067, #2092, #2095, #2099)

Samsung Cloud Platform (SCP)
* Samsung Cloud Platform (SCP) is now supported for single-node clusters (#1941, #2001, #2014)

Examples
* New DeepSpeed [example](https://github.com/skypilot-org/skypilot/blob/releases/0.4.0/examples/deepspeed-multinode/sky.yaml) (#2208)
* New Distributed TensorFlow [example](https://github.com/skypilot-org/skypilot/blob/releases/0.4.0/examples/tensorflow_distributed/tf_distributed.yaml) (#1721)
* New DVC [example](https://github.com/skypilot-org/skypilot/blob/releases/0.4.0/examples/dvc/dvc_pipeline.yaml) (#2444)
* Example dependencies are now up to date (#2145, #2223, #2359)

[Full changelog](https://github.com/skypilot-org/skypilot/compare/v0.3.0...v0.4.0)

Thanks to all contributors!
New contributors: JGoo1, tobi, HysunHe, blucz, shethhriday29, MaoZiming, ksasi, pushmatrix, hzeng-0, saihtaungkham, fozziethebeat, n10dollar, asaiacai, mtaku3, gbmarc1, alex000kim, steve-marmalade, xzrderek, sunny0826.

Many thanks to everyone who contributed to this release!

Michaelvll, concretevitamin, romilbhardwaj, cblmemo, HysunHe, landscapepainter, shethhriday29, infwinston, alex000kim, suquark, sunny0826, gbmarc1, MaoZiming, xzrderek, tobi, steve-marmalade, saihtaungkham, pushmatrix, n10dollar, mtaku3, ksasi, hzeng-0, fozziethebeat, blucz, asaiacai, WoosukKwon, JGoo1, mraheja, iojw, hemildesai, ewzeng, aviweit, Saikrishna-Achalla, Cohen-J-Omer

0.3.3

Not secure
This patch release brings many bug fixes and features, including new mechanics for stop/down, callbacks for spot jobs, and a critical dependency fix for PyYAML after the release of Cython 3.

0.3.2

Not secure
This is a patch release to ship bug fixes faster to our users! This release includes many feature updates and bug fixes, including a fix for the Pydantic dependency issue, disk cloning, file mounts, and cloud-specific improvements.

0.3.1

Not secure
This is a patch release to ship **several important enhancements and bug fixes**:

Enhancements
- **On-demand H100 GPUs from Lambda are supported!** `sky launch --gpus h100`
  - To use them, remove any previous Lambda catalog: `rm -rf ~/.sky/catalogs/v5/lambda`
- Managed spot: make job cancellation during failover more robust to mitigate a rare `FAILED_SETUP` error (#1998)

Fixes
- Provisioner / Backend
  - Fix provision failover encountering FileNotFoundError (#2005)
  - Fix user-level Ray cluster causing SkyPilot cluster to be in INIT state (#2020)
- Logging
  - Fix certain logs of multi-node jobs not being streamed due to Ray 2.4 log dedup (#2026)
  - Fix logs being created in the current working directory (`$PWD/~/sky_logs`) in some cases (#2009)
- Managed spot
  - Fix `sky spot launch --retry-until-up` so that it actually retries until the cluster is up (#2004)
- Storage
  - Fix a rare storage cloud check error if `sky check` has never been called (#2017)
- On-prem
  - Fix detection of A5000 and A6000 GPUs (#2023)

**Full Changelog**: https://github.com/skypilot-org/skypilot/compare/v0.3.0...v0.3.1
