:art: Stable Diffusion 2 is here!
Installation
`pip install diffusers[torch]==0.9 transformers`
Stable Diffusion 2.0 is available in several flavors:
Stable Diffusion 2.0-V at `768x768`
A new Stable Diffusion model (Stable Diffusion 2.0-v) at 768x768 resolution. It has the same number of U-Net parameters as `1.5` but uses [OpenCLIP-ViT/H](https://github.com/mlfoundations/open_clip) as the text encoder and is trained from scratch. SD 2.0-v is a so-called [v-prediction](https://arxiv.org/abs/2202.00512) model.
![image](https://user-images.githubusercontent.com/26864830/204018236-259ace29-c007-4002-ad19-98fd35464954.png)
```python
import torch
from diffusers import DiffusionPipeline, DPMSolverMultistepScheduler

repo_id = "stabilityai/stable-diffusion-2"
# Load the fp16 weights to reduce GPU memory usage
pipe = DiffusionPipeline.from_pretrained(repo_id, torch_dtype=torch.float16, revision="fp16")
# Build a DPM-Solver multistep scheduler from the checkpoint's scheduler config
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)
pipe = pipe.to("cuda")

prompt = "High quality photo of an astronaut riding a horse in space"
# Without explicit height/width, SD 2.0-v generates at its native 768x768
image = pipe(prompt, guidance_scale=9, num_inference_steps=25).images[0]
image.save("astronaut.png")
```
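For reference, the v-prediction objective from the cited paper can be written out as below. The notation is ours and assumes a variance-preserving forward process, where $x$ is the clean latent, $\epsilon$ the Gaussian noise and $z_t$ the noisy latent at step $t$:

```math
\begin{aligned}
z_t &= \alpha_t x + \sigma_t \epsilon, \qquad \alpha_t^2 + \sigma_t^2 = 1,\\
v_t &= \alpha_t \epsilon - \sigma_t x,\\
\hat{x} &= \alpha_t z_t - \sigma_t v_t,\\
\hat{\epsilon} &= \sigma_t z_t + \alpha_t v_t.
\end{aligned}
```

Substituting $z_t$ and $v_t$ into the third line gives $(\alpha_t^2 + \sigma_t^2)\,x = x$, so a model that predicts $v_t$ still yields both a clean-latent estimate and a noise estimate at every sampling step.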
Stable Diffusion 2.0-base at `512x512`
The above model is finetuned from SD 2.0-base, which was trained as a standard noise-prediction model on 512x512 images and is also made available.
![image](https://user-images.githubusercontent.com/26864830/204019534-3e4febce-55f8-4e27-9cc0-d8058ed00486.png)
```python
import torch
from diffusers import DiffusionPipeline, DPMSolverMultistepScheduler

repo_id = "stabilityai/stable-diffusion-2-base"
# Load the fp16 weights to reduce GPU memory usage
pipe = DiffusionPipeline.from_pretrained(repo_id, torch_dtype=torch.float16, revision="fp16")
# Build a DPM-Solver multistep scheduler from the checkpoint's scheduler config
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)
pipe = pipe.to("cuda")

prompt = "High quality photo of an astronaut riding a horse in space"
# Without explicit height/width, the base model generates at its native 512x512
image = pipe(prompt, num_inference_steps=25).images[0]
image.save("astronaut.png")
```
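The practical difference between SD 2.0-base (noise prediction) and SD 2.0-v (v-prediction) is what the U-Net output means. The NumPy sketch below shows how a sampler could turn either kind of output into an estimate of the clean latent. It mirrors the `"epsilon"` and `"v_prediction"` values that diffusers schedulers accept as `prediction_type`, but the function `predict_original_sample` is a hypothetical helper written for illustration, not a diffusers API.

```python
import numpy as np


def predict_original_sample(model_output, sample, alpha_bar_t, prediction_type="epsilon"):
    """Estimate the clean latent x0 from a model output at one timestep.

    alpha_bar_t is the cumulative product of (1 - beta) up to timestep t, so
    sample = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * noise.
    """
    alpha_t = np.sqrt(alpha_bar_t)
    sigma_t = np.sqrt(1.0 - alpha_bar_t)
    if prediction_type == "epsilon":
        # Noise-prediction models (such as SD 2.0-base) output the added noise
        return (sample - sigma_t * model_output) / alpha_t
    if prediction_type == "v_prediction":
        # v-prediction models (such as SD 2.0-v) output v = alpha_t * noise - sigma_t * x0
        return alpha_t * sample - sigma_t * model_output
    raise ValueError(f"unsupported prediction_type: {prediction_type}")


# Build a synthetic noisy latent so both branches can be checked against the true x0
rng = np.random.default_rng(0)
x0 = rng.standard_normal((4, 8, 8))
noise = rng.standard_normal((4, 8, 8))
alpha_bar_t = 0.3
sample = np.sqrt(alpha_bar_t) * x0 + np.sqrt(1 - alpha_bar_t) * noise
v = np.sqrt(alpha_bar_t) * noise - np.sqrt(1 - alpha_bar_t) * x0

x0_from_eps = predict_original_sample(noise, sample, alpha_bar_t, "epsilon")
x0_from_v = predict_original_sample(v, sample, alpha_bar_t, "v_prediction")
```

Because schedulers store `prediction_type` in their config, rebuilding the scheduler with `from_config(pipe.scheduler.config)` should carry the right setting over for each checkpoint.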
Stable Diffusion 2.0 for Inpainting
This model for text-guided inpainting is finetuned from SD 2.0-base. It follows the mask-generation strategy presented in [LAMA](https://github.com/saic-mdal/lama); the generated masks, together with the latent VAE representations of the masked image, are used as additional conditioning.
![image](https://user-images.githubusercontent.com/26864830/204019798-e03b7905-73d5-4eda-abd4-c31f46bd0c49.png)
```python
from io import BytesIO

import requests
import torch
from PIL import Image
from diffusers import DiffusionPipeline, DPMSolverMultistepScheduler


def download_image(url):
    response = requests.get(url)
    return Image.open(BytesIO(response.content)).convert("RGB")


img_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo.png"
mask_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo_mask.png"

# White pixels in the mask mark the region to repaint
init_image = download_image(img_url).resize((512, 512))
mask_image = download_image(mask_url).resize((512, 512))

repo_id = "stabilityai/stable-diffusion-2-inpainting"
pipe = DiffusionPipeline.from_pretrained(repo_id, torch_dtype=torch.float16, revision="fp16")
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)
pipe = pipe.to("cuda")

prompt = "Face of a yellow cat, high resolution, sitting on a park bench"
image = pipe(prompt=prompt, image=init_image, mask_image=mask_image, num_inference_steps=25).images[0]
image.save("yellow_cat.png")
```
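To make the extra conditioning concrete: at each denoising step, the inpainting U-Net sees the noisy latents concatenated with a downsampled mask and the VAE latents of the masked image. The NumPy sketch below assumes the layout used by diffusers' Stable Diffusion inpainting pipeline (4 latent channels, then 1 mask channel, then 4 masked-image latent channels, with a VAE downsampling factor of 8). The VAE is replaced by random stand-in latents, and both helper functions are hypothetical names used only for illustration.

```python
import numpy as np


def prepare_mask_and_masked_image(image, mask):
    """Binarize the mask and blank out the region to be repainted.

    image: float array (H, W, 3) in [0, 1]; mask: float array (H, W) where
    values >= 0.5 (white) mark pixels to repaint.
    """
    binary_mask = (mask >= 0.5).astype(image.dtype)
    # Keep only pixels outside the mask; the area to repaint is zeroed
    masked_image = image * (1.0 - binary_mask)[..., None]
    return binary_mask, masked_image


def build_unet_input(noisy_latents, binary_mask, masked_image_latents, vae_scale_factor=8):
    """Stack latents, downsampled mask and masked-image latents along channels.

    noisy_latents and masked_image_latents: (4, H/8, W/8); returns (9, H/8, W/8).
    """
    # Nearest-neighbour downsampling of the mask to the latent resolution
    small_mask = binary_mask[::vae_scale_factor, ::vae_scale_factor][None, ...]
    return np.concatenate([noisy_latents, small_mask, masked_image_latents], axis=0)


rng = np.random.default_rng(0)
image = rng.random((512, 512, 3))
mask = np.zeros((512, 512))
mask[128:384, 128:384] = 1.0  # white square = area to repaint

binary_mask, masked_image = prepare_mask_and_masked_image(image, mask)
# Stand-ins for VAE outputs; a real pipeline would encode masked_image with the VAE
noisy_latents = rng.standard_normal((4, 64, 64))
masked_image_latents = rng.standard_normal((4, 64, 64))
unet_input = build_unet_input(noisy_latents, binary_mask, masked_image_latents)
```

The LAMA-style masks mentioned above are a training-time choice; at inference time the mask is simply whatever the user supplies, assembled into the same input layout.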
Stable Diffusion X4 Upscaler
The model was trained on crops of size 512x512 and is a text-guided [latent upscaling diffusion model](https://arxiv.org/abs/2112.10752). In addition to the textual input, it receives a `noise_level` as an input parameter, which can be used to add noise to the low-resolution input according to a [predefined diffusion schedule](https://huggingface.co/stabilityai/stable-diffusion-x4-upscaler/blob/main/low_res_scheduler/scheduler_config.json).
![image](https://user-images.githubusercontent.com/26864830/204020264-86807d85-3097-4755-ace6-cc1e6f24633d.png)
```python
from io import BytesIO

import requests
import torch
from PIL import Image
from diffusers import StableDiffusionUpscalePipeline

model_id = "stabilityai/stable-diffusion-x4-upscaler"
pipeline = StableDiffusionUpscalePipeline.from_pretrained(model_id, revision="fp16", torch_dtype=torch.float16)
pipeline = pipeline.to("cuda")

# Download a low-resolution image and shrink it to 128x128
url = "https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/sd2-upscale/low_res_cat.png"
response = requests.get(url)
low_res_img = Image.open(BytesIO(response.content)).convert("RGB")
low_res_img = low_res_img.resize((128, 128))

# The x4 upscaler turns the 128x128 input into a 512x512 image
prompt = "a white cat"
upscaled_image = pipeline(prompt=prompt, image=low_res_img).images[0]
upscaled_image.save("upsampled_cat.png")
```
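The `noise_level` idea can be sketched as a single closed-form forward-diffusion step applied to the low-resolution image before it conditions the upscaler. This sketch assumes DDPM-style noising; the linear beta schedule, its 350 steps, and the helper name `noise_low_res_image` are placeholders chosen for illustration, not the values in the linked `scheduler_config.json`.

```python
import numpy as np


def noise_low_res_image(low_res, noise_level, betas, rng):
    """Add forward-diffusion noise to a low-resolution image at step `noise_level`.

    low_res: float array in [-1, 1]; betas: 1-D array defining the noise schedule.
    """
    alphas_cumprod = np.cumprod(1.0 - betas)
    alpha_bar = alphas_cumprod[noise_level]
    noise = rng.standard_normal(low_res.shape)
    # Closed form of the DDPM forward process q(x_t | x_0)
    return np.sqrt(alpha_bar) * low_res + np.sqrt(1.0 - alpha_bar) * noise


# Placeholder linear schedule (assumption, not the upscaler's real config)
betas = np.linspace(1e-4, 0.02, 350)
low_res = np.random.default_rng(0).uniform(-1.0, 1.0, size=(3, 128, 128))

# Same noise seed for both calls so only the noise level differs
slightly_noisy = noise_low_res_image(low_res, 0, betas, np.random.default_rng(1))
very_noisy = noise_low_res_image(low_res, 300, betas, np.random.default_rng(1))
```

A larger `noise_level` therefore hands the model a more degraded low-resolution input; since the model also receives that level as an input, it knows how much of the input detail to trust.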
Saving & Loading is fixed for Versatile Diffusion
Previously there was a :bug: when saving & loading Versatile Diffusion; this is now fixed, so memory-efficient saving & loading works as expected.
* [Versatile Diffusion] Fix remaining tests by patrickvonplaten in 1418
:memo: Changelog
* add v prediction by patil-suraj in 1386
* Adapt UNet2D for super-resolution by patil-suraj in 1385
* Version 0.9.0.dev0 by anton-l in 1394
* Make height and width optional by patrickvonplaten in 1401
* [Config] Add optional arguments by patrickvonplaten in 1395
* Upscaling fixed by patrickvonplaten in 1402
* Add the new SD2 attention params to the VD text unet by anton-l in 1400
* Deprecate sample size by patrickvonplaten in 1406
* Support SD2 attention slicing by anton-l in 1397
* Add SD2 inpainting integration tests by anton-l in 1412
* Fix sample size conversion script by patrickvonplaten in 1408
* fix clip guided by patrickvonplaten in 1414
* Fix all stable diffusion by patrickvonplaten in 1415
* [MPS] call contiguous after permute by kashif in 1411
* Deprecate `predict_epsilon` by pcuenca in 1393
* Fix ONNX conversion and inference by anton-l in 1416
* Allow to set config params directly in init by patrickvonplaten in 1419
* Add tests for Stable Diffusion 2 V-prediction 768x768 by anton-l in 1420
* StableDiffusionUpscalePipeline by patil-suraj in 1396
* added initial v-pred support to DPM-solver by kashif in 1421
* SD2 docs by patrickvonplaten in 1424
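Two of the changelog entries change defaults that are easy to reason about. The sketch below assumes that, with height and width now optional (1401), the pipeline falls back to the U-Net's latent `sample_size` multiplied by the VAE downsampling factor of 8, and that the deprecated boolean `predict_epsilon` (1393) maps onto the string-valued `prediction_type`. Both helper functions are hypothetical and only illustrate the mapping.

```python
def default_resolution(sample_size, vae_scale_factor=8):
    """Default (height, width) in pixels when neither is passed to the pipeline.

    sample_size is the U-Net's latent resolution, e.g. unet.config.sample_size.
    """
    side = sample_size * vae_scale_factor
    return side, side


def prediction_type_from_predict_epsilon(predict_epsilon):
    """Map the deprecated boolean flag to the string-valued prediction_type."""
    # True meant "the model predicts noise"; False meant "it predicts the clean sample"
    return "epsilon" if predict_epsilon else "sample"


# A latent sample_size of 96 gives 768x768; 64 gives 512x512
v_model_size = default_resolution(96)
base_model_size = default_resolution(64)
```

With a latent `sample_size` of 96 this yields the 768x768 resolution of SD 2.0-v, and 64 yields the 512x512 resolution of SD 2.0-base.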