## Stable Diffusion
Stable Diffusion is a text-to-image latent diffusion model created by the researchers and engineers from [CompVis](https://github.com/CompVis), [Stability AI](https://stability.ai/), and [LAION](https://laion.ai/). It is trained on 512x512 images from a subset of the [LAION-5B](https://laion.ai/blog/laion-5b/) database and uses a frozen CLIP ViT-L/14 text encoder to condition generation on text prompts. With its 860M-parameter UNet and 123M-parameter text encoder, the model is relatively lightweight and runs on a GPU with at least 10GB of VRAM.
See the [model card](https://huggingface.co/CompVis/stable-diffusion) for more information.
**The Stable Diffusion weights are currently only available to universities, academics, research institutions and independent researchers. Please request access by applying via <a href="https://stability.ai/academia-access-form" target="_blank">this form</a>.**
```python
from torch import autocast
from diffusers import StableDiffusionPipeline

# make sure you're logged in with `huggingface-cli login`
pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-3-diffusers", use_auth_token=True)
pipe = pipe.to("cuda")  # move the pipeline to the GPU

prompt = "a photograph of an astronaut riding a horse"
with autocast("cuda"):
    image = pipe(prompt, guidance_scale=7)["sample"][0]  # image here is in PIL format

image.save("astronaut_rides_horse.png")
```
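Continuing from the snippet above, the parameter counts quoted earlier are easy to check yourself: `pipe.unet`, `pipe.text_encoder`, and `pipe.vae` are regular `torch.nn.Module`s. A minimal sketch (the `count_params` helper is ours, not part of the `diffusers` API):

```python
import torch

# hypothetical helper, not part of the diffusers API
def count_params(module: torch.nn.Module) -> int:
    return sum(p.numel() for p in module.parameters())

print(f"UNet:         {count_params(pipe.unet) / 1e6:.0f}M parameters")
print(f"Text encoder: {count_params(pipe.text_encoder) / 1e6:.0f}M parameters")
print(f"VAE:          {count_params(pipe.vae) / 1e6:.0f}M parameters")
```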
## K-LMS sampling
The new `LMSDiscreteScheduler` is a port of k-lms from [k-diffusion](https://github.com/crowsonkb/k-diffusion/blob/master/k_diffusion/sampling.py) by Katherine Crowson.
The scheduler can be easily swapped into existing pipelines like so:
```python
from diffusers import StableDiffusionPipeline, LMSDiscreteScheduler

model_id = "CompVis/stable-diffusion-v1-3-diffusers"
# Use the K-LMS scheduler here instead
scheduler = LMSDiscreteScheduler(beta_start=0.00085, beta_end=0.012, beta_schedule="scaled_linear", num_train_timesteps=1000)
pipe = StableDiffusionPipeline.from_pretrained(model_id, scheduler=scheduler, use_auth_token=True)
```
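Generation then works exactly as in the first snippet; only the scheduler changes. A minimal sketch, reusing the call signature shown above (`num_inference_steps` sets how many K-LMS sampling steps are taken):

```python
from torch import autocast

pipe = pipe.to("cuda")  # move the pipeline to the GPU

prompt = "a photograph of an astronaut riding a horse"
with autocast("cuda"):
    # 50 steps is a common default; fewer steps trade quality for speed
    image = pipe(prompt, guidance_scale=7.5, num_inference_steps=50)["sample"][0]

image.save("astronaut_rides_horse_klms.png")
```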
## Integration test with the text-to-image script of [Stable-Diffusion](https://github.com/CompVis/stable-diffusion)

#182 and #186 make sure that the DDIM and PNDM/PLMS schedulers yield 1-to-1 the same results as stable diffusion.
Try it out yourself:
In [Stable-Diffusion](https://github.com/CompVis/stable-diffusion):
```
python scripts/txt2img.py --prompt "a photograph of an astronaut riding a horse" --n_samples 4 --n_iter 1 --fixed_code --plms
```
or
```
python scripts/txt2img.py --prompt "a photograph of an astronaut riding a horse" --n_samples 4 --n_iter 1 --fixed_code
```
In `diffusers`:
```py
from diffusers import StableDiffusionPipeline, DDIMScheduler
from time import time
from PIL import Image
from einops import rearrange
import numpy as np
import torch
from torch import autocast
from torchvision.utils import make_grid

torch.manual_seed(42)

prompt = "a photograph of an astronaut riding a horse"
# prompt = "a photograph of the eiffel tower on the moon"
# prompt = "an oil painting of a futuristic forest gives"

# uncomment to use DDIM
# scheduler = DDIMScheduler(beta_start=0.00085, beta_end=0.012, beta_schedule="scaled_linear", clip_sample=False, set_alpha_to_one=False)
# pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-3-diffusers", use_auth_token=True, scheduler=scheduler)  # make sure you're logged in with `huggingface-cli login`

pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-3-diffusers", use_auth_token=True)  # make sure you're logged in with `huggingface-cli login`
pipe = pipe.to("cuda")  # move the pipeline to the GPU

all_images = []
num_rows = 1
num_columns = 4
for _ in range(num_rows):
    with autocast("cuda"):
        images = pipe(num_columns * [prompt], guidance_scale=7.5, output_type="np")["sample"]  # images here are NumPy arrays since `output_type="np"`
    all_images.append(torch.from_numpy(images))

# additionally, save as grid
grid = torch.stack(all_images, 0)
grid = rearrange(grid, 'n b h w c -> (n b) h w c')
grid = rearrange(grid, 'n h w c -> n c h w')
grid = make_grid(grid, nrow=num_rows)

# to image
grid = 255. * rearrange(grid, 'c h w -> h w c').cpu().numpy()
image = Image.fromarray(grid.astype(np.uint8))
image.save(f"./images/diffusers/{'_'.join(prompt.split())}_{round(time())}.png")
```
## Improvements and bugfixes
* Allow passing non-default modules to pipeline by pcuenca in #188
* Add K-LMS scheduler from k-diffusion by anton-l in #185
* [Naming] correct config naming of DDIM pipeline by patrickvonplaten in #187
* [PNDM] Stable diffusion by patrickvonplaten in #186
* [Half precision] Make sure half-precision is correct by patrickvonplaten in #182
* allow custom height, width in StableDiffusionPipeline by patil-suraj in #179
* add tests for stable diffusion pipeline by patil-suraj in #178
* Stable diffusion pipeline by patil-suraj in #168
* [LDM pipeline] fix eta condition. by patil-suraj in #171
* [PNDM in LDM pipeline] use inspect in pipeline instead of unused kwargs by patil-suraj in #167
* allow pndm scheduler to be used with ldm pipeline by patil-suraj in #165
* add scaled_linear schedule in PNDM and DDPM by patil-suraj in #164
* add attention up/down blocks for VAE by patil-suraj in #161
* Add an alternative Karras et al. stochastic scheduler for VE models by anton-l in #160
* [LDMTextToImagePipeline] make text model generic by patil-suraj in #162
* Minor typos by pcuenca in #159
* Fix arg key for `dataset_name` in `create_model_card` by pcuenca in #158
* [VAE] fix the downsample block in Encoder. by patil-suraj in #156
* [UNet2DConditionModel] add cross_attention_dim as an argument by patil-suraj in #155
* Added `diffusers` to conda-forge and updated README for installation instructions by sugatoray in #129
* Add issue templates for feature requests and bug reports by osanseviero in #153
* Support training with a local image folder by anton-l in #152
* Allow DDPM scheduler to use model's predicted variance by eyalmazuz in #132
**Full Changelog**: https://github.com/huggingface/diffusers/compare/0.1.3...v0.2.0