Diffusers

Latest version: v0.32.2


0.7.0

Apple Silicon support with PyTorch 1.13

PyTorch and Apple have been working on improving `mps` support in PyTorch 1.13, so Apple Silicon is now a first-class citizen in diffusers 0.7.0!

Requirements

- Mac computer with Apple silicon (M1/M2) hardware.
- macOS 12.6 or later (13.0 or later recommended, as support is even better).
- arm64 version of Python.
- PyTorch 1.13.0 official release, installed from pip or the conda channels.

Memory efficient generation

Memory management is crucial for fast generation. We recommend always using _attention slicing_ on Apple Silicon, as it drastically reduces memory pressure and prevents paging or swapping. This is especially important for computers with less than 64 GB of unified memory, and may be the difference between generating an image in seconds rather than minutes. Use it like this:

```python
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
pipe = pipe.to("mps")

# Recommended if your computer has < 64 GB of RAM
pipe.enable_attention_slicing()

prompt = "a photo of an astronaut riding a horse on mars"

# First-time "warmup" pass
_ = pipe(prompt, num_inference_steps=1)

image = pipe(prompt).images[0]
image.save("astronaut.png")
```


Continuous Integration

Our automated tests now include a full battery of tests on the `mps` device. This will help identify issues early and ensure quality on Apple Silicon going forward.

See more details in [the documentation](https://huggingface.co/docs/diffusers/optimization/mps).

💃 Dance Diffusion
diffusers goes audio 🎵: Dance Diffusion by [Harmonai](https://www.harmonai.org) is the first audio model in 🧨 Diffusers!

* [Dance Diffusion] Add dance diffusion by patrickvonplaten in 803

Try it out to generate some random music:

```python
import scipy.io.wavfile

from diffusers import DiffusionPipeline

model_id = "harmonai/maestro-150k"
pipeline = DiffusionPipeline.from_pretrained(model_id)
pipeline = pipeline.to("cuda")

audio = pipeline(audio_length_in_s=4.0).audios[0]

# To save locally
scipy.io.wavfile.write("maestro_test.wav", pipeline.unet.sample_rate, audio.transpose())
```


🎉 Euler schedulers

These are the Euler schedulers, from the paper [Elucidating the Design Space of Diffusion-Based Generative Models](https://arxiv.org/abs/2206.00364) by Karras et al. (2022). The diffusers implementation is based on the original [k-diffusion](https://github.com/crowsonkb/k-diffusion/blob/481677d114f6ea445aa009cf5bd7a9cdee909e47/k_diffusion/sampling.py#L51) implementation by Katherine Crowson. The Euler schedulers are fast, often generating very good outputs with 20-30 steps.

* k-diffusion-euler by hlky in 1019


Using the Euler scheduler:

```python
import torch
from diffusers import StableDiffusionPipeline, EulerDiscreteScheduler

euler_scheduler = EulerDiscreteScheduler.from_config("runwayml/stable-diffusion-v1-5", subfolder="scheduler")
pipeline = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", scheduler=euler_scheduler, revision="fp16", torch_dtype=torch.float16
)
pipeline.to("cuda")

prompt = "a photo of an astronaut riding a horse on mars"
image = pipeline(prompt, num_inference_steps=25).images[0]
```


Using the Euler Ancestral scheduler:

```python
import torch
from diffusers import StableDiffusionPipeline, EulerAncestralDiscreteScheduler

euler_ancestral_scheduler = EulerAncestralDiscreteScheduler.from_config("runwayml/stable-diffusion-v1-5", subfolder="scheduler")
pipeline = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", scheduler=euler_ancestral_scheduler, revision="fp16", torch_dtype=torch.float16
)
pipeline.to("cuda")

prompt = "a photo of an astronaut riding a horse on mars"
image = pipeline(prompt, num_inference_steps=25).images[0]
```


🔥 Up to 2x faster inference with `memory_efficient_attention`

Faster and more memory-efficient Stable Diffusion, using the efficient flash attention implementation from [xformers](https://github.com/facebookresearch/xformers).

* Up to 2x speedup on GPUs using memory efficient attention by MatthieuTPHR in 532

To use it, make sure you have:
- PyTorch > 1.12
- CUDA available
- The [xformers](https://github.com/facebookresearch/xformers) library installed

```python
from diffusers import StableDiffusionPipeline
import torch

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    revision="fp16",
    torch_dtype=torch.float16,
).to("cuda")

pipe.enable_xformers_memory_efficient_attention()

with torch.inference_mode():
    sample = pipe("a small cat")

# optional: you can disable it via
pipe.disable_xformers_memory_efficient_attention()
```


🚀 Much faster loading

Thanks to [`accelerate`](https://github.com/huggingface/accelerate), pipeline loading is much, much faster. There are two parts to it:

- First, PyTorch initializes a model's weights by default when the model is created, and this takes a good amount of time. With `low_cpu_mem_usage` (enabled by default), no initialization is performed.
- Optionally, you can also use `device_map="auto"` to automatically select the best device(s) to which the pre-trained weights will initially be sent.

In our tests, loading time was more than halved on CUDA devices, and went down from 12s to 4s on an Apple M1 computer.

As a side effect, CPU usage will be greatly reduced during loading, because no temporary copies of the weights are necessary.

This feature requires PyTorch 1.9 or later and accelerate 0.8.0 or later.
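For illustration, here is a minimal sketch of how these options combine (the model id is just an example):

```python
from diffusers import StableDiffusionPipeline

# low_cpu_mem_usage is enabled by default; shown explicitly for clarity.
# device_map="auto" additionally places the weights on the best available device(s).
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    low_cpu_mem_usage=True,
    device_map="auto",
)
```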

🎨 RePaint

[RePaint](https://arxiv.org/abs/2201.09865) makes it possible to reuse any pretrained DDPM model for free-form inpainting by adding restarts to the denoising schedule. It is based on the paper [RePaint: Inpainting using Denoising Diffusion Probabilistic Models](https://arxiv.org/abs/2201.09865) by Andreas Lugmayr et al.

```python
import torch
from diffusers import RePaintPipeline, RePaintScheduler

# Load the RePaint scheduler and pipeline based on a pretrained DDPM model
scheduler = RePaintScheduler.from_config("google/ddpm-ema-celebahq-256")
pipe = RePaintPipeline.from_pretrained("google/ddpm-ema-celebahq-256", scheduler=scheduler)
pipe = pipe.to("cuda")

# original_image and mask_image are PIL images prepared beforehand
generator = torch.Generator(device="cuda").manual_seed(0)
output = pipe(
    original_image=original_image,
    mask_image=mask_image,
    num_inference_steps=250,
    eta=0.0,
    jump_length=10,
    jump_n_sample=10,
    generator=generator,
)
inpainted_image = output.images[0]
```


![image](https://user-images.githubusercontent.com/11280511/150766125-adf5a3cb-17f2-432c-a8f6-ce0b97122819.gif)

:earth_africa: Community Pipelines

Long Prompt Weighting Stable Diffusion
This pipeline lets you input prompts without the 77-token length limit. You can increase a word's weight with "()" and decrease it with "[]". It also exposes the main Stable Diffusion use cases in a single class.
For a code example, see [Long Prompt Weighting Stable Diffusion](https://github.com/huggingface/diffusers/tree/main/examples/community#long-prompt-weighting-stable-diffusion)
* [Community Pipelines] Long Prompt Weighting Stable Diffusion Pipelines by SkyTNT in 907
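A rough sketch of loading and using the community pipeline (the `text2img` method and `max_embeddings_multiples` argument follow the community README; treat them as illustrative):

```python
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    custom_pipeline="lpw_stable_diffusion",
    torch_dtype=torch.float16,
).to("cuda")

# "(word)" upweights, "[word]" downweights; prompts longer than 77 tokens are allowed
image = pipe.text2img(
    "a (majestic) castle on a hill, [blurry], intricate details, golden hour",
    max_embeddings_multiples=3,
).images[0]
```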

Speech to Image
Generate an image from an audio sample using pre-trained OpenAI whisper-small and Stable Diffusion.
For a code example, see [Speech to Image](https://github.com/huggingface/diffusers/tree/main/examples/community#speech-to-image)
* [Examples] add speech to image pipeline example by MikailINTech in 897

Wildcard Stable Diffusion
A minimal implementation that lets users add "wildcards", denoted by `__wildcard__`, to prompts. Wildcards act as placeholders for values randomly sampled from either a dictionary or a `.txt` file.
For a code example, see [Wildcard Stable Diffusion](https://github.com/huggingface/diffusers/tree/main/examples/community#wildcard-stable-diffusion)
* Wildcard stable diffusion pipeline by shyamsn97 in 900
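A minimal sketch following the community README (the parameter names `wildcard_option_dict`, `wildcard_files`, and `num_prompt_samples` are taken from it and should be treated as illustrative):

```python
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    custom_pipeline="wildcard_stable_diffusion",
    torch_dtype=torch.float16,
).to("cuda")

out = pipe(
    "__animal__ sitting on a park bench",
    wildcard_option_dict={"animal": ["cat", "dog", "fox"]},
    wildcard_files=[],  # optionally, .txt files with one option per line
    num_prompt_samples=1,
)
out.images[0].save("wildcard.png")
```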

Composable Stable Diffusion
Use logic operators to do compositional generation.
For a code example, see [Composable Stable Diffusion](https://github.com/huggingface/diffusers/tree/main/examples/community#composable-stable-diffusion)
* Add Composable diffusion to community pipeline examples by MarkRich in 951

Imagic Stable Diffusion
Image editing with Stable Diffusion.
For a code example, see [Imagic Stable Diffusion](https://github.com/huggingface/diffusers/tree/main/examples/community#imagic-stable-diffusion)
* Add imagic to community pipelines by MarkRich in 958

Seed Resizing
Allows you to generate a larger image while keeping the content of the original image.
For a code example, see [Seed Resizing](https://github.com/huggingface/diffusers/tree/main/examples/community#seed-resizing)
* Add seed resizing to community pipelines by MarkRich in 1011


:memo: Changelog
* [Community Pipelines] Long Prompt Weighting Stable Diffusion Pipelines by SkyTNT in 907
* [Stable Diffusion] Add components function by patrickvonplaten in 889
* [PNDM Scheduler] Make sure list cannot grow forever by patrickvonplaten in 882
* [DiffusionPipeline.from_pretrained] add warning when passing unused k… by patrickvonplaten in 870
* DOC Dreambooth Add --sample_batch_size=1 to the 8 GB dreambooth example script by leszekhanusz in 829
* [Examples] add speech to image pipeline example by MikailINTech in 897
* [dreambooth] dont use safety check when generating prior images by patil-suraj in 922
* Dreambooth class image generation: using unique names to avoid overwriting existing image by leszekhanusz in 847
* fix test_components by patil-suraj in 928
* Fix Compatibility with Nvidia NGC Containers by tasercake in 919
* [Community Pipelines] Fix pad_tokens_and_weights in lpw_stable_diffusion by SkyTNT in 925
* Bump the version to 0.7.0.dev0 by anton-l in 912
* Introduce the copy mechanism by anton-l in 924
* [Tests] Move stable diffusion into their own files by patrickvonplaten in 936
* [Flax] dont warn for bf16 weights by patil-suraj in 923
* Support LMSDiscreteScheduler in LDMPipeline by mkshing in 891
* Wildcard stable diffusion pipeline by shyamsn97 in 900
* [MPS] fix mps failing tests by kashif in 934
* fix a small typo in pipeline_ddpm.py by chenguolin in 948
* Reorganize pipeline tests by anton-l in 963
* v1-5 docs updates by apolinario in 921
* add community pipeline docs; add minimal text to some empty doc pages by natolambert in 930
* Fix typo: `torch_type` -> `torch_dtype` by pcuenca in 972
* add num_inference_steps arg to DDPM by tmabraham in 935
* Add Composable diffusion to community pipeline examples by MarkRich in 951
* [Flax] added broadcast_to_shape_from_left helper and Scheduler tests by kashif in 864
* [Tests] Fix `mps` reproducibility issue when running with pytest-xdist by anton-l in 976
* mps changes for PyTorch 1.13 by pcuenca in 926
* [Onnx] support half-precision and fix bugs for onnx pipelines by SkyTNT in 932
* [Dance Diffusion] Add dance diffusion by patrickvonplaten in 803
* [Dance Diffusion] FP16 by patrickvonplaten in 980
* [Dance Diffusion] Better naming by patrickvonplaten in 981
* Fix typo in documentation title by echarlaix in 975
* Add --pretrained_model_name_revision option to train_dreambooth.py by shirayu in 933
* Do not use torch.float64 on the mps device by pcuenca in 942
* CompVis -> diffusers script - allow converting from merged checkpoint to either EMA or non-EMA by patrickvonplaten in 991
* fix a bug in the new version by xiaohu2015 in 957
* Fix typos by shirayu in 978
* Add missing import by juliensimon in 979
* minimal stable diffusion GPU memory usage with accelerate hooks by piEsposito in 850
* [inpaint pipeline] fix bug for multiple prompts inputs by xiaohu2015 in 959
* Enable multi-process DataLoader for dreambooth by skirsten in 950
* Small modification to enable usage by external scripts by briancw in 956
* [Flax] Add Textual Inversion by duongna21 in 880
* Continuation of 942: additional float64 failure by pcuenca in 996
* fix dreambooth script. by patil-suraj in 1017
* [Accelerate model loading] Fix meta device and super low memory usage by patrickvonplaten in 1016
* [Flax] Add finetune Stable Diffusion by duongna21 in 999
* [DreamBooth] Set train mode for text encoder by duongna21 in 1012
* [Flax] Add DreamBooth by duongna21 in 1001
* Deprecate `init_git_repo`, refactor `train_unconditional.py` by anton-l in 1022
* update readme for flax examples by patil-suraj in 1026
* Probably nicer to specify dependency on tensorboard in the training example by lukovnikov in 998
* Add `--dataloader_num_workers` to the DDPM training example by anton-l in 1027
* Document sequential CPU offload method on Stable Diffusion pipeline by piEsposito in 1024
* Support grayscale images in `numpy_to_pil` by anton-l in 1025
* [Flax SD finetune] Fix dtype by duongna21 in 1038
* fix `F.interpolate()` for large batch sizes by NouamaneTazi in 1006
* [Tests] Improve unet / vae tests by patrickvonplaten in 1018
* [Tests] Speed up slow tests by patrickvonplaten in 1040
* Fix some failing tests by patrickvonplaten in 1041
* [Tests] Better prints by patrickvonplaten in 1043
* [Tests] no random latents anymore by patrickvonplaten in 1045
* Update training and fine-tuning docs by pcuenca in 1020
* Fix speedup ratio in fp16.mdx by mwbyeon in 837
* clean incomplete pages by natolambert in 1008
* Add seed resizing to community pipelines by MarkRich in 1011
* Tests: upgrade PyTorch cuda to 11.7 to fix examples tests. by pcuenca in 1048
* Experimental: allow fp16 in `mps` by pcuenca in 961
* Move safety detection to model call in Flax safety checker by jonatanklosko in 1023
* Fix pipelines user_agent, ignore CI requests by anton-l in 1058
* [GitBot] Automatically close issues after inactivity by patrickvonplaten in 1079
* Allow `safety_checker` to be `None` when using CPU offload by pcuenca in 1078
* k-diffusion-euler by hlky in 1019
* [Better scheduler docs] Improve usage examples of schedulers by patrickvonplaten in 890
* [Tests] Fix slow tests by patrickvonplaten in 1087
* Remove nn sequential by patrickvonplaten in 1086
* Remove some unused parameter in CrossAttnUpBlock2D by LaurentMazare in 1034
* Add imagic to community pipelines by MarkRich in 958
* Up to 2x speedup on GPUs using memory efficient attention by MatthieuTPHR in 532
* [docs] add euler scheduler in docs, how to use different schedulers by patil-suraj in 1089
* Integration tests precision improvement for inpainting by Lewington-pitsos in 1052
* lpw_stable_diffusion: Add is_cancelled_callback by irgolic in 1053
* Rename latent by patrickvonplaten in 1102
* fix typo in examples dreambooth README.md by jorahn in 1073
* fix model card url in text inversion readme. by patil-suraj in 1103
* [CI] Framework and hardware-specific CI tests by anton-l in 997
* Fix a small typo of a variable name by omihub777 in 1063
* Fix tests for equivalence of DDIM and DDPM pipelines by sgrigory in 1069
* Fix padding in dreambooth by shirayu in 1030
* [Flax] time embedding by kashif in 1081
* Training to predict x0 in training example by lukovnikov in 1031
* [Loading] Ignore unneeded files by patrickvonplaten in 1107
* Fix hub-dependent tests for PRs by anton-l in 1119
* Allow saving `None` pipeline components by anton-l in 1118
* feat: add repaint by Revist in 974
* Continuation of 1035 by pcuenca in 1120
* VQ-diffusion by williamberman in 658

0.16.0

ControlNet v1.1

[Lvmin Zhang](https://github.com/lllyasviel) has released improved ControlNet checkpoints as well as a couple of new ones.

You can find all :firecracker: Diffusers checkpoints [here](https://huggingface.co/models?other=controlnet-v1-1)
Please have a look directly at the model cards to learn how to use the checkpoints:
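As a quick orientation, here is a minimal usage sketch following the pattern from the model cards (the checkpoint and control image are the canny example from the table below; the control image must be prepared with the matching detector):

```python
import torch
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel
from diffusers.utils import load_image

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/control_v11p_sd15_canny", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

# a monochrome canny edge map, as described in the table below
control_image = load_image(
    "https://huggingface.co/lllyasviel/control_v11p_sd15_canny/resolve/main/images/control.png"
)
image = pipe("a photo of a house", image=control_image, num_inference_steps=20).images[0]
```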

Improved checkpoints:

| Model Name | Control Image Overview| Control Image Example | Generated Image Example |
|---|---|---|---|
|[lllyasviel/control_v11p_sd15_canny](https://huggingface.co/lllyasviel/control_v11p_sd15_canny)<br/> *Trained with canny edge detection* | A monochrome image with white edges on a black background.|<a href="https://huggingface.co/lllyasviel/control_v11p_sd15_canny/resolve/main/images/control.png"><img width="64" style="margin:0;padding:0;" src="https://huggingface.co/lllyasviel/control_v11p_sd15_canny/resolve/main/images/control.png"/></a>|<a href="https://huggingface.co/lllyasviel/control_v11p_sd15_canny/resolve/main/images/image_out.png"><img width="64" src="https://huggingface.co/lllyasviel/control_v11p_sd15_canny/resolve/main/images/image_out.png"/></a>|
|[lllyasviel/control_v11p_sd15_mlsd](https://huggingface.co/lllyasviel/control_v11p_sd15_mlsd)<br/> Trained with multi-level line segment detection | An image with annotated line segments.|<a href="https://huggingface.co/lllyasviel/control_v11p_sd15_mlsd/resolve/main/images/control.png"><img width="64" style="margin:0;padding:0;" src="https://huggingface.co/lllyasviel/control_v11p_sd15_mlsd/resolve/main/images/control.png"/></a>|<a href="https://huggingface.co/lllyasviel/control_v11p_sd15_mlsd/resolve/main/images/image_out.png"><img width="64" src="https://huggingface.co/lllyasviel/control_v11p_sd15_mlsd/resolve/main/images/image_out.png"/></a>|
|[lllyasviel/control_v11f1p_sd15_depth](https://huggingface.co/lllyasviel/control_v11f1p_sd15_depth)<br/> Trained with depth estimation | An image with depth information, usually represented as a grayscale image.|<a href="https://huggingface.co/lllyasviel/control_v11f1p_sd15_depth/resolve/main/images/control.png"><img width="64" style="margin:0;padding:0;" src="https://huggingface.co/lllyasviel/control_v11f1p_sd15_depth/resolve/main/images/control.png"/></a>|<a href="https://huggingface.co/lllyasviel/control_v11f1p_sd15_depth/resolve/main/images/image_out.png"><img width="64" src="https://huggingface.co/lllyasviel/control_v11f1p_sd15_depth/resolve/main/images/image_out.png"/></a>|
|[lllyasviel/control_v11p_sd15_normalbae](https://huggingface.co/lllyasviel/control_v11p_sd15_normalbae)<br/> Trained with surface normal estimation | An image with surface normal information, usually represented as a color-coded image.|<a href="https://huggingface.co/lllyasviel/control_v11p_sd15_normalbae/resolve/main/images/control.png"><img width="64" style="margin:0;padding:0;" src="https://huggingface.co/lllyasviel/control_v11p_sd15_normalbae/resolve/main/images/control.png"/></a>|<a href="https://huggingface.co/lllyasviel/control_v11p_sd15_normalbae/resolve/main/images/image_out.png"><img width="64" src="https://huggingface.co/lllyasviel/control_v11p_sd15_normalbae/resolve/main/images/image_out.png"/></a>|
|[lllyasviel/control_v11p_sd15_seg](https://huggingface.co/lllyasviel/control_v11p_sd15_seg)<br/> Trained with image segmentation | An image with segmented regions, usually represented as a color-coded image.|<a href="https://huggingface.co/lllyasviel/control_v11p_sd15_seg/resolve/main/images/control.png"><img width="64" style="margin:0;padding:0;" src="https://huggingface.co/lllyasviel/control_v11p_sd15_seg/resolve/main/images/control.png"/></a>|<a href="https://huggingface.co/lllyasviel/control_v11p_sd15_seg/resolve/main/images/image_out.png"><img width="64" src="https://huggingface.co/lllyasviel/control_v11p_sd15_seg/resolve/main/images/image_out.png"/></a>|
|[lllyasviel/control_v11p_sd15_lineart](https://huggingface.co/lllyasviel/control_v11p_sd15_lineart)<br/> Trained with line art generation | An image with line art, usually black lines on a white background.|<a href="https://huggingface.co/lllyasviel/control_v11p_sd15_lineart/resolve/main/images/control.png"><img width="64" style="margin:0;padding:0;" src="https://huggingface.co/lllyasviel/control_v11p_sd15_lineart/resolve/main/images/control.png"/></a>|<a href="https://huggingface.co/lllyasviel/control_v11p_sd15_lineart/resolve/main/images/image_out.png"><img width="64" src="https://huggingface.co/lllyasviel/control_v11p_sd15_lineart/resolve/main/images/image_out.png"/></a>|
|[lllyasviel/control_v11p_sd15_openpose](https://huggingface.co/lllyasviel/control_v11p_sd15_openpose)<br/> Trained with human pose estimation | An image with human poses, usually represented as a set of keypoints or skeletons.|<a href="https://huggingface.co/lllyasviel/control_v11p_sd15_openpose/resolve/main/images/control.png"><img width="64" style="margin:0;padding:0;" src="https://huggingface.co/lllyasviel/control_v11p_sd15_openpose/resolve/main/images/control.png"/></a>|<a href="https://huggingface.co/lllyasviel/control_v11p_sd15_openpose/resolve/main/images/image_out.png"><img width="64" src="https://huggingface.co/lllyasviel/control_v11p_sd15_openpose/resolve/main/images/image_out.png"/></a>|
|[lllyasviel/control_v11p_sd15_scribble](https://huggingface.co/lllyasviel/control_v11p_sd15_scribble)<br/> Trained with scribble-based image generation | An image with scribbles, usually random or user-drawn strokes.|<a href="https://huggingface.co/lllyasviel/control_v11p_sd15_scribble/resolve/main/images/control.png"><img width="64" style="margin:0;padding:0;" src="https://huggingface.co/lllyasviel/control_v11p_sd15_scribble/resolve/main/images/control.png"/></a>|<a href="https://huggingface.co/lllyasviel/control_v11p_sd15_scribble/resolve/main/images/image_out.png"><img width="64" src="https://huggingface.co/lllyasviel/control_v11p_sd15_scribble/resolve/main/images/image_out.png"/></a>|
|[lllyasviel/control_v11p_sd15_softedge](https://huggingface.co/lllyasviel/control_v11p_sd15_softedge)<br/> Trained with soft edge image generation | An image with soft edges, usually to create a more painterly or artistic effect.|<a href="https://huggingface.co/lllyasviel/control_v11p_sd15_softedge/resolve/main/images/control.png"><img width="64" style="margin:0;padding:0;" src="https://huggingface.co/lllyasviel/control_v11p_sd15_softedge/resolve/main/images/control.png"/></a>|<a href="https://huggingface.co/lllyasviel/control_v11p_sd15_softedge/resolve/main/images/image_out.png"><img width="64" src="https://huggingface.co/lllyasviel/control_v11p_sd15_softedge/resolve/main/images/image_out.png"/></a>|

New checkpoints:

| Model Name | Control Image Overview| Control Image Example | Generated Image Example |
|---|---|---|---|
|[lllyasviel/control_v11e_sd15_ip2p](https://huggingface.co/lllyasviel/control_v11e_sd15_ip2p)<br/> *Trained with pixel to pixel instruction* | No condition.|<a href="https://huggingface.co/lllyasviel/control_v11e_sd15_ip2p/resolve/main/images/control.png"><img width="64" style="margin:0;padding:0;" src="https://huggingface.co/lllyasviel/control_v11e_sd15_ip2p/resolve/main/images/control.png"/></a>|<a href="https://huggingface.co/lllyasviel/control_v11e_sd15_ip2p/resolve/main/images/image_out.png"><img width="64" src="https://huggingface.co/lllyasviel/control_v11e_sd15_ip2p/resolve/main/images/image_out.png"/></a>|
|[lllyasviel/control_v11p_sd15_inpaint](https://huggingface.co/lllyasviel/control_v11p_sd15_inpaint)<br/> Trained with image inpainting | No condition.|<a href="https://huggingface.co/lllyasviel/control_v11p_sd15_inpaint/resolve/main/images/control.png"><img width="64" style="margin:0;padding:0;" src="https://huggingface.co/lllyasviel/control_v11p_sd15_inpaint/resolve/main/images/control.png"/></a>|<a href="https://huggingface.co/lllyasviel/control_v11p_sd15_inpaint/resolve/main/images/output.png"><img width="64" src="https://huggingface.co/lllyasviel/control_v11p_sd15_inpaint/resolve/main/images/output.png"/></a>|
|[lllyasviel/control_v11e_sd15_shuffle](https://huggingface.co/lllyasviel/control_v11e_sd15_shuffle)<br/> Trained with image shuffling | An image with shuffled patches or regions.|<a href="https://huggingface.co/lllyasviel/control_v11e_sd15_shuffle/resolve/main/images/control.png"><img width="64" style="margin:0;padding:0;" src="https://huggingface.co/lllyasviel/control_v11e_sd15_shuffle/resolve/main/images/control.png"/></a>|<a href="https://huggingface.co/lllyasviel/control_v11e_sd15_shuffle/resolve/main/images/image_out.png"><img width="64" src="https://huggingface.co/lllyasviel/control_v11e_sd15_shuffle/resolve/main/images/image_out.png"/></a>|
|[lllyasviel/control_v11p_sd15s2_lineart_anime](https://huggingface.co/lllyasviel/control_v11p_sd15s2_lineart_anime)<br/> Trained with anime line art generation | An image with anime-style line art.|<a href="https://huggingface.co/lllyasviel/control_v11p_sd15s2_lineart_anime/resolve/main/images/control.png"><img width="64" style="margin:0;padding:0;" src="https://huggingface.co/lllyasviel/control_v11p_sd15s2_lineart_anime/resolve/main/images/control.png"/></a>|<a href="https://huggingface.co/lllyasviel/control_v11p_sd15s2_lineart_anime/resolve/main/images/image_out.png"><img width="64" src="https://huggingface.co/lllyasviel/control_v11p_sd15s2_lineart_anime/resolve/main/images/image_out.png"/></a>|
 
All commits

* [Tests] Speed up panorama tests by sayakpaul in 3067
* [Post release] v0.16.0dev by patrickvonplaten in 3072
* Adds profiling flags, computes train metrics average. by andsteing in 3053
* [Pipelines] Make sure that None functions are correctly not saved by patrickvonplaten in 3080
* doc string example remove from_pt by yiyixuxu in 3083
* [Tests] parallelize by patrickvonplaten in 3078
* Throw deprecation warning for return_cached_folder by patrickvonplaten in 3092
* Allow SD attend and excite pipeline to work with any size output images by jcoffland in 2835
* [docs] Update community pipeline docs by stevhliu in 2989
* Add support for Guess Mode for StableDiffusionControlnetPipeline by takuma104 in 2998
* fix default value for attend-and-excite by yiyixuxu in 3099
* remove one line as requested by gc team by yiyixuxu in 3077
* ddpm custom timesteps by williamberman in 3007
* Fix breaking change in `pipeline_stable_diffusion_controlnet.py` by remorses in 3118
* Add global pooling to controlnet by patrickvonplaten in 3121
* [Bug fix] Fix img2img processor with safety checker by patrickvonplaten in 3127
* [Bug fix] Make sure correct timesteps are chosen for img2img by patrickvonplaten in 3128
* Improve deprecation warnings by patrickvonplaten in 3131
* Fix config deprecation by patrickvonplaten in 3129
* feat: verification of multi-gpu support for select examples. by sayakpaul in 3126
* speed up attend-and-excite fast tests by yiyixuxu in 3079
* Optimize log_validation in train_controlnet_flax by cgarciae in 3110
* make style by patrickvonplaten (direct commit on main)
* Correct textual inversion readme by patrickvonplaten in 3145
* Add unet act fn to other model components by williamberman in 3136
* class labels timestep embeddings projection dtype cast by williamberman in 3137
* [ckpt loader] Allow loading the Inpaint and Img2Img pipelines, while loading a ckpt model by cmdr2 in 2705
* add from_ckpt method as Mixin by 1lint in 2318
* Add TensorRT SD/txt2img Community Pipeline to diffusers along with TensorRT utils by asfiyab-nvidia in 2974
* Correct `Transformer2DModel.forward` docstring by off99555 in 3074
* Update pipeline_stable_diffusion_inpaint_legacy.py by hwuebben in 2903
* Modified altdiffusion pipeline to support altdiffusion-m18 by superhero-7 in 2993
* controlnet training resize inputs to multiple of 8 by williamberman in 3135
* adding custom diffusion training to diffusers examples by nupurkmr9 in 3031
* Update custom_diffusion.mdx by mishig25 in 3165
* Added distillation for quantization example on textual inversion. by XinyuYe-Intel in 2760
* make style by patrickvonplaten (direct commit on main)
* Merge branch 'main' of https://github.com/huggingface/diffusers by patrickvonplaten (direct commit on main)
* Update Noise Autocorrelation Loss Function for Pix2PixZero Pipeline by clarencechen in 2942
* [DreamBooth] add text encoder LoRA support in the DreamBooth training script by sayakpaul in 3130
* Update Habana Gaudi documentation by regisss in 3169
* Add model offload to x4 upscaler by patrickvonplaten in 3187
* [docs] Deterministic algorithms by stevhliu in 3172
* Update custom_diffusion.mdx to credit the author by sayakpaul in 3163
* Fix TensorRT community pipeline device set function by asfiyab-nvidia in 3157
* make `from_flax` work for controlnet by yiyixuxu in 3161
* [docs] Clarify training args by stevhliu in 3146
* Multi Vector Textual Inversion by patrickvonplaten in 3144
* Add `Karras sigmas` to HeunDiscreteScheduler by youssefadr in 3160
* [AudioLDM] Fix dtype of returned waveform by sanchit-gandhi in 3189
* Fix bug in train_dreambooth_lora by crywang in 3183
* [Community Pipelines] Update lpw_stable_diffusion pipeline by SkyTNT in 3197
* Make sure VAE attention works with Torch 2_0 by patrickvonplaten in 3200
* Revert "[Community Pipelines] Update lpw_stable_diffusion pipeline" by williamberman in 3201
* [Bug fix] Fix batch size attention head size mismatch by patrickvonplaten in 3214
* fix mixed precision training on train_dreambooth_inpaint_lora by themrzmaster in 3138
* adding enable_vae_tiling and disable_vae_tiling functions by init-22 in 3225
* Add ControlNet v1.1 docs by patrickvonplaten in 3226
* Fix issue in maybe_convert_prompt by pdoane in 3188
* Sync cache version check from transformers by ychfan in 3179
* Fix docs text inversion by patrickvonplaten in 3166
* add model by patrickvonplaten in 3230
* Allow return pt x4 by patrickvonplaten in 3236
* Allow fp16 attn for x4 upscaler by patrickvonplaten in 3239
* fix fast test by patrickvonplaten in 3241
* Adds a document on token merging by sayakpaul in 3208
* [AudioLDM] Update docs to use updated ckpt by sanchit-gandhi in 3240

0.13.0

Self-Attention Guidance (SAG)

```python
# sag_scale controls the strength of self-attention guidance
set_seed(seed)
images = pipe(
    prompt, num_images_per_prompt=num_images_per_prompt, guidance_scale=guidance_scale, sag_scale=sag_scale
).images
images[0].save("example.png")
```


SAG was contributed by SusungHong (lead author of SAG) in https://github.com/huggingface/diffusers/pull/2193.
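For context, here is a minimal, self-contained sketch of the new `StableDiffusionSAGPipeline` (the model id and `sag_scale` value are illustrative):

```python
import torch
from diffusers import StableDiffusionSAGPipeline

pipe = StableDiffusionSAGPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# sag_scale > 0 enables self-attention guidance on top of classifier-free guidance
image = pipe("a photo of an astronaut riding a horse on mars", sag_scale=0.75).images[0]
image.save("sag_example.png")
```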

<a name="panorama"></a>
MultiDiffusion panorama

Proposed in [MultiDiffusion: Fusing Diffusion Paths for Controlled Image Generation](https://arxiv.org/abs/2302.08113), MultiDiffusion defines a new generation process based on an optimization task that binds together multiple diffusion generation processes with a shared set of parameters or constraints.

```python
import torch
from diffusers import StableDiffusionPanoramaPipeline, DDIMScheduler

model_ckpt = "stabilityai/stable-diffusion-2-base"
scheduler = DDIMScheduler.from_pretrained(model_ckpt, subfolder="scheduler")
pipe = StableDiffusionPanoramaPipeline.from_pretrained(model_ckpt, scheduler=scheduler, torch_dtype=torch.float16)

pipe = pipe.to("cuda")

prompt = "a photo of the dolomites"
image = pipe(prompt).images[0]
image.save("dolomites.png")
```


The pipeline was contributed by omerbt (lead author of MultiDiffusion Panorama) and sayakpaul in 2393.

Ethical Guidelines

Diffusers is no stranger to the different opinions and perspectives about the challenges that generative technologies bring. Thanks to giadilli, we have drafted our first [Diffusers' Ethical Guidelines](https://huggingface.co/docs/diffusers/main/en/conceptual/ethical_guidelines) with which we hope to initiate a fruitful conversation with the community.

Keras Integration

Many practitioners find it easy to fine-tune the Stable Diffusion models shipped by KerasCV. At the same time, `diffusers` provides a lot of options for inference, deployment and optimization. We have made it possible to easily import and use KerasCV Stable Diffusion checkpoints in `diffusers`; read more about the process [in our new guide](https://huggingface.co/docs/diffusers/main/en/using-diffusers/kerascv).

:clock3: UniPC scheduler

UniPC is a new fast scheduler in diffusion town! [UniPC](https://arxiv.org/abs/2302.04867) is a training-free framework designed for the fast sampling of diffusion models, consisting of a corrector (UniC) and a predictor (UniP) that share a unified analytical form and support arbitrary orders.
The original codebase can be found [here](https://github.com/wl-zhao/UniPC). Thanks to wl-zhao for the great work and for integrating UniPC into diffusers! A minimal usage sketch follows the PR link below.

* add the UniPC scheduler by wl-zhao in 2373
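A minimal sketch of swapping UniPC into an existing pipeline (model id and step count are illustrative):

```python
import torch
from diffusers import StableDiffusionPipeline, UniPCMultistepScheduler

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
)
# reuse the pipeline's scheduler config so all noise-schedule settings carry over
pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)
pipe.to("cuda")

# UniPC typically needs fewer steps than DDIM for comparable quality
image = pipe("a photo of the dolomites", num_inference_steps=20).images[0]
```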


:runner: Training: consistent EMA support

As part of 0.13.0 we improved the support for EMA in training. We added a common `EMAModel` in `diffusers.training_utils` that can be used by all scripts. `EMAModel` now supports distributed training, provides new methods to easily evaluate the EMA model during training, and offers a consistent way to save and load the EMA model, similar to other models in `diffusers`. A short sketch of the API follows the list below.

* Fix EMA for multi-gpu training in the unconditional example by anton-l, patil-suraj in 1930
* [Utils] Adds store() and restore() methods to EMAModel by sayakpaul in 2302
* Use accelerate save & loading hooks to have better checkpoint structure by patrickvonplaten in 2048
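A minimal sketch of the common `EMAModel` API described above (the model and loop are illustrative placeholders):

```python
from diffusers import UNet2DModel
from diffusers.training_utils import EMAModel

model = UNet2DModel(sample_size=32, in_channels=3, out_channels=3)
ema = EMAModel(model.parameters(), decay=0.9999)

# after each optimizer step, update the EMA weights
ema.step(model.parameters())

# evaluate with EMA weights, then restore the training weights
ema.store(model.parameters())
ema.copy_to(model.parameters())
# ... run validation ...
ema.restore(model.parameters())
```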


:dog: Ruff & black

We have replaced `flake8` with `ruff` (much faster), and updated our version of `black`. These tools are now in sync with the ones used in `transformers`, so the contributing experience is now more consistent for people using both codebases :)

All commits

* [lora] Fix bug with training without validation by orenwang in 2106
* [Bump version] 0.13.0dev0 & Deprecate `predict_epsilon` by patrickvonplaten in 2109
* [dreambooth] check the low-precision guard before preparing model by patil-suraj in 2102
* [textual inversion] Allow validation images by pcuenca in 2077
* Allow `UNet2DModel` to use arbitrary class embeddings by pcuenca in 2080
* make scaling factor a config arg of vae/vqvae by patil-suraj in 1860
* [Import Utils] Fix naming by patrickvonplaten in 2118
* Fix unable to save_pretrained when using pathlib by Cyberes in 1972
* fuse attention mask by williamberman in 2111
* Fix model card of LoRA by hysts in 2114
* [nit] torch_dtype used twice in doc string by williamberman in 2126
* [LoRA] Make sure LoRA can be disabled after it's run by patrickvonplaten in 2128
* remove redundant allow_patterns by williamberman in 2130
* Allow lora from pipeline by patrickvonplaten in 2129
* Fix typos in loaders.py by kuotient in 2137
* Typo fix: `torwards` -> `towards` by RahulBhalley in 2134
* Don't call the Hub if `local_files_only` is specified by patrickvonplaten in 2119
* [from_pretrained] only load config one time by williamberman in 2131
* Adding some `safetensors` docs. by Narsil in 2122
* Fix typo by pcuenca in 2138
* fix typo in EMAModel's load_state_dict() by dasayan05 in 2151
* `[diffusers-cli]` Fix typo in accelerate and transformers versions by pcuenca in 2154
* [Design philosophy] Create official doc by patrickvonplaten in 2140
* Section on using LoRA alpha / scale by pcuenca in 2139
* Don't copy when unwrapping model by pcuenca in 2166
* Add instance prompt to model card of lora dreambooth example by hysts in 2112
* [Bug]: fix DDPM scheduler arbitrary infer steps count. by dudulightricks in 2076
* [examples] Fix CLI argument in the launch script command for text2image with LoRA by sayakpaul in 2171
* [Breaking change] fix legacy inpaint noise and resize mask tensor by 1lint in 2147
* Use `requests` instead of `wget` in `convert_from_ckpt.py` by Abhishek-Varma in 2168
* [Docs] Add components to docs by patrickvonplaten in 2175
* [Docs] remove license by patrickvonplaten in 2188
* Pass LoRA rank to LoRALinearLayer by asadm in 2191
* add: guide on kerascv conversion tool. by sayakpaul in 2169
* Fix a dimension bug in Transform2d by lmxyy in 2144
* [Loading] Better error message on missing keys by patrickvonplaten in 2198
* Update xFormers docs by pcuenca in 2208
* add CITATION.cff by kashif in 2211
* Create train_dreambooth_inpaint_lora.py by thedarkzeno in 2205
* Docs: short section on changing the scheduler in Flax by pcuenca in 2181
* [Bug] scheduling_ddpm: fix variance in the case of learned_range type. by dudulightricks in 2090
* refactor onnxruntime integration by prathikr in 2042
* Fix timestep dtype in legacy inpaint by dymil in 2120
* [nit] negative_prompt typo by williamberman in 2227
* removes `~`s in favor of full-fledged links. by sayakpaul in 2229
* [LoRA] Make sure validation works in multi GPU setup by patrickvonplaten in 2172
* fix: flagged_images implementation by justinmerrell in 1947
* Hotfix textual inv logging by isamu-isozaki in 2183
* Fixes LoRAXFormersCrossAttnProcessor by jorgemcgomes in 2207
* Fix typo in StableDiffusionInpaintPipeline by hutec in 2197
* [Flax DDPM] Make `key` optional so default pipelines don't fail by pcuenca in 2176
* Show error when loading safety_checker `from_flax` by pcuenca in 2187
* Fix k_dpm_2 & k_dpm_2_a on MPS by psychedelicious in 2241
* Fix a typo: bfloa16 -> bfloat16 by nickkolok in 2243
* Mention training problems with xFormers 0.0.16 by pcuenca in 2254
* fix distributed init twice by Fazziekey in 2252
* Fixes prompt input checks in StableDiffusion img2img pipeline by jorgemcgomes in 2206
* Create convert_vae_pt_to_diffusers.py by chavinlo in 2215
* Stable Diffusion Latent Upscaler by yiyixuxu in 2059
* [Examples] Remove datasets import that is not needed by patrickvonplaten in 2267
* Make center crop and random flip as args for unconditional image generation by wfng92 in 2259
* [Tests] Fix slow tests by patrickvonplaten in 2271
* Fix torchvision.transforms and transforms function naming clash by wfng92 in 2274
* mps cross-attention hack: don't crash on fp16 by pcuenca in 2258
* Use `accelerate` save & loading hooks to have better checkpoint structure by patrickvonplaten in 2048
* Replace flake8 with ruff and update black by patrickvonplaten in 2279
* Textual inv save log memory by isamu-isozaki in 2184
* EMA: fix `state_dict()` and `load_state_dict()` & add `cur_decay_value` by chenguolin in 2146
* [Examples] Test all examples on CPU by patrickvonplaten in 2289
* fix pix2pix docs by patrickvonplaten in 2290
* misc fixes by williamberman in 2282
* Run same number of DDPM steps in inference as training by bencevans in 2263
* [LoRA] Freezing the model weights by erkams in 2245
* Fast CPU tests should also run on main by patrickvonplaten in 2313
* Correct fast tests by patrickvonplaten in 2314
* remove ddpm test_full_inference by williamberman in 2291
* convert ckpt script docstring fixes by williamberman in 2293
* [Community Pipeline] UnCLIP Text Interpolation Pipeline by Abhinay1997 in 2257
* [Tests] Refactor push tests by patrickvonplaten in 2329
* Add ethical guidelines by giadilli in 2330
* Fix running LoRA with xformers by bddppq in 2286
* Fix typo in load_pipeline_from_original_stable_diffusion_ckpt() method by p1atdev in 2320
* [Docs] Fix ethical guidelines docs by patrickvonplaten in 2333
* [Versatile Diffusion] Fix tests by patrickvonplaten in 2336
* [Latent Upscaling] Remove unused noise by patrickvonplaten in 2298
* [Tests] Remove unnecessary tests by patrickvonplaten in 2337
* karlo image variation use kakaobrain upload by williamberman in 2338
* github issue forum link by williamberman in 2335
* dreambooth checkpointing tests and docs by williamberman in 2339
* unet check length inputs by williamberman in 2327
* unCLIP variant by williamberman in 2297
* Log Unconditional Image Generation Samples to W&B by bencevans in 2287
* Fix callback type hints - no optional function argument by patrickvonplaten in 2357
* [Docs] initial docs about KarrasDiffusionSchedulers by kashif in 2349
* KarrasDiffusionSchedulers type note by williamberman in 2365
* [Tests] Add MPS skip decorator by patrickvonplaten in 2362
* Funky spacing issue by meg-huggingface in 2368
* schedulers add glide noising schedule by williamberman in 2347
* add total number checkpoints to training scripts by williamberman in 2367
* checkpointing_steps_total_limit->checkpoints_total_limit by williamberman in 2374
* Fix 3-way merging with the checkpoint_merger community pipeline by damian0815 in 2355
* [Variant] Add "variant" as input kwarg so to have better UX when downloading no_ema or fp16 weights by patrickvonplaten in 2305
* [Pipelines] Adds pix2pix zero by sayakpaul in 2334
* Add Self-Attention-Guided (SAG) Stable Diffusion pipeline by SusungHong in 2193
* [SchedulingPNDM ] reset cur_model_output after each call by patil-suraj in 2376
* train_text_to_image EMAModel saving by williamberman in 2341
* [Utils] Adds `store()` and `restore()` methods to EMAModel by sayakpaul in 2302
* `enable_model_cpu_offload` by pcuenca in 2285
* add the UniPC scheduler by wl-zhao in 2373
* Replace torch.concat calls by torch.cat by fxmarty in 2378
* Make diffusers importable with transformers < 4.26 by pcuenca in 2380
* [Examples] Make sure EMA works with any device by patrickvonplaten in 2382
* [Dummy imports] Add missing if else statements for SD] by patrickvonplaten in 2381
* Attend and excite 2 by yiyixuxu in 2369
* [Pix2Pix0] Add utility function to get edit vector by patrickvonplaten in 2383
* Revert "[Pix2Pix0] Add utility function to get edit vector" by patrickvonplaten in 2384
* Fix stable diffusion onnx pipeline error when batch_size > 1 by tianleiwu in 2366
* [Docs] Fix UniPC docs by wl-zhao in 2386
* [Pix2Pix Zero] Fix slow tests by sayakpaul in 2391
* [Pix2Pix] Add utility function by patrickvonplaten in 2385
* Fix UniPC tests and remove some test warnings by pcuenca in 2396
* [Pipelines] Add a section on generating captions and embeddings for Pix2Pix Zero by sayakpaul in 2395
* Torch2.0 scaled_dot_product_attention processor by patil-suraj in 2303
* add: inversion to pix2pix zero docs. by sayakpaul in 2398
* Add semantic guidance pipeline by manuelbrack in 2223
* Add ddim inversion pix2pix by patrickvonplaten in 2397
* add MultiDiffusionPanorama pipeline by omerbt in 2393
* Fixing typos in documentation by anagri in 2389
* controlling generation docs by williamberman in 2388
* apply_forward_hook simply returns if no accelerate by daquexian in 2387
* Revert "Release: v0.13.0" by williamberman in 2405
* controlling generation doc nits by williamberman in 2406
* Fix typo in AttnProcessor2_0 symbol by pcuenca in 2404
* add index page by yiyixuxu in 2401
* add xformers 0.0.16 warning message by williamberman in 2345

Significant community contributions

The following contributors have made significant changes to the library over the last release:

* thedarkzeno
* Create train_dreambooth_inpaint_lora.py (2205)
* prathikr
* refactor onnxruntime integration (2042)
* Abhinay1997
* [Community Pipeline] UnCLIP Text Interpolation Pipeline (2257)
* SusungHong
* Add Self-Attention-Guided (SAG) Stable Diffusion pipeline (2193)
* wl-zhao
* add the UniPC scheduler (2373)
* [Docs] Fix UniPC docs (2386)
* manuelbrack
* Add semantic guidance pipeline (2223)
* omerbt
* add MultiDiffusionPanorama pipeline (2393)

0.20.0

GLIGEN

GLIGEN enables grounded text-to-image generation: objects described by text phrases are inserted at regions defined by bounding boxes, either in a newly generated image or inpainted into an existing one.

**Grounded generation**

```python
import torch
from diffusers import StableDiffusionGLIGENPipeline

# Generate an image described by the prompt and
# insert objects described by text at the regions defined by bounding boxes
pipe = StableDiffusionGLIGENPipeline.from_pretrained(
    "masterful/gligen-1-4-generation-text-box", variant="fp16", torch_dtype=torch.float16
)
pipe = pipe.to("cuda")

prompt = "a waterfall and a modern high speed train running through the tunnel in a beautiful forest with fall foliage"
boxes = [[0.1387, 0.2051, 0.4277, 0.7090], [0.4980, 0.4355, 0.8516, 0.7266]]
phrases = ["a waterfall", "a modern high speed train running through the tunnel"]

images = pipe(
    prompt=prompt,
    gligen_phrases=phrases,
    gligen_boxes=boxes,
    gligen_scheduled_sampling_beta=1,
    output_type="pil",
    num_inference_steps=50,
).images
images[0].save("./gligen-1-4-generation-text-box.jpg")
```

**Grounded inpainting**

```python
import torch
from diffusers import StableDiffusionGLIGENPipeline

# Insert objects described by text at regions defined by bounding boxes,
# inpainting into an existing image
pipe = StableDiffusionGLIGENPipeline.from_pretrained(
    "masterful/gligen-1-4-inpainting-text-box", variant="fp16", torch_dtype=torch.float16
)
pipe = pipe.to("cuda")

# input_image: a PIL image of the scene to inpaint into (e.g. via diffusers.utils.load_image)
prompt = "a birthday cake"
boxes = [[0.2676, 0.6088, 0.4773, 0.7183]]
phrases = ["a birthday cake"]

images = pipe(
    prompt=prompt,
    gligen_phrases=phrases,
    gligen_inpaint_image=input_image,
    gligen_boxes=boxes,
    gligen_scheduled_sampling_beta=1,
    output_type="pil",
    num_inference_steps=50,
).images
images[0].save("./gligen-1-4-inpainting-text-box.jpg")
```


Refer to the [documentation](https://huggingface.co/docs/diffusers/main/en/api/pipelines/stable_diffusion/gligen) to learn more.

Thanks to nikhil-masterful for contributing GLIGEN in 4441.

Tiny Autoencoder

madebyollin trained two Autoencoders (on [Stable Diffusion](https://huggingface.co/docs/diffusers/main/en/api/pipelines/stable_diffusion/overview) and [Stable Diffusion XL](https://huggingface.co/docs/diffusers/main/en/api/pipelines/stable_diffusion/stable_diffusion_xl), respectively) to dramatically cut down the image decoding time. The effects are especially pronounced when working with larger-resolution images. You can use `AutoencoderTiny` to take advantage of them.

Here’s the example usage for Stable Diffusion:

```python
import torch
from diffusers import DiffusionPipeline, AutoencoderTiny

pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1-base", torch_dtype=torch.float16
)
pipe.vae = AutoencoderTiny.from_pretrained("madebyollin/taesd", torch_dtype=torch.float16)
pipe = pipe.to("cuda")

prompt = "slice of delicious New York-style berry cheesecake"
image = pipe(prompt, num_inference_steps=25).images[0]
image.save("cheesecake.png")
```


Refer to the [documentation](https://huggingface.co/docs/diffusers/main/en/api/models/autoencoder_tiny) to learn more. Refer to [this material](https://gist.github.com/sayakpaul/a57a86ee7419ac3e7a7879fd100e8d06) to understand the implications of using this Autoencoder in terms of inference latency and memory footprint.

Fine-tuning Stable Diffusion XL with DreamBooth and LoRA on a free-tier Colab Notebook

Stable Diffusion XL’s (SDXL) high memory requirements often seem restrictive when it comes to using it for downstream applications. Even if one uses parameter-efficient fine-tuning techniques like [LoRA](https://huggingface.co/docs/diffusers/main/en/training/lora), fine-tuning just the UNet component of SDXL can be quite memory-intensive. So, running it on a free-tier Colab Notebook (that usually has a 16 GB T4 GPU attached) seems impossible.

Now, with better support for gradient checkpointing and other recipes like 8 Bit Adam (via `bitsandbytes`), it is possible to fine-tune the UNet of SDXL with DreamBooth and LoRA on a free-tier Colab Notebook.

Check out the [Colab Notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/SDXL_DreamBooth_LoRA_.ipynb) to learn more.

Thanks to ethansmith2000 for improving the gradient checkpointing support in 4474.

Support of `push_to_hub` for models, schedulers, and pipelines

Our models, schedulers, and pipelines now support a `push_to_hub` option in `save_pretrained()` and also come with a dedicated `push_to_hub()` method. Below are some examples of usage.

**Models**

```python
from diffusers import ControlNetModel

controlnet = ControlNetModel(
    block_out_channels=(32, 64),
    layers_per_block=2,
    in_channels=4,
    down_block_types=("DownBlock2D", "CrossAttnDownBlock2D"),
    cross_attention_dim=32,
    conditioning_embedding_out_channels=(16, 32),
)
controlnet.push_to_hub("my-controlnet-model")
# or: controlnet.save_pretrained("my-controlnet-model", push_to_hub=True)
```


**Schedulers**

```python
from diffusers import DDIMScheduler

scheduler = DDIMScheduler(
    beta_start=0.00085,
    beta_end=0.012,
    beta_schedule="scaled_linear",
    clip_sample=False,
    set_alpha_to_one=False,
)
scheduler.push_to_hub("my-controlnet-scheduler")
```


**Pipelines**

```python
from diffusers import (
    UNet2DConditionModel,
    AutoencoderKL,
    DDIMScheduler,
    StableDiffusionPipeline,
)
from transformers import CLIPTextModel, CLIPTextConfig, CLIPTokenizer

unet = UNet2DConditionModel(
    block_out_channels=(32, 64),
    layers_per_block=2,
    sample_size=32,
    in_channels=4,
    out_channels=4,
    down_block_types=("DownBlock2D", "CrossAttnDownBlock2D"),
    up_block_types=("CrossAttnUpBlock2D", "UpBlock2D"),
    cross_attention_dim=32,
)

scheduler = DDIMScheduler(
    beta_start=0.00085,
    beta_end=0.012,
    beta_schedule="scaled_linear",
    clip_sample=False,
    set_alpha_to_one=False,
)

vae = AutoencoderKL(
    block_out_channels=[32, 64],
    in_channels=3,
    out_channels=3,
    down_block_types=["DownEncoderBlock2D", "DownEncoderBlock2D"],
    up_block_types=["UpDecoderBlock2D", "UpDecoderBlock2D"],
    latent_channels=4,
)

text_encoder_config = CLIPTextConfig(
    bos_token_id=0,
    eos_token_id=2,
    hidden_size=32,
    intermediate_size=37,
    layer_norm_eps=1e-05,
    num_attention_heads=4,
    num_hidden_layers=5,
    pad_token_id=1,
    vocab_size=1000,
)
text_encoder = CLIPTextModel(text_encoder_config)
tokenizer = CLIPTokenizer.from_pretrained("hf-internal-testing/tiny-random-clip")

components = {
    "unet": unet,
    "scheduler": scheduler,
    "vae": vae,
    "text_encoder": text_encoder,
    "tokenizer": tokenizer,
    "safety_checker": None,
    "feature_extractor": None,
}
pipeline = StableDiffusionPipeline(**components)
pipeline.push_to_hub("my-pipeline")
```


Refer to the [documentation](https://huggingface.co/docs/diffusers/main/en/using-diffusers/push_to_hub) to learn more.

Thanks to Wauplin for his generous and constructive feedback (see 4218) on this feature.

Better support for loading Kohya-trained LoRA checkpoints

Providing seamless support for loading Kohya-trained LoRA checkpoints in `diffusers` is important for us. This is why we continue to improve our `load_lora_weights()` method. Check out the [documentation](https://huggingface.co/docs/diffusers/main/en/training/lora#supporting-a1111-themed-lora-checkpoints-from-diffusers) to learn more about what’s currently supported and the current limitations.

Thanks to isidentical for extending their help in improving this support.
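A minimal sketch of loading a Kohya-style LoRA file (the path and weight name are illustrative placeholders):

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# load_lora_weights also accepts Kohya-style .safetensors files
pipe.load_lora_weights("path/to/kohya_lora_dir", weight_name="my_kohya_lora.safetensors")
image = pipe("masterpiece, best quality, a cat in a hat", num_inference_steps=30).images[0]
```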

Better documentation for prompt weighting

Prompt weighting provides a way to emphasize or de-emphasize certain parts of a prompt, allowing for more control over the generated image. `compel` provides an easy way to do prompt weighting compatible with `diffusers`. To this end, we have worked on an improved guide. Check it out [here](https://huggingface.co/docs/diffusers/main/en/using-diffusers/weighted_prompts).
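A minimal sketch with `compel` (its `+`/`-` weighting syntax, shown here, is compel's own; see the guide for details):

```python
from compel import Compel
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5").to("cuda")
compel_proc = Compel(tokenizer=pipe.tokenizer, text_encoder=pipe.text_encoder)

# "++" upweights "red", "--" downweights "cluttered"
prompt_embeds = compel_proc("a red++ rose in a cluttered-- garden")
image = pipe(prompt_embeds=prompt_embeds).images[0]
```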

Defaulting to serialize with `.safetensors`

Starting with this release, we default to `.safetensors` as our preferred serialization method. This change is reflected in all the training examples that we officially support.
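In practice, this means `save_pretrained()` now writes `.safetensors` weights by default; a minimal sketch (the `safe_serialization` flag shown is the standard opt-out):

```python
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
pipe.save_pretrained("sd15-local")  # writes .safetensors weights by default
pipe.save_pretrained("sd15-local-bin", safe_serialization=False)  # opt back into .bin
```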

All commits

* 0.20.0dev0 by patrickvonplaten in 4299
* update Kandinsky doc by yiyixuxu in 4301
* [Torch.compile] Fixes torch compile graph break by patrickvonplaten in 4315
* Fix SDXL conversion from original to diffusers by duongna21 in 4280
* fix a bug in StableDiffusionUpscalePipeline when `prompt` is `None` by yiyixuxu in 4278
* [Local loading] Correct bug with local files only by patrickvonplaten in 4318
* Fix typo documentation by echarlaix in 4320
* fix validation option for dreambooth training example by xinyangli in 4317
* [Tests] add test for pipeline import. by sayakpaul in 4276
* Honor the SDXL 1.0 licensing from the training scripts. by sayakpaul in 4319
* Update README_sdxl.md to correct the header by sayakpaul in 4330
* [SDXL Refiner] Fix refiner forward pass for batched input by patrickvonplaten in 4327
* correct doc string for default value of guidance_scale by Tanupriya-Singh in 4339
* [ONNX] Don't download ONNX model by default by patrickvonplaten in 4338
* Fix repeat of negative prompt by kathath in 4335
* [SDXL] Make watermarker optional under certain circumstances to improve usability of SDXL 1.0 by patrickvonplaten in 4346
* [Feat] Support SDXL Kohya-style LoRA by sayakpaul in 4287
* fix fp type in t2i adapter docs by williamberman in 4350
* Update README.md to have PyPI-friendly path by sayakpaul in 4351
* [SDXL-IP2P] Add gif for demonstrating training processes by harutatsuakiyama in 4342
* [SDXL] Fix dummy imports incorrect naming by patrickvonplaten in 4370
* Clean up duplicate lines in encode_prompt by avoroshilov in 4369
* minor doc fixes. by sayakpaul in 4380
* Update docs of unet_1d.py by nishant42491 in 4394
* [AutoPipeline] Correct naming by patrickvonplaten in 4420
* [ldm3d] documentation fixing typos by estelleafl in 4284
* Cleanup pass for flaky Slow Tests for Stable diffusion by DN6 in 4415
* support from_single_file for SDXL inpainting by yiyixuxu in 4408
* fix test_float16_inference by yiyixuxu in 4412
* train dreambooth fix pre encode class prompt by williamberman in 4395
* [docs] Fix SDXL docstring by stevhliu in 4397
* Update documentation by echarlaix in 4422
* remove mentions of textual inversion from sdxl. by sayakpaul in 4404
* [LoRA] Fix SDXL text encoder LoRAs by sayakpaul in 4371
* [docs] AutoPipeline tutorial by stevhliu in 4273
* [Pipelines] Add community pipeline for Zero123 by kxhit in 4295
* [Feat] add tiny Autoencoder for (almost) instant decoding by sayakpaul in 4384
* can call encode_prompt with out setting a text encoder instance variable by williamberman in 4396
* Accept pooled_prompt_embeds in the SDXL Controlnet pipeline. Fixes an error if prompt_embeds are passed. by cmdr2 in 4309
* Prevent online access when desired when using download_from_original_stable_diffusion_ckpt by w4ffl35 in 4271
* move tests to nightly by DN6 in 4451
* auto type conversion by isNeil in 4270
* Fix TypeError in pipeline handling for MultiControlNets which only contain a single ControlNet by Georgehe4 in 4454
* Add rank argument to train_dreambooth_lora_sdxl.py by levi in 4343
* [docs] Distilled SD by stevhliu in 4442
* Allow controlnets to be loaded (from ckpt) in a parallel thread with a SD model (ckpt), and speed it up slightly by cmdr2 in 4298
* fix typo to ensure `make test-examples` work correctly by statelesshz in 4329
* Fix bug caused by typo by HeliosZhao in 4357
* Delete the duplicate code for the controlnet img2img by VV-A-VV in 4411
* Support different strength for Stable Diffusion TensorRT Inpainting pipeline by jinwonkim93 in 4216
* add sdxl to prompt weighting by patrickvonplaten in 4439
* a few fix for kandinsky combined pipeline by yiyixuxu in 4352
* fix-format by yiyixuxu in 4458
* Cleanup Pass on flaky slow tests for Stable Diffusion by DN6 in 4455
* Fixed multi-token textual inversion training by manosplitsis in 4452
* TensorRT Inpaint pipeline: minor fixes by asfiyab-nvidia in 4457
* [Tests] Adds integration tests for SDXL LoRAs by sayakpaul in 4462
* Update README_sdxl.md by patrickvonplaten in 4472
* [SDXL] Allow SDXL LoRA to be run with less than 16GB of VRAM by patrickvonplaten in 4470
* Add a data_dir parameter to the load_dataset method. by AisingioroHao0 in 4482
* [Examples] Support train_text_to_image_lora_sdxl.py by okotaku in 4365
* Log global_step instead of epoch to tensorboard by mrlzla in 4493
* Update lora.md to clarify SDXL support by sayakpaul in 4503
* [SDXL LoRA] fix batch size lora by patrickvonplaten in 4509
* Make sure fp16-fix is used as default by patrickvonplaten in 4510
* grad checkpointing by ethansmith2000 in 4474
* move pipeline only when running validation by patrickvonplaten in 4515
* Moving certain pipelines slow tests to nightly by DN6 in 4469
* add pipeline_class_name argument to Stable Diffusion conversion script by yiyixuxu in 4461
* Fix misc typos by Georgehe4 in 4479
* fix indexing issue in sd reference pipeline by DN6 in 4531
* Copy lora functions to XLPipelines by wooyeolBaek in 4512
* introduce minimalistic reimplementation of SDXL on the SDXL doc by cloneofsimo in 4532
* Fix push_to_hub in train_text_to_image_lora_sdxl.py example by ra100 in 4535
* Update README_sdxl.md to include the free-tier Colab Notebook by sayakpaul in 4540
* Changed code that converts tensors to PIL images in the write_your_own_pipeline notebook by jere357 in 4489
* Move slow tests to nightly by DN6 in 4526
* pin ruff version for quality checks by DN6 in 4539
* [docs] Clean scheduler api by stevhliu in 4204
* Move controlnet load local tests to nightly by DN6 in 4543
* Revert "introduce minimalistic reimplementation of SDXL on the SDXL doc" by patrickvonplaten in 4548
* fix some typo error by VV-A-VV in 4546
* improve controlnet sdxl docs now that we have a good checkpoint. by sayakpaul in 4556
* [Doc] update sdxl-controlnet repo name by yiyixuxu in 4564
* [docs] Expand prompt weighting by stevhliu in 4516
* [docs] Remove attention slicing by stevhliu in 4518
* [docs] Add safetensors flag by stevhliu in 4245
* Convert Stable Diffusion ControlNet to TensorRT by dotieuthien in 4465
* Remove code snippets containing `is_safetensors_available()` by chiral-carbon in 4521
* Fixing repo_id regex validation error on windows platforms by Mystfit in 4358
* [Examples] fix: network_alpha -> network_alphas by sayakpaul in 4572
* [docs] Fix ControlNet SDXL docstring by stevhliu in 4582
* [Utility] adds an image grid utility by sayakpaul in 4576
* Fixed invalid pipeline_class_name parameter. by AisingioroHao0 in 4590
* Fix git-lfs command typo in docs by clairefro in 4586
* [Examples] Update InstructPix2Pix README_sdxl.md to fix mentions by sayakpaul in 4574
* [Pipeline utils] feat: implement push_to_hub for standalone models, schedulers as well as pipelines by sayakpaul in 4128
* An invalid clerical error in sdxl finetune by XDUWQ in 4608
* [Docs] fix links in the controlling generation doc. by sayakpaul in 4612
* add: pushtohubmixin to pipelines and schedulers docs overview. by sayakpaul in 4607
* add: train to text image with sdxl script. by sayakpaul in 4505
* Add GLIGEN implementation by nikhil-masterful in 4441
* Update text2image.md to fix the links by sayakpaul in 4626
* Fix unipc use_karras_sigmas exception - fixes huggingface/diffusers#4580 by reimager in 4581
* [research_projects] SDXL controlnet script by patil-suraj in 4633
* [Core] feat: MultiControlNet support for SDXL ControlNet pipeline by sayakpaul in 4597
* [docs] PushToHubMixin by stevhliu in 4622
* [docs] MultiControlNet by stevhliu in 4635
* fix loading custom text encoder when using `from_single_file` by DN6 in 4571
* make things clear in the controlnet sdxl doc. by sayakpaul in 4644
* Fix `UnboundLocalError` during LoRA loading by slessans in 4523
* Support higher dimension LoRAs by isidentical in 4625
* [Safetensors] Make safetensors the default way of saving weights by patrickvonplaten in 4235

Significant community contributions

The following contributors have made significant changes to the library over the last release:

* kxhit
* [Pipelines] Add community pipeline for Zero123 (4295)
* okotaku
* [Examples] Support train_text_to_image_lora_sdxl.py (4365)
* dotieuthien
* Convert Stable Diffusion ControlNet to TensorRT (4465)
* nikhil-masterful
* Add GLIGEN implementation (4441)


0.32.2

Fixes for Flux Single File loading, LoRA loading for 4bit BnB Flux, Hunyuan Video

This patch release:

- Fixes a regression in loading ComfyUI-format single-file checkpoints for Flux
- Fixes a regression in loading LoRAs with bitsandbytes 4-bit quantized Flux models
- Adds `unload_lora_weights` for Flux Control
- Fixes a bug that prevented Hunyuan Video from running with batch size > 1
- Allows Hunyuan Video to load LoRAs created with the original repository code
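A minimal sketch of the Flux Control LoRA load/unload flow (repo ids are illustrative; FLUX.1-dev is a gated model):

```python
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained("black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16)
pipe.load_lora_weights("black-forest-labs/FLUX.1-Canny-dev-lora")  # a Flux Control LoRA

# restores the base transformer, including undoing any input-channel expansion
pipe.unload_lora_weights()
```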

All commits
* [Single File] Fix loading Flux Dev finetunes with Comfy Prefix by DN6 in 10545
* [CI] Update HF Token on Fast GPU Model Tests by DN6 in 10570
* [CI] Update HF Token in Fast GPU Tests by DN6 in 10568
* Fix batch > 1 in HunyuanVideo by hlky in 10548
* Fix HunyuanVideo produces NaN on PyTorch<2.5 by hlky in 10482
* Fix hunyuan video attention mask dim by a-r-r-o-w in 10454
* [LoRA] Support original format loras for HunyuanVideo by a-r-r-o-w in 10376
* [LoRA] feat: support loading loras into 4bit quantized Flux models. by sayakpaul in 10578
* [LoRA] clean up `load_lora_into_text_encoder()` and `fuse_lora()` copied from by sayakpaul in 10495
* [LoRA] feat: support `unload_lora_weights()` for Flux Control. by sayakpaul in 10206
* Fix Flux multiple Lora loading bug by maxs-kan in 10388
* [LoRA] fix: lora unloading when using expanded Flux LoRAs. by sayakpaul in 10397
