Diffusers

Text to 3D

python
guidance_scale = 15.0

prompt = "A birthday cupcake"
images = pipe(
prompt,
guidance_scale=guidance_scale,
num_inference_steps=64,
frame_size=256,
).images

gif_path = export_to_gif(images[0], "cake_3d.gif")


![cake_3d](https://github.com/huggingface/diffusers/assets/22957388/1b84c7c9-8bb9-4c87-ad8e-0c3a886eb860)

Image to 3D

py
import torch
from diffusers import ShapEImg2ImgPipeline
from diffusers.utils import export_to_gif, load_image

ckpt_id = "openai/shap-e-img2img"
pipe = ShapEImg2ImgPipeline.from_pretrained(ckpt_id).to("cuda")

img_url = "https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/shap_e/burger_in.png"
image = load_image(img_url)

generator = torch.Generator(device="cuda").manual_seed(0)
batch_size = 4

guidance_scale = 3.0

images = pipe(
image,
num_images_per_prompt=batch_size,
generator=generator,
guidance_scale=guidance_scale,
num_inference_steps=64,
frame_size=256,
output_type="pil"
).images

gif_path = export_to_gif(images[0], "burger_sampled_3d.gif")


Original image
![image](https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/shap_e/burger_in.png)

Generated
![burger_sampled_3d](https://github.com/huggingface/diffusers/assets/22957388/2dc0b9d5-fbb2-41cb-ab29-afde13f8f8ea)

For more details, check out the [official documentation](https://huggingface.co/docs/diffusers/main/en/api/pipelines/shap_e).

The model was contributed by yiyixuxu in https://github.com/huggingface/diffusers/pull/3742.

Consistency models

Consistency models are diffusion models that support fast one- or few-step image generation. They were proposed by OpenAI in [Consistency Models](https://arxiv.org/abs/2303.01469).

python
import torch

from diffusers import ConsistencyModelPipeline

device = "cuda"
# Load the cd_imagenet64_l2 checkpoint.
model_id_or_path = "openai/diffusers-cd_imagenet64_l2"
pipe = ConsistencyModelPipeline.from_pretrained(model_id_or_path, torch_dtype=torch.float16)
pipe.to(device)

# Onestep sampling
image = pipe(num_inference_steps=1).images[0]
image.save("consistency_model_onestep_sample.png")

# Onestep sampling, class-conditional image generation
# ImageNet-64 class label 145 corresponds to king penguins
image = pipe(num_inference_steps=1, class_labels=145).images[0]
image.save("consistency_model_onestep_sample_penguin.png")

# Multistep sampling, class-conditional image generation
# Timesteps can be explicitly specified; the particular timesteps below are from the original GitHub repo:
# https://github.com/openai/consistency_models/blob/main/scripts/launch.sh#L77
image = pipe(timesteps=[22, 0], class_labels=145).images[0]
image.save("consistency_model_multistep_sample_penguin.png")


For more details, see the [official docs](https://huggingface.co/docs/diffusers/main/en/api/pipelines/consistency_models).

The model was contributed by our community members dg845 and ayushtues in https://github.com/huggingface/diffusers/pull/3492.

Video-to-Video

Previous video generation pipelines tended to produce watermarks because those watermarks were present in their pretraining dataset. With the latest additions of the following checkpoints, we can now generate watermark-free videos:

* [cerspense/zeroscope_v2_576w](https://huggingface.co/cerspense/zeroscope_v2_576w)
* [cerspense/zeroscope_v2_XL](https://huggingface.co/cerspense/zeroscope_v2_XL)

python
import torch
from diffusers import DiffusionPipeline
from diffusers.utils import export_to_video

pipe = DiffusionPipeline.from_pretrained("cerspense/zeroscope_v2_576w", torch_dtype=torch.float16)
pipe.enable_model_cpu_offload()

# memory optimization
pipe.unet.enable_forward_chunking(chunk_size=1, dim=1)
pipe.enable_vae_slicing()

prompt = "Darth Vader surfing a wave"
video_frames = pipe(prompt, num_frames=24).frames
video_path = export_to_video(video_frames)


![darth_vader_waves](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/darthvader_cerpense.gif)

For more details, check out the [official docs](https://huggingface.co/docs/diffusers/main/en/api/pipelines/text_to_video).

It was contributed by patrickvonplaten in https://github.com/huggingface/diffusers/pull/3900.

All commits

* remove seed by yiyixuxu in 3734
* Correct Token to upload docs by patrickvonplaten in 3744
* Correct another push token by patrickvonplaten in 3745
* [Stable Diffusion Inpaint & ControlNet inpaint] Correct timestep inpaint by patrickvonplaten in 3749
* [Documentation] Replace dead link to Flax install guide by JeLuF in 3739
* [documentation] grammatical fixes in installation.mdx by LiamSwayne in 3735
* Text2video zero refinements by 19and99 in 3733
* [Tests] Relax tolerance of flaky failing test by patrickvonplaten in 3755
* [MultiControlNet] Allow save and load by patrickvonplaten in 3747
* Update pipeline_flax_stable_diffusion_controlnet.py by jfozard in 3306
* update conversion script for Kandinsky unet by yiyixuxu in 3766
* [docs] Fix Colab notebook cells by stevhliu in 3777
* [Bug Report template] modify the issue template to include core maintainers. by sayakpaul in 3785
* [Enhance] Update reference by okotaku in 3723
* Fix broken cpu-offloading in legacy inpainting SD pipeline by cmdr2 in 3773
* Fix some bad comment in training scripts by patrickvonplaten in 3798
* Added LoRA loading to `StableDiffusionKDiffusionPipeline` by tripathiarpan20 in 3751
* UnCLIP Image Interpolation -> Keep same initial noise across interpolation steps by Abhinay1997 in 3782
* feat: add PR template. by sayakpaul in 3786
* Ldm3d first PR by estelleafl in 3668
* Complete set_attn_processor for prior and vae by patrickvonplaten in 3796
* fix typo by Isotr0py in 3800
* manual check for checkpoints_total_limit instead of using accelerate by williamberman in 3681
* [train text to image] add note to loading from checkpoint by williamberman in 3806
* device map legacy attention block weight conversion by williamberman in 3804
* [docs] Zero SNR by stevhliu in 3776
* [ldm3d] Fixed small typo by estelleafl in 3820
* [Examples] Improve the model card pushed from the `train_text_to_image.py` script by sayakpaul in 3810
* [Docs] add missing pipelines from the overview pages and minor fixes by sayakpaul in 3795
* [Pipeline] Add new pipeline for ParaDiGMS -- parallel sampling of diffusion models by AndyShih12 in 3716
* Update control_brightness.mdx by dqueue in 3825
* Support ControlNet models with different number of channels in control images by JCBrouwer in 3815
* Add ddpm kandinsky by yiyixuxu in 3783
* [docs] More API stuff by stevhliu in 3835
* relax tol attention conversion test by williamberman in 3842
* fix: random module seeding by sayakpaul in 3846
* fix audio_diffusion tests by teticio in 3850
* Correct bad attn naming by patrickvonplaten in 3797
* [Conversion] Small fixes by patrickvonplaten in 3848
* Fix some audio tests by patrickvonplaten in 3841
* [Docs] add: contributor note in the paradigms docs. by sayakpaul in 3852
* Update Habana Gaudi doc by regisss in 3863
* Add guidance start/stop by holwech in 3770
* feat: rename single-letter vars in `resnet.py` by SauravMaheshkar in 3868
* Fixing the global_step key not found by VincentNeemie in 3844
* Support for manual CLIP loading in StableDiffusionPipeline - txt2img. by WadRex in 3832
* fix sde add noise typo by UranusITS in 3839
* [Tests] add test for checking soft dependencies. by sayakpaul in 3847
* [Enhance] Add LoRA rank args in train_text_to_image_lora by okotaku in 3866
* [docs] Model API by stevhliu in 3562
* fix/docs: Fix the broken doc links by Aisuko in 3897
* Add video img2img by patrickvonplaten in 3900
* fix/doc-code: Updating to the latest version parameters by Aisuko in 3924
* fix/doc: no import torch issue by Aisuko in 3923
* Correct controlnet out of list error by patrickvonplaten in 3928
* Adding better way to define multiple concepts and also validation capabilities. by mauricio-repetto in 3807
* [ldm3d] Update code to be functional with the new checkpoints by estelleafl in 3875
* Improve memory text to video by patrickvonplaten in 3930
* revert automatic chunking by patrickvonplaten in 3934
* avoid upcasting by assigning dtype to noise tensor by prathikr in 3713
* Fix failing np tests by patrickvonplaten in 3942
* Add `timestep_spacing` and `steps_offset` to schedulers by pcuenca in 3947
* Add Consistency Models Pipeline by dg845 in 3492
* Update consistency_models.mdx by sayakpaul in 3961
* Make `UNet2DConditionOutput` pickle-able by prathikr in 3857
* [Consistency Models] correct checkpoint url in the doc by sayakpaul in 3962
* [Text-to-video] Add `torch.compile()` compatibility by sayakpaul in 3949
* [SD-XL] Add new pipelines by patrickvonplaten in 3859
* Kandinsky 2.2 by cene555 in 3903
* Add Shap-E by yiyixuxu in 3742
* disable num attenion heads by patrickvonplaten in 3969
* Improve SD XL by patrickvonplaten in 3968
* fix/doc-code: import torch and fix the broken document address by Aisuko in 3941

Significant community contributions

The following contributors have made significant changes to the library over the last release:

* estelleafl
* Ldm3d first PR (3668)
* [ldm3d] Fixed small typo (3820)
* [ldm3d] Update code to be functional with the new checkpoints (3875)
* AndyShih12
* [Pipeline] Add new pipeline for ParaDiGMS -- parallel sampling of diffusion models (3716)
* dg845
* Add Consistency Models Pipeline (3492)

Playground v2.5

PlaygroundAI released a new v2.5 model (`playgroundai/playground-v2.5-1024px-aesthetic`), which particularly excels at aesthetics. The model closely follows the architecture of Stable Diffusion XL, except for a few tweaks. This release comes with support for this model:

python
from diffusers import DiffusionPipeline
import torch

pipe = DiffusionPipeline.from_pretrained(
"playgroundai/playground-v2.5-1024px-aesthetic",
torch_dtype=torch.float16,
variant="fp16",
).to("cuda")

prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
image = pipe(prompt=prompt, num_inference_steps=50, guidance_scale=3).images[0]
image


Loading from the original single-file checkpoint is also supported:

python
from diffusers import StableDiffusionXLPipeline, EDMDPMSolverMultistepScheduler
import torch

url = "https://huggingface.co/playgroundai/playground-v2.5-1024px-aesthetic/blob/main/playground-v2.5-1024px-aesthetic.safetensors"
pipeline = StableDiffusionXLPipeline.from_single_file(url)
pipeline.to(device="cuda", dtype=torch.float16)

prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
image = pipeline(prompt=prompt, guidance_scale=3.0).images[0]
image.save("playground_test_image.png")


You can also perform LoRA DreamBooth training with the `playgroundai/playground-v2.5-1024px-aesthetic` checkpoint:

bash
accelerate launch train_dreambooth_lora_sdxl.py \
--pretrained_model_name_or_path="playgroundai/playground-v2.5-1024px-aesthetic" \
--instance_data_dir="dog" \
--output_dir="dog-playground-lora" \
--mixed_precision="fp16" \
--instance_prompt="a photo of sks dog" \
--resolution=1024 \
--train_batch_size=1 \
--gradient_accumulation_steps=4 \
--learning_rate=1e-4 \
--use_8bit_adam \
--report_to="wandb" \
--lr_scheduler="constant" \
--lr_warmup_steps=0 \
--max_train_steps=500 \
--validation_prompt="A photo of sks dog in a bucket" \
--validation_epochs=25 \
--seed="0" \
--push_to_hub


To learn more, follow the instructions [here](https://github.com/huggingface/diffusers/blob/main/examples/dreambooth/README_sdxl.md).

EDM-style training support

EDM refers to the training and sampling techniques introduced in the following paper: [Elucidating the Design Space of Diffusion-Based Generative Models](https://arxiv.org/abs/2206.00364). We have introduced support for training using the EDM formulation in our `train_dreambooth_lora_sdxl.py` script.

To train `stabilityai/stable-diffusion-xl-base-1.0` using the EDM formulation, you just have to specify the `--do_edm_style_training` flag in your training command, and voila 🤗
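
As a rough sketch, mirroring the Playground launch command above (the dataset directory, output path, and hyperparameters are placeholders), the flag is simply added to the usual DreamBooth LoRA SDXL command:

bash
accelerate launch train_dreambooth_lora_sdxl.py \
  --pretrained_model_name_or_path="stabilityai/stable-diffusion-xl-base-1.0" \
  --instance_data_dir="dog" \
  --output_dir="dog-edm-lora" \
  --do_edm_style_training \
  --mixed_precision="fp16" \
  --instance_prompt="a photo of sks dog" \
  --resolution=1024 \
  --train_batch_size=1 \
  --max_train_steps=500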

If you’re interested in extending this formulation to other training scripts, we refer you to [this PR](https://github.com/huggingface/diffusers/pull/7126).

New schedulers with the EDM formulation

To better support the Playground v2.5 model and EDM-style training in general, we are bringing support for `EDMDPMSolverMultistepScheduler` and `EDMEulerScheduler`. These support the EDM formulations of the [`DPMSolverMultistepScheduler`](https://huggingface.co/docs/diffusers/main/en/api/schedulers/multistep_dpm_solver) and [`EulerDiscreteScheduler`](https://huggingface.co/docs/diffusers/main/en/api/schedulers/euler), respectively.
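
As a minimal sketch (assuming the usual `from_config` swap works for these schedulers just as it does for the existing ones), the Playground v2.5 pipeline from above could be paired with the EDM-formulation DPM-Solver++ scheduler like this:

python
import torch
from diffusers import DiffusionPipeline, EDMDPMSolverMultistepScheduler

pipe = DiffusionPipeline.from_pretrained(
    "playgroundai/playground-v2.5-1024px-aesthetic",
    torch_dtype=torch.float16,
    variant="fp16",
).to("cuda")

# Swap in the EDM-formulation DPM-Solver++ scheduler, reusing the existing scheduler config.
pipe.scheduler = EDMDPMSolverMultistepScheduler.from_config(pipe.scheduler.config)

image = pipe("Astronaut in a jungle, cold color palette, muted colors, detailed, 8k", guidance_scale=3.0).images[0]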

Trajectory Consistency Distillation

Trajectory Consistency Distillation (TCD) enables a model to generate higher-quality, more detailed images with fewer steps. Moreover, owing to effective error mitigation during the distillation process, TCD also performs well with a large number of inference steps. It was proposed in [Trajectory Consistency Distillation](https://arxiv.org/abs/2402.19159).

This release comes with the support of a `TCDScheduler` that enables this kind of fast sampling. Much like LCM-LoRA, TCD requires an additional adapter for the acceleration. The code snippet below shows example usage:

python
import torch
from diffusers import StableDiffusionXLPipeline, TCDScheduler

device = "cuda"
base_model_id = "stabilityai/stable-diffusion-xl-base-1.0"
tcd_lora_id = "h1t/TCD-SDXL-LoRA"

pipe = StableDiffusionXLPipeline.from_pretrained(base_model_id, torch_dtype=torch.float16, variant="fp16").to(device)
pipe.scheduler = TCDScheduler.from_config(pipe.scheduler.config)

pipe.load_lora_weights(tcd_lora_id)
pipe.fuse_lora()

prompt = "Painting of the orange cat Otto von Garfield, Count of Bismarck-Schönhausen, Duke of Lauenburg, Minister-President of Prussia. Depicted wearing a Prussian Pickelhaube and eating his favorite meal - lasagna."

image = pipe(
prompt=prompt,
num_inference_steps=4,
guidance_scale=0,
eta=0.3,
generator=torch.Generator(device=device).manual_seed(0),
).images[0]


![tcd_image](https://github.com/jabir-zheng/TCD/raw/main/assets/demo_image.png)

📜 Check out the docs [here](https://huggingface.co/docs/diffusers/main/en/using-diffusers/inference_with_tcd_lora) to know more about TCD.

Many thanks to mhh0318 for contributing the `TCDScheduler` in 7174 and the guide in 7259.

IP-Adapter image embeddings and masking

All the pipelines supporting IP-Adapter accept an `ip_adapter_image_embeds` argument. If you need to run the IP-Adapter multiple times with the same image, you can encode the image once and save the embeddings to disk. This saves computation time and is especially useful when building UIs. Additionally, ComfyUI image embeddings for IP-Adapters are fully compatible with Diffusers and should work out of the box.
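
As a hedged sketch of this workflow (the SDXL pipeline, IP-Adapter checkpoint, image URL, and file name below are illustrative; `prepare_ip_adapter_image_embeds` is the helper that computes the embeddings):

python
import torch
from diffusers import AutoPipelineForText2Image
from diffusers.utils import load_image

pipeline = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")
pipeline.load_ip_adapter("h94/IP-Adapter", subfolder="sdxl_models", weight_name="ip-adapter_sdxl.bin")

ip_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/ip_adapter_diner.png")

# Encode the IP-Adapter image once and cache the embeddings on disk.
image_embeds = pipeline.prepare_ip_adapter_image_embeds(
    ip_adapter_image=ip_image,
    ip_adapter_image_embeds=None,
    device="cuda",
    num_images_per_prompt=1,
    do_classifier_free_guidance=True,
)
torch.save(image_embeds, "image_embeds.ipadpt")

# Later (or in another process): reload the embeddings and skip the image encoder entirely.
image_embeds = torch.load("image_embeds.ipadpt")
image = pipeline(
    prompt="a polar bear sitting in a chair drinking a milkshake",
    ip_adapter_image_embeds=image_embeds,
    num_inference_steps=50,
).images[0]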

We have also introduced support for providing binary masks to specify which portion of the output image should be assigned to an IP-Adapter. For each input IP-Adapter image, a binary mask and an IP-Adapter must be provided. Thanks to fabiorigano for contributing this feature through 6847.

📜 To know about the exact usage of both of the above, refer to our [official guide](https://huggingface.co/docs/diffusers/main/en/using-diffusers/ip_adapter).

We thank our community members, fabiorigano, asomoza, and cubiq, for their guidance and input on these features.

Guide on merging LoRAs

Merging LoRAs can be a fun and creative way to create new and unique images. Diffusers provides merging support with the `set_adapters` method which concatenates the weights of the LoRAs to merge.

Now, Diffusers also supports the `add_weighted_adapter` method from the PEFT library, unlocking more efficient merging methods like TIES, DARE, linear, and even combinations of these, such as `dare_ties`.
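
As a hedged sketch of the `set_adapters` path (the two LoRA checkpoints and their adapter names below are illustrative, in the style of the guide linked below):

python
import torch
from diffusers import AutoPipelineForText2Image

pipe = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

# Load two LoRAs under distinct adapter names.
pipe.load_lora_weights("ostris/ikea-instructions-lora-sdxl", weight_name="ikea_instructions_xl_v1_5.safetensors", adapter_name="ikea")
pipe.load_lora_weights("lordjia/by-feng-zikai", weight_name="fengzikai_v1.0_XL.safetensors", adapter_name="feng")

# Merge by concatenation: each adapter's contribution is scaled by its weight.
pipe.set_adapters(["ikea", "feng"], adapter_weights=[0.7, 0.8])

image = pipe(
    "A bowl of ramen shaped like a cute kawaii bear, by Feng Zikai",
    generator=torch.manual_seed(0),
).images[0]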

📜 Take a look at the [Merge LoRAs](https://huggingface.co/docs/diffusers/main/en/using-diffusers/merge_loras) guide to learn more about merging in Diffusers.

LEDITS++

We are adding support for the real-image editing technique LEDITS++: **[Limitless Image Editing using Text-to-Image Models](https://huggingface.co/papers/2311.16711)**, a parameter-free method that requires no fine-tuning or optimization.
To edit real images, the LEDITS++ pipelines first invert the image with the DPM-Solver++ scheduler, which enables editing with as few as **20 total diffusion steps for inversion and inference combined**. LEDITS++ guidance is defined such that it reflects both the direction of the edit (whether to push away from or towards the edit concept) and the strength of the effect. The guidance also includes a masking term focused on relevant image regions which, especially for multiple edits, ensures that the corresponding guidance terms for each concept remain mostly isolated, limiting interference.

The code snippet below shows example usage:

python
import torch
import PIL
import requests
from io import BytesIO
from diffusers import LEditsPPPipelineStableDiffusionXL, AutoencoderKL

device = "cuda"
base_model_id = "stabilityai/stable-diffusion-xl-base-1.0"

vae = AutoencoderKL.from_pretrained("madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.float16)

pipe = LEditsPPPipelineStableDiffusionXL.from_pretrained(
base_model_id,
vae=vae,
torch_dtype=torch.float16
).to(device)

def download_image(url):
    response = requests.get(url)
    return PIL.Image.open(BytesIO(response.content)).convert("RGB")

img_url = "https://www.aiml.informatik.tu-darmstadt.de/people/mbrack/tennis.jpg"
image = download_image(img_url)

_ = pipe.invert(
    image=image,
    num_inversion_steps=50,
    skip=0.2
)

edited_image = pipe(
    editing_prompt=["tennis ball", "tomato"],
    reverse_editing_direction=[True, False],
    edit_guidance_scale=[5.0, 10.0],
    edit_threshold=[0.9, 0.85],
).images[0]


<table>
<tr>
<td><img src="https://github.com/huggingface/diffusers/assets/22957388/9f914400-a4e4-4fc5-a27a-150b3212991f" alt="Tennis ball"></td>
<td><img src="https://github.com/huggingface/diffusers/assets/22957388/be4cb116-17b8-4293-9216-60ab6f3a819d" alt="Tomato ball"></td>
</tr>
</table>

📜 Check out the docs [here](https://huggingface.co/docs/diffusers/main/en/api/pipelines/ledits_pp) to learn more about LEDITS++.

Thanks to manuelbrack for contributing this in 6074.

All commits

* Fix flaky IP Adapter test by DN6 in 6960
* Move SDXL T2I Adapter lora test into PEFT workflow by DN6 in 6965
* Allow passing `config_file` argument to ControlNetModel when using `from_single_file` by DN6 in 6959
* [`PEFT` / `docs`] Add a note about torch.compile by younesbelkada in 6864
* [Core] Harmonize single file ckpt model loading by sayakpaul in 6971
* fix: controlnet inpaint single file. by sayakpaul in 6975
* [docs] IP-Adapter by stevhliu in 6897
* fix IPAdapter unload_ip_adapter test by yiyixuxu in 6972
* [advanced sdxl lora script] - fix 6967 bug when using prior preservation loss by linoytsaban in 6968
* [IP Adapters] feat: allow low_cpu_mem_usage in ip adapter loading by sayakpaul in 6946
* Fix diffusers import prompt2prompt by ihkap11 in 6927
* add: peft to the benchmark workflow by sayakpaul in 6989
* Fix procecss process by co63oc in 6591
* Standardize model card for textual inversion sdxl by Stepheni12 in 6963
* Update textual_inversion.py by Bhavay-2001 in 6952
* [docs] Fix callout by stevhliu in 6998
* [docs] Video generation by stevhliu in 6701
* start depcrecation cycle for lora_attention_proc 👋 by sayakpaul in 7007
* Add documentation for `strength` parameter in `Controlnet_img2img` pipelines by tlpss in 6951
* Fixed typos in dosctrings of __init__() and in forward() of Unet3DConditionModel by MK-2012 in 6663
* [SVD] fix a bug when passing image as tensor by yiyixuxu in 6999
* Fix deprecation warning for torch.utils._pytree._register_pytree_node in PyTorch 2.2 by zyinghua in 7008
* [IP2P] Make text encoder truly optional in InstructPi2Pix by sayakpaul in 6995
* IP-Adapter attention masking by fabiorigano in 6847
* Fix Pixart Slow Tests by DN6 in 6962
* [from_single_file] pass `torch_dtype` to `set_module_tensor_to_device` by yiyixuxu in 6994
* [Refactor] FreeInit for AnimateDiff based pipelines by DN6 in 6874
* [Community Pipelines]Accelerate inference of stable diffusion xl (SDXL) by IPEX on CPU by ustcuna in 6683
* Add section on AnimateLCM to docs by DN6 in 7024
* IP-Adapter support for StableDiffusionXLControlNetInpaintPipeline by rootonchair in 6941
* Supper IP Adapter weight loading in StableDiffusionXLControlNetInpaintPipeline by tontan2545 in 7031
* Fix alt text and image links in AnimateLCM docs by DN6 in 7029
* Update ControlNet Inpaint single file test by DN6 in 7022
* Fix `load_model_dict_into_meta` for ControlNet `from_single_file` by DN6 in 7034
* Remove `disable_full_determinism` from StableVideoDiffusion xformers test. by DN6 in 7039
* update header by pravdomil in 6596
* fix doc example for fom_single_file by yiyixuxu in 7015
* Fix typos in text_to_image examples by standardAI in 7050
* Update checkpoint_merger pipeline to pass the "variant" argument by lstein in 6670
* allow explicit tokenizer & text_encoder in unload_textual_inversion by H3zi in 6977
* re-add unet refactor PR by yiyixuxu in 7044
* IPAdapterTesterMixin by a-r-r-o-w in 6862
* [`Refactor`] `save_model_card` function in `text_to_image` examples by standardAI in 7051
* Fix typos by standardAI in 7068
* Fix docstring of community pipeline imagic by chongdashu in 7062
* Change images to image. The variable images is not used anywhere by bimsarapathiraja in 7074
* fix: TensorRTStableDiffusionPipeline cannot set guidance_scale by caiyueliang in 7065
* [`Refactor`] `StableDiffusionReferencePipeline` inheriting from `DiffusionPipeline` by standardAI in 7071
* Fix truthy-ness condition in pipelines that use denoising_start by a-r-r-o-w in 6912
* Fix head_to_batch_dim for IPAdapterAttnProcessor by fabiorigano in 7077
* [docs] Minor updates by stevhliu in 7063
* Modularize Dreambooth LoRA SD inferencing during and after training by rootonchair in 6654
* Modularize Dreambooth LoRA SDXL inferencing during and after training by rootonchair in 6655
* [Community] Bug fix + Latest IP-Adapter impl. for AnimateDiff img2vid/controlnet by a-r-r-o-w in 7086
* Pass use_linear_projection parameter to mid block in UNetMotionModel by Stepheni12 in 7035
* Resize image before crop by jiqing-feng in 7095
* Small change to download in dance diffusion convert script by DN6 in 7070
* Fix EMA in train_text_to_image_sdxl.py by standardAI in 7048
* Make LoRACompatibleConv padding_mode work. by jinghuan-Chen in 6031
* [Easy] edit issue and PR templates by sayakpaul in 7092
* FIX [`PEFT` / `Core`] Copy the state dict when passing it to `load_lora_weights` by younesbelkada in 7058
* [Core] pass revision in the loading_kwargs. by sayakpaul in 7019
* [Examples] Multiple enhancements to the ControlNet training scripts by sayakpaul in 7096
* move to `uv` in the Dockerfiles. by sayakpaul in 7094
* Add tests to check configs when using single file loading by DN6 in 7099
* denormalize latents with the mean and std if available by patil-suraj in 7111
* [Dockerfile] remove uv from docker jax tpu by sayakpaul in 7115
* Add EDMEulerScheduler by patil-suraj in 7109
* add DPM scheduler with EDM formulation by patil-suraj in 7120
* [`Docs`] Fix typos by standardAI in 7118
* DPMSolverMultistep add `rescale_betas_zero_snr` by Beinsezii in 7097
* [Tests] make test steps dependent on certain things and general cleanup of the workflows by sayakpaul in 7026
* fix kwarg in the SDXL LoRA DreamBooth by sayakpaul in 7124
* [Diffusers CI] Switch slow test runners by DN6 in 7123
* [stalebot] don't close the issue if the stale label is removed by yiyixuxu in 7106
* refactor: move model helper function in pipeline to a mixin class by ultranity in 6571
* [docs] unet type hints by a-r-r-o-w in 7134
* use uv for installing stuff in the workflows. by sayakpaul in 7116
* limit documentation workflow runs for relevant changes. by sayakpaul in 7125
* add: support for notifying the maintainers about the docker ci status. by sayakpaul in 7113
* Fix setting fp16 dtype in AnimateDiff convert script. by DN6 in 7127
* [`Docs`] Fix typos by standardAI in 7131
* [ip-adapter] refactor `prepare_ip_adapter_image_embeds` and skip load `image_encoder` by yiyixuxu in 7016
* [CI] fix path filtering in the documentation workflows by sayakpaul in 7153
* [Urgent][Docker CI] pin `uv` version for now and a minor change in the Slack notification by sayakpaul in 7155
* Fix LCM benchmark test by sayakpaul in 7158
* [CI] Remove max parallel flag on slow test runners by DN6 in 7162
* Fix vae_encodings_fn hash in train_text_to_image_sdxl.py by lhoestq in 7171
* fix: loading problem for sdxl lora dreambooth by sayakpaul in 7166
* Map speedup by kopyl in 6745
* [stalebot] fix a bug by yiyixuxu in 7156
* Support EDM-style training in DreamBooth LoRA SDXL script by sayakpaul in 7126
* Fix PixArt 256px inference by lawrence-cj in 6789
* [ip-adapter] fix problem using embeds with the plus version of ip adapters by asomoza in 7189
* feat: add ip adapter benchmark by sayakpaul in 6936
* [Docs] more elaborate example for peft `torch.compile` by sayakpaul in 7161
* adding `callback_on_step_end` for `StableDiffusionLDM3DPipeline` by rootonchair in 7149
* Update requirements.txt to remove huggingface-cli by sayakpaul in 7202
* [advanced dreambooth lora sdxl] add DoRA training feature by linoytsaban in 7072
* FIx torch and cuda version in ONNX tests by DN6 in 7164
* [training scripts] add tags of diffusers-training by linoytsaban in 7206
* fix a bug in `from_config` by yiyixuxu in 7192
* Fix: UNet2DModel::__init__ type hints; fixes issue 4806 by fpgaminer in 7175
* Fix typos by standardAI in 7181
* Enable PyTorch's FakeTensorMode for EulerDiscreteScheduler scheduler by thiagocrepaldi in 7151
* [docs] Improve SVD pipeline docs by a-r-r-o-w in 7087
* [Docs] Update callback.md code example by rootonchair in 7150
* [Core] errors should be caught as soon as possible. by sayakpaul in 7203
* [Community] PromptDiffusion Pipeline by iczaw in 6752
* add TCD Scheduler by mhh0318 in 7174
* SDXL Turbo support and example launch by bram-w in 6473
* [bug] Fix float/int guidance scale not working in `StableVideoDiffusionPipeline` by JinayJain in 7143
* [Pipiline] Wuerstchen v3 aka Stable Cascasde pipeline by kashif in 6487
* Update train_dreambooth_lora_sdxl_advanced.py by landmann in 7227
* [Core] move out the utilities from pipeline_utils.py by sayakpaul in 7234
* Refactor Prompt2Prompt: Inherit from DiffusionPipeline by ihkap11 in 7211
* add DoRA training feature to sdxl dreambooth lora script by linoytsaban in 7235
* fix: remove duplicated code in TemporalBasicTransformerBlock. by AsakusaRinne in 7212
* [Examples] fix: prior preservation setting in DreamBooth LoRA SDXL script. by sayakpaul in 7242
* fix: support for loading playground v2.5 single file checkpoint. by sayakpaul in 7230
* Raise an error when trying to use SD Cascade Decoder with dtype bfloat16 and torch < 2.2 by DN6 in 7244
* Remove the line. Using it create wrong output by bimsarapathiraja in 7075
* [docs] Merge LoRAs by stevhliu in 7213
* use self.device by pravdomil in 6595
* [docs] Community tips by stevhliu in 7137
* [Core] throw error when patch inputs and layernorm are provided for Transformers2D by sayakpaul in 7200
* [Tests] fix: VAE tiling tests when setting the right device by sayakpaul in 7246
* [Utils] Improve " Copied from ..." statements in the pipelines by sayakpaul in 6917
* [Easy] fix: save_model_card utility of the DreamBooth SDXL LoRA script by sayakpaul in 7258
* Make mid block optional for flax UNet by mar-muel in 7083
* Solve missing clip_sample implementation in FlaxDDIMScheduler. by hi-sushanta in 7017
* [Tests] fix config checking tests by sayakpaul in 7247
* [docs] IP-Adapter image embedding by stevhliu in 7226
* Adds `denoising_end` parameter to ControlNetPipeline for SDXL by UmerHA in 6175
* Add npu support by MengqingCao in 7144
* [Community Pipeline] Skip Marigold `depth_colored` with `color_map=None` by qqii in 7170
* update the signature of from_single_file by yiyixuxu in 7216
* [UNet_Spatio_Temporal_Condition] fix default num_attention_heads in unet_spatio_temporal_condition by Wang-Xiaodong1899 in 7205
* [docs/nits] Fix return values based on `return_dict` and minor doc updates by a-r-r-o-w in 7105
* [Chore] remove tf mention by sayakpaul in 7245
* Fix gmflow_dir by pravdomil in 6583
* Support latents_mean and latents_std by haofanwang in 7132
* Inline InputPadder by pravdomil in 6582
* [Dockerfiles] add: a workflow to check if docker containers can be built in case of modifications by sayakpaul in 7129
* instruct pix2pix pipeline: remove sigma scaling when computing classifier free guidance by erliding in 7006
* Change `export_to_video` default by DN6 in 6990
* [Chore] switch to `logger.warning` by sayakpaul in 7289
* [LoRA] use the PyTorch classes wherever needed and start depcrecation cycles by sayakpaul in 7204
* Add single file support for Stable Cascade by DN6 in 7274
* Fix passing pooled prompt embeds to Cascade Decoder and Combined Pipeline by DN6 in 7287
* Fix loading Img2Img refiner components in `from_single_file` by DN6 in 7282
* [Chore] clean residue from copy-pasting in the UNet single file loader by sayakpaul in 7295
* Update Cascade documentation by DN6 in 7257
* Update Stable Cascade Conversion Scripts by DN6 in 7271
* [Pipeline] Add LEDITS++ pipelines by manuelbrack in 6074
* [PyPI publishing] feat: automate the process of pypi publication to some extent. by sayakpaul in 7270
* add: support for notifying maintainers about the nightly test status by sayakpaul in 7117
* Fix Wrong Text-encoder Grad Setting in Custom_Diffusion Training by Rbrq03 in 7302
* Add Intro page of TCD by mhh0318 in 7259
* Fix typos in `UNet2DConditionModel` documentation by alexanderbonnet in 7291
* Change step_offset scheduler docstrings by Beinsezii in 7128
* update get_order_list if statement by kghamilton89 in 7309
* add: pytest log installation by sayakpaul in 7313
* [Tests] Fix incorrect constant in VAE scaling test. by DN6 in 7301
* log loss per image by noskill in 7278
* add edm schedulers in doc by patil-suraj in 7319
* [Advanced DreamBooth LoRA SDXL] Support EDM-style training (follow up of 7126) by linoytsaban in 7182
* Update Cascade Tests by DN6 in 7324

Kandinsky 2.1

Kandinsky 2.1 inherits best practices from DALL-E 2 and Latent Diffusion while introducing some new ideas.

Installation
bash
pip install diffusers transformers accelerate

Code example
python
from diffusers import DiffusionPipeline
import torch

pipe_prior = DiffusionPipeline.from_pretrained("kandinsky-community/kandinsky-2-1-prior", torch_dtype=torch.float16)
pipe_prior.to("cuda")

t2i_pipe = DiffusionPipeline.from_pretrained("kandinsky-community/kandinsky-2-1", torch_dtype=torch.float16)
t2i_pipe.to("cuda")

prompt = "A alien cheeseburger creature eating itself, claymation, cinematic, moody lighting"
negative_prompt = "low quality, bad quality"

generator = torch.Generator(device="cuda").manual_seed(12)
image_embeds, negative_image_embeds = pipe_prior(prompt, negative_prompt, guidance_scale=1.0, generator=generator).to_tuple()

image = t2i_pipe(prompt, negative_prompt=negative_prompt, image_embeds=image_embeds, negative_image_embeds=negative_image_embeds).images[0]
image.save("cheeseburger_monster.png")



![img](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/kandinsky-docs/cheeseburger.png)

To learn more about the Kandinsky pipelines, and more details about speed and memory optimizations, please have a look at the [docs](https://huggingface.co/docs/diffusers/main/en/api/pipelines/kandinsky).

Thanks to ayushtues for helping with the integration of Kandinsky 2.1!

UniDiffuser

UniDiffuser introduces a multimodal diffusion process that is capable of handling different generation tasks using a single unified approach:

* Unconditional image and text generation
* Joint image-text generation
* Text-to-image generation
* Image-to-text generation
* Image variation
* Text variation

Below is an example of how to use UniDiffuser for text-to-image generation:

python
import torch
from diffusers import UniDiffuserPipeline

model_id_or_path = "thu-ml/unidiffuser-v1"
pipe = UniDiffuserPipeline.from_pretrained(model_id_or_path, torch_dtype=torch.float16)
pipe.to("cuda")

# This mode can be inferred from the input provided to the `pipe`.
pipe.set_text_to_image_mode()

prompt = "an elephant under the sea"
sample = pipe(prompt=prompt, num_inference_steps=20, guidance_scale=8.0).images[0]
sample.save("elephant.png")
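
The same pipeline object covers the other modes as well. As a hedged sketch of image-to-text generation (the mode-setting method and the example image URL follow the UniDiffuser documentation and are assumptions here):

python
from diffusers.utils import load_image

# Image-to-text (captioning) with the same pipeline.
pipe.set_image_to_text_mode()

image_url = "https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/unidiffuser/unidiffuser_example_image.jpg"
init_image = load_image(image_url).resize((512, 512))

sample = pipe(image=init_image, num_inference_steps=20, guidance_scale=8.0)
print(sample.text[0])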


Check out the UniDiffuser [docs](https://huggingface.co/docs/diffusers/main/en/api/pipelines/unidiffuser) to know more.

![](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/unidiffuser.gif)

UniDiffuser was added by dg845 in [this PR](https://github.com/huggingface/diffusers/pull/2963).

LoRA

We're happy to support the A1111 formatted CivitAI LoRA checkpoints in a limited capacity.

First, download a checkpoint. We’ll use [this one](https://civitai.com/models/13239/light-and-shadow) for demonstration purposes.

bash
wget https://civitai.com/api/download/models/15603 -O light_and_shadow.safetensors


Next, we initialize a `DiffusionPipeline`:

python
import torch

from diffusers import StableDiffusionPipeline, DPMSolverMultistepScheduler

pipeline = StableDiffusionPipeline.from_pretrained(
"gsdf/Counterfeit-V2.5", torch_dtype=torch.float16, safety_checker=None
).to("cuda")
pipeline.scheduler = DPMSolverMultistepScheduler.from_config(
pipeline.scheduler.config, use_karras_sigmas=True
)


We then load the checkpoint downloaded from CivitAI:

python
pipeline.load_lora_weights(".", weight_name="light_and_shadow.safetensors")


(If you’re loading a checkpoint in the `safetensors` format, please ensure you have `safetensors` installed.)

And then it’s time for running inference:

python
prompt = "masterpiece, best quality, 1girl, at dusk"
negative_prompt = ("(low quality, worst quality:1.4), (bad anatomy), (inaccurate limb:1.2), "
"bad composition, inaccurate eyes, extra digit, fewer digits, (extra arms:1.2), large breasts")

images = pipeline(prompt=prompt,
negative_prompt=negative_prompt,
width=512,
height=768,
num_inference_steps=15,
num_images_per_prompt=4,
generator=torch.manual_seed(0)
).images


Below is a comparison between the LoRA and the non-LoRA results:

![](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/lora_non_lora_comparison.png)

Check out [the docs](https://huggingface.co/docs/diffusers/main/en/training/lora#supporting-a1111-themed-lora-checkpoints-from-diffusers) to learn more.

Thanks to takuma104 for contributing this feature via [this PR](https://github.com/huggingface/diffusers/pull/3437).

Torch 2.0 Compile Speed-up

We introduced Torch 2.0 support for computing attention efficiently in 0.13.0. Since then, we have made a number of improvements to ensure the number of "graph breaks" in our models is reduced so that the models can be compiled with `torch.compile()`. As a result, we are happy to report massive improvements in the inference speed of our most popular pipelines. Check out [this doc](https://huggingface.co/docs/diffusers/main/en/optimization/torch2.0) to know more.
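
A minimal sketch of the compile workflow (the checkpoint is illustrative; the first call pays the compilation cost, subsequent calls run faster):

python
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16).to("cuda")

# Compile the UNet, the most compute-heavy component of the pipeline.
pipe.unet = torch.compile(pipe.unet, mode="reduce-overhead", fullgraph=True)

image = pipe("a photo of an astronaut riding a horse on mars").images[0]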

Thanks to Chillee for helping us with this. Thanks to patrickvonplaten for fixing the problems stemming from "graph breaks" in [this PR](https://github.com/huggingface/diffusers/pull/3286).

VAE pre-processing

We added a `VaeImageProcessor` class that provides a unified API for pipelines to prepare their image inputs, as well as post-process their outputs. It supports resizing, normalization, and conversion between PIL images, PyTorch tensors, and NumPy arrays.

With that, all Stable Diffusion pipelines now accept image inputs as PyTorch tensors and NumPy arrays, in addition to PIL images, and can produce outputs in these three formats. They will also accept and return latents. This means you can now take generated latents from one pipeline and pass them to another as inputs, without leaving the latent space. If you work with multiple pipelines, you can pass PyTorch tensors between them without converting to PIL images.
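
As a hedged sketch of staying in latent space across pipelines (assuming `output_type="latent"` and the reuse of `.components`, as in the standard text-to-image/img2img pairing):

python
import torch
from diffusers import StableDiffusionPipeline, StableDiffusionImg2ImgPipeline

text2img = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16).to("cuda")
img2img = StableDiffusionImg2ImgPipeline(**text2img.components)

# Ask the first pipeline for latents instead of PIL images...
latents = text2img("a fantasy landscape, concept art", output_type="latent").images

# ...and feed them straight into the second pipeline without decoding to pixels.
image = img2img("a fantasy landscape at sunset, concept art", image=latents, strength=0.75).images[0]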

To learn more about the API, check out our doc [here](https://huggingface.co/docs/diffusers/main/en/api/image_processor)

ControlNet Img2Img & Inpainting

ControlNet is one of the most widely used diffusion models, and upon strong demand from the community we added ControlNet img2img and ControlNet inpaint pipelines.
This allows any ControlNet checkpoint to be used for the image-to-image setting as well as for inpainting.

:point_right: **Inpaint**: See controlnet inpaint model [here](https://huggingface.co/lllyasviel/control_v11p_sd15_inpaint)
:point_right: **Image-to-Image**: Any controlnet checkpoint can be used for image to image, e.g.:
py
from diffusers import StableDiffusionControlNetImg2ImgPipeline, ControlNetModel, UniPCMultistepScheduler
from diffusers.utils import load_image
import numpy as np
import torch

import cv2
from PIL import Image

# download an image
image = load_image(
"https://hf.co/datasets/huggingface/documentation-images/resolve/main/diffusers/input_image_vermeer.png"
)
np_image = np.array(image)

# get canny image
np_image = cv2.Canny(np_image, 100, 200)
np_image = np_image[:, :, None]
np_image = np.concatenate([np_image, np_image, np_image], axis=2)
canny_image = Image.fromarray(np_image)

# load control net and stable diffusion v1-5
controlnet = ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16)
pipe = StableDiffusionControlNetImg2ImgPipeline.from_pretrained(
"runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
)

# speed up diffusion process with faster scheduler and memory optimization
pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)
pipe.enable_model_cpu_offload()

# generate image
generator = torch.manual_seed(0)
image = pipe(
"futuristic-looking woman",
num_inference_steps=20,
generator=generator,
image=image,
control_image=canny_image,
).images[0]


Diffedit Zero-Shot Inpainting Pipeline

This pipeline (introduced in [DiffEdit: Diffusion-based semantic image editing with mask guidance](https://arxiv.org/abs/2210.11427)) allows for image editing with natural language. Below is an end-to-end example.

First, let’s load our pipeline:

python
import torch
from diffusers import DDIMScheduler, DDIMInverseScheduler, StableDiffusionDiffEditPipeline

sd_model_ckpt = "stabilityai/stable-diffusion-2-1"
pipeline = StableDiffusionDiffEditPipeline.from_pretrained(
sd_model_ckpt,
torch_dtype=torch.float16,
safety_checker=None,
)
pipeline.scheduler = DDIMScheduler.from_config(pipeline.scheduler.config)
pipeline.inverse_scheduler = DDIMInverseScheduler.from_config(pipeline.scheduler.config)
pipeline.enable_model_cpu_offload()
pipeline.enable_vae_slicing()
generator = torch.manual_seed(0)


Then, we load an input image to edit using our method:

python
from diffusers.utils import load_image

img_url = "https://github.com/Xiang-cd/DiffEdit-stable-diffusion/raw/main/assets/origin.png"
raw_image = load_image(img_url).convert("RGB").resize((768, 768))


Then, we employ the source and target prompts to generate the editing mask:

python
source_prompt = "a bowl of fruits"
target_prompt = "a basket of fruits"
mask_image = pipeline.generate_mask(
image=raw_image,
source_prompt=source_prompt,
target_prompt=target_prompt,
generator=generator,
)


Then, we employ the caption and the input image to get the inverted latents:

python
inv_latents = pipeline.invert(prompt=source_prompt, image=raw_image, generator=generator).latents


Now, generate the image with the inverted latents and semantically generated mask:

python
image = pipeline(
prompt=target_prompt,
mask_image=mask_image,
image_latents=inv_latents,
generator=generator,
negative_prompt=source_prompt,
).images[0]
image.save("edited_image.png")


![](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/diffedit.png)

Check out the [docs](https://huggingface.co/docs/diffusers/main/en/api/pipelines/diffedit) to learn more about this pipeline.

Thanks to clarencechen for contributing this pipeline in [this PR](https://github.com/huggingface/diffusers/pull/2837).

Docs

* [Distributed inference with multiple GPUs](https://huggingface.co/docs/diffusers/main/en/training/distributed_inference) ([PR](https://github.com/huggingface/diffusers/issues/3010))
* [Attention processor](https://huggingface.co/docs/diffusers/main/en/api/attnprocessor) ([PR](https://github.com/huggingface/diffusers/pull/3474))
* [Load different Stable Diffusion formats](https://huggingface.co/docs/diffusers/main/en/using-diffusers/other-formats) ([PR](https://github.com/huggingface/diffusers/pull/3534))

Apart from these, we have made multiple improvements to the overall quality-of-life of our docs.

Thanks to stevhliu for leading the charge here.

Misc

* xformers attention processor fix when using LoRA ([PR](https://github.com/huggingface/diffusers/pull/3556) by takuma104)
* Pytorch 2.0 SDPA implementation of the LoRA attention processor ([PR](https://github.com/huggingface/diffusers/pull/3594))

All commits

* Post release for 0.16.0 by patrickvonplaten in 3244
* [docs] only mention one stage by pcuenca in 3246
* Write model card in controlnet training script by pcuenca in 3229
* [2064]: Add stochastic sampler (sample_dpmpp_sde) by nipunjindal in 3020
* [Stochastic Sampler][Slow Test]: Cuda test fixes by nipunjindal in 3257
* Remove required from tracker_project_name by pcuenca in 3260
* adding required parameters while calling the get_up_block and get_down_block by init-22 in 3210
* [docs] Update interface in repaint.mdx by ernestchu in 3119
* Update IF name to XL by apolinario in 3262
* fix typo in score sde pipeline by fecet in 3132
* Fix typo in textual inversion JAX training script by jairtrejo in 3123
* AudioDiffusionPipeline - fix encode method after config changes by teticio in 3114
* Revert "Revert "[Community Pipelines] Update lpw_stable_diffusion pipeline"" by patrickvonplaten in 3265
* Fix community pipelines by patrickvonplaten in 3266
* update notebook by yiyixuxu in 3259
* [docs] add notes for stateful model changes by williamberman in 3252
* [LoRA] quality of life improvements in the loading semantics and docs by sayakpaul in 3180
* [Community Pipelines] EDICT pipeline implementation by Joqsan in 3153
* [Docs]zh translated docs update by DrDavidS in 3245
* Update logging.mdx by standardAI in 2863
* Add multiple conditions to StableDiffusionControlNetInpaintPipeline by timegate in 3125
* Let's make sure that dreambooth always uploads to the Hub by patrickvonplaten in 3272
* Diffedit Zero-Shot Inpainting Pipeline by clarencechen in 2837
* add constant learning rate with custom rule by jason9075 in 3133
* Allow disabling torch 2_0 attention by patrickvonplaten in 3273
* [doc] add link to training script by yiyixuxu in 3271
* temp disable spectogram diffusion tests by williamberman in 3278
* Changed sample[0] to images[0] by IliaLarchenko in 3304
* Typo in tutorial by IliaLarchenko in 3295
* Torch compile graph fix by patrickvonplaten in 3286
* Postprocessing refactor img2img by yiyixuxu in 3268
* [Torch 2.0 compile] Fix more torch compile breaks by patrickvonplaten in 3313
* fix: scale_lr and sync example readme and docs. by sayakpaul in 3299
* Update stable_diffusion.mdx by mu94-csl in 3310
* Fix missing variable assign in DeepFloyd-IF-II by gitmylo in 3315
* Correct doc build for patch releases by patrickvonplaten in 3316
* Add Stable Diffusion RePaint to community pipelines by Markus-Pobitzer in 3320
* Fix multistep dpmsolver for cosine schedule (suitable for deepfloyd-if) by LuChengTHU in 3314
* [docs] Improve LoRA docs by stevhliu in 3311
* Added input pretubation by isamu-isozaki in 3292
* Update write_own_pipeline.mdx by csaybar in 3323
* update controlling generation doc with latest goodies. by sayakpaul in 3321
* [Quality] Make style by patrickvonplaten in 3341
* Fix config dpm by patrickvonplaten in 3343
* Add the SDE variant of DPM-Solver and DPM-Solver++ by LuChengTHU in 3344
* Add upsample_size to AttnUpBlock2D, AttnDownBlock2D by will-rice in 3275
* Rename --only_save_embeds to --save_as_full_pipeline by arrufat in 3206
* [AudioLDM] Generalise conversion script by sanchit-gandhi in 3328
* Fix TypeError when using prompt_embeds and negative_prompt by At-sushi in 2982
* Fix pipeline class on README by themrzmaster in 3345
* Inpainting: typo in docs by LysandreJik in 3331
* Add `use_Karras_sigmas` to LMSDiscreteScheduler by Isotr0py in 3351
* Batched load of textual inversions by pdoane in 3277
* [docs] Fix docstring by stevhliu in 3334
* if dreambooth lora by williamberman in 3360
* Postprocessing refactor all others by yiyixuxu in 3337
* [docs] Improve safetensors docstring by stevhliu in 3368
* add: a warning message when using xformers in a PT 2.0 env. by sayakpaul in 3365
* StableDiffusionInpaintingPipeline - resize image w.r.t height and width by rupertmenneer in 3322
* [docs] Adapt a model by stevhliu in 3326
* [docs] Load safetensors by stevhliu in 3333
* [Docs] Fix stable_diffusion.mdx typo by sudowind in 3398
* Support ControlNet v1.1 shuffle properly by takuma104 in 3340
* [Tests] better determinism by sayakpaul in 3374
* [docs] Add transformers to install by stevhliu in 3388
* [deepspeed] partial ZeRO-3 support by stas00 in 3076
* Add omegaconf for tests by patrickvonplaten in 3400
* Fix various bugs with LoRA Dreambooth and Dreambooth script by patrickvonplaten in 3353
* Fix docker file by patrickvonplaten in 3402
* fix: deepseepd_plugin retrieval from accelerate state by sayakpaul in 3410
* [Docs] Add `sigmoid` beta_scheduler to docstrings of relevant Schedulers by Laurent2916 in 3399
* Don't install accelerate and transformers from source by patrickvonplaten in 3415
* Don't install transformers and accelerate from source by patrickvonplaten in 3414
* Improve fast tests by patrickvonplaten in 3416
* attention refactor: the trilogy by williamberman in 3387
* [Docs] update the PT 2.0 optimization doc with latest findings by sayakpaul in 3370
* Fix style rendering by pcuenca in 3433
* unCLIP scheduler do not use note by williamberman in 3417
* Replace deprecated command with environment file by jongwooo in 3409
* fix warning message pipeline loading by patrickvonplaten in 3446
* add stable diffusion tensorrt img2img pipeline by asfiyab-nvidia in 3419
* Refactor controlnet and add img2img and inpaint by patrickvonplaten in 3386
* [Scheduler] DPM-Solver (++) Inverse Scheduler by clarencechen in 3335
* [Docs] Fix incomplete docstring for resnet.py by Laurent2916 in 3438
* fix tiled vae blend extent range by superlabs-dev in 3384
* Small update to "Next steps" section by pcuenca in 3443
* Allow arbitrary aspect ratio in IFSuperResolutionPipeline by devxpy in 3298
* Adding 'strength' parameter to StableDiffusionInpaintingPipeline by rupertmenneer in 3424
* [WIP] Bugfix - Pipeline.from_pretrained is broken when the pipeline is partially downloaded by vimarshc in 3448
* Fix gradient checkpointing bugs in freezing part of models (requires_grad=False) by 7eu7d7 in 3404
* Make dreambooth lora more robust to orig unet by patrickvonplaten in 3462
* Reduce peak VRAM by releasing large attention tensors (as soon as they're unnecessary) by cmdr2 in 3463
* Add min snr to text2img lora training script by wfng92 in 3459
* Add inpaint lora scale support by Glaceon-Hyy in 3460
* [From ckpt] Fix from_ckpt by patrickvonplaten in 3466
* Update full dreambooth script to work with IF by williamberman in 3425
* Add IF dreambooth docs by williamberman in 3470
* parameterize pass single args through tuple by williamberman in 3477
* attend and excite tests disable determinism on the class level by williamberman in 3478
* dreambooth docs torch.compile note by williamberman in 3471
* add: if entry in the dreambooth training docs. by sayakpaul in 3472
* [docs] Textual inversion inference by stevhliu in 3473
* [docs] Distributed inference by stevhliu in 3376
* [{Up,Down}sample1d] explicit view kernel size as number elements in flattened indices by williamberman in 3479
* mps & onnx tests rework by pcuenca in 3449
* [Attention processor] Better warning message when shifting to `AttnProcessor2_0` by sayakpaul in 3457
* [Docs] add note on local directory path. by sayakpaul in 3397
* Refactor full determinism by patrickvonplaten in 3485
* Fix DPM single by patrickvonplaten in 3413
* Add `use_Karras_sigmas` to DPMSolverSinglestepScheduler by Isotr0py in 3476
* Adds local_files_only bool to prevent forced online connection by w4ffl35 in 3486
* [Docs] Korean translation (optimization, training) by Snailpong in 3488
* DataLoader respecting EXIF data in Training Images by Ambrosiussen in 3465
* feat: allow disk offload for diffuser models by hari10599 in 3285
* [Community] reference only control by okotaku in 3435
* Support for cross-attention bias / mask by Birch-san in 2634
* do not scale the initial global step by gradient accumulation steps when loading from checkpoint by williamberman in 3506
* Fix bug in panorama pipeline when using dpmsolver scheduler by Isotr0py in 3499
* [Community Pipelines]Accelerate inference of stable diffusion by IPEX on CPU by yingjie-han in 3105
* [Community] ControlNet Reference by okotaku in 3508
* Allow custom pipeline loading by patrickvonplaten in 3504
* Make sure Diffusers works even if Hub is down by patrickvonplaten in 3447
* Improve README by patrickvonplaten in 3524
* Update README.md by patrickvonplaten in 3525
* Run `torch.compile` tests in separate subprocesses by pcuenca in 3503
* fix attention mask pad check by williamberman in 3531
* explicit broadcasts for assignments by williamberman in 3535
* [Examples/DreamBooth] refactor save_model_card utility in dreambooth examples by sayakpaul in 3543
* Fix panorama to support all schedulers by Isotr0py in 3546
* Add open parti prompts to docs by patrickvonplaten in 3549
* Add Kandinsky 2.1 by yiyixuxu ayushtues in 3308
* fix broken change for vq pipeline by yiyixuxu in 3563
* [Stable Diffusion Inpainting] Allow standard text-to-img checkpoints to be useable for SD inpainting by patrickvonplaten in 3533
* Fix loaded_token reference before definition by eminn in 3523
* renamed variable to input_ and output_ by vikasmech in 3507
* Correct inpainting controlnet docs by patrickvonplaten in 3572
* Fix controlnet guess mode euler by patrickvonplaten in 3571
* [docs] Add AttnProcessor to docs by stevhliu in 3474
* [WIP] Add UniDiffuser model and pipeline by dg845 in 2963
* Fix to apply LoRAXFormersAttnProcessor instead of LoRAAttnProcessor when xFormers is enabled by takuma104 in 3556
* fix dreambooth attention mask by linbo0518 in 3541
* [IF super res] correctly normalize PIL input by williamberman in 3536
* [docs] Maintenance by stevhliu in 3552
* [docs] update the broken links by brandonJY in 3568
* [docs] Working with different formats by stevhliu in 3534
* remove print statements from attention processor. by sayakpaul in 3592
* Fix temb attention by patrickvonplaten in 3607
* [docs] update the broken links by kadirnar in 3577
* [UniDiffuser Tests] Fix some tests by sayakpaul in 3609
* 3487 Fix inpainting strength for various samplers by rupertmenneer in 3532
* [Community] Support StableDiffusionTilingPipeline by kadirnar in 3586
* [Community, Enhancement] Add reference tricks in README by okotaku in 3589
* [Feat] Enable State Dict For Textual Inversion Loader by ghunkins in 3439
* [Community] CLIP Guided Images Mixing with Stable DIffusion Pipeline by TheDenk in 3587
* fix tests by patrickvonplaten in 3614
* Make sure we also change the config when setting `encoder_hid_dim_type=="text_proj"` and allow xformers by patrickvonplaten in 3615
* goodbye frog by williamberman in 3617
* update code to reflect latest changes as of May 30th by prathikr in 3616
* update dreambooth lora to work with IF stage II by williamberman in 3560
* Full Dreambooth IF stage II upscaling by williamberman in 3561
* [Docs] include the instruction-tuning blog link in the InstructPix2Pix docs by sayakpaul in 3644
* [Kandinsky] Improve kandinsky API a bit by patrickvonplaten in 3636
* Support Kohya-ss style LoRA file format (in a limited capacity) by takuma104 in 3437
* Iterate over unique tokens to avoid duplicate replacements for multivector embeddings by lachlan-nicholson in 3588
* fixed typo in example train_text_to_image.py by kashif in 3608
* fix inpainting pipeline when providing initial latents by yiyixuxu in 3641
* [Community Doc] Updated the filename and readme file. by kadirnar in 3634
* add Stable Diffusion TensorRT Inpainting pipeline by asfiyab-nvidia in 3642
* set config from original module but set compiled module on class by williamberman in 3650
* dreambooth if docs - stage II, more info by williamberman in 3628
* linting fix by williamberman in 3653
* Set step_rules correctly for piecewise_constant scheduler by 0x1355 in 3605
* Allow setting num_cycles for cosine_with_restarts lr scheduler by 0x1355 in 3606
* [docs] Load A1111 LoRA by stevhliu in 3629
* dreambooth upscaling fix added latents by williamberman in 3659
* Correct multi gpu dreambooth by patrickvonplaten in 3673
* Fix from_ckpt not working properly on windows by LyubimovVladislav in 3666
* Update Compel documentation for textual inversions by pdoane in 3663
* [UniDiffuser test] fix one test so that it runs correctly on V100 by sayakpaul in 3675
* [docs] More API fixes by stevhliu in 3640
* [WIP]Vae preprocessor refactor (PR1) by yiyixuxu in 3557
* small tweaks for parsing thibaudz controlnet checkpoints by williamberman in 3657
* move activation dispatches into helper function by williamberman in 3656
* [docs] Fix link to loader method by stevhliu in 3680
* Add function to remove monkey-patch for text encoder LoRA by takuma104 in 3649
* [LoRA] feat: add lora attention processor for pt 2.0. by sayakpaul in 3594
* refactor Image processor for x4 upscaler by yiyixuxu in 3692
* feat: when using PT 2.0 use LoRAAttnProcessor2_0 for text enc LoRA. by sayakpaul in 3691
* Fix the Kandinsky docstring examples by freespirit in 3695
* Support views batch for panorama by Isotr0py in 3632
* Fix from_ckpt for Stable Diffusion 2.x by ctrysbita in 3662
* Add draft for lora text encoder scale by patrickvonplaten in 3626

Significant community contributions

The following contributors have made significant changes to the library over the last release:

* nipunjindal
* [2064]: Add stochastic sampler (sample_dpmpp_sde) (3020)
* [Stochastic Sampler][Slow Test]: Cuda test fixes (3257)
* clarencechen
* Diffedit Zero-Shot Inpainting Pipeline (2837)
* [Scheduler] DPM-Solver (++) Inverse Scheduler (3335)
* Markus-Pobitzer
* Add Stable Diffusion RePaint to community pipelines (3320)
* takuma104
* Support ControlNet v1.1 shuffle properly (3340)
* Fix to apply LoRAXFormersAttnProcessor instead of LoRAAttnProcessor when xFormers is enabled (3556)
* Support Kohya-ss style LoRA file format (in a limited capacity) (3437)
* Add function to remove monkey-patch for text encoder LoRA (3649)
* asfiyab-nvidia
* add stable diffusion tensorrt img2img pipeline (3419)
* add Stable Diffusion TensorRT Inpainting pipeline (3642)
* Snailpong
* [Docs] Korean translation (optimization, training) (3488)
* okotaku
* [Community] reference only control (3435)
* [Community] ControlNet Reference (3508)
* [Community, Enhancement] Add reference tricks in README (3589)
* Birch-san
* Support for cross-attention bias / mask (2634)
* yingjie-han
* [Community Pipelines]Accelerate inference of stable diffusion by IPEX on CPU (3105)
* dg845
* [WIP] Add UniDiffuser model and pipeline (2963)
* kadirnar
* [docs] update the broken links (3577)
* [Community] Support StableDiffusionTilingPipeline (3586)
* [Community Doc] Updated the filename and readme file. (3634)
* TheDenk
* [Community] CLIP Guided Images Mixing with Stable DIffusion Pipeline (3587)
* prathikr
* update code to reflect latest changes as of May 30th (3616)

PyTorch 2.0

Speaking of memory-efficient attention, Accelerated PyTorch 2.0 Transformers now comes with built-in native support for it! When PyTorch 2.0 is released you'll no longer have to install `xFormers` or any third-party package to take advantage of it. In `diffusers` we are already preparing for that, and it works out of the box. So, if you happen to be using the latest "nightlies" of PyTorch 2.0 beta, then you're all set – diffusers will use Accelerated PyTorch 2.0 Transformers by default.

In our tests, the built-in PyTorch 2.0 implementation is usually as fast as xFormers', and sometimes even faster. Performance depends on the card you are using and whether you run your code in `float16` or `float32`, so check [our documentation](https://huggingface.co/docs/diffusers/v0.13.0/en/optimization/torch2.0) for details.

Coarse-grained CPU offload

Community member keturn, with whom we have enjoyed thoughtful software design conversations, called our attention to the fact that enabling sequential cpu offloading via [`enable_sequential_cpu_offload`](https://huggingface.co/docs/diffusers/main/en/api/pipelines/stable_diffusion/text2img#diffusers.StableDiffusionPipeline.enable_sequential_cpu_offload) worked great to save a lot of memory, but made inference much slower.

This is because `enable_sequential_cpu_offload()` is optimized for memory, and it recursively works across all the submodules contained in a model, moving them to GPU when they are needed and back to CPU when another submodule needs to run. These cpu-to-gpu-to-cpu transfers happen hundreds of times during the stable diffusion denoising loops, because the UNet runs multiple times and it consists of several PyTorch modules.

This release of `diffusers` introduces a coarser `enable_model_cpu_offload()` pipeline API, which copies whole models (not _modules_) to GPU and makes sure they stay there until another model needs to run. The consequences are:
- Less memory savings than `enable_sequential_cpu_offload`, but:
- Almost as fast inference as when the pipeline is used without any type of offloading.
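
A minimal usage sketch (the checkpoint is illustrative):

python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16)

# Whole models (text encoder, UNet, VAE) are moved to the GPU only while they run,
# instead of shuttling individual submodules back and forth.
pipe.enable_model_cpu_offload()

image = pipe("a photograph of an astronaut riding a horse").images[0]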


<a name="pix2pix-zero"></a>
Pix2Pix Zero

Remember the CycleGAN days where one would turn a horse into a zebra in an image while keeping the rest of the content almost untouched? Well, that day has arrived but in the context of Diffusion. Pix2Pix Zero allows users to edit a particular image (be it real or generated), targeting a source concept (horse, for example) and replacing it with a target concept (zebra, for example).

Input image | Edited image
:-------------------------:|:-------------------------:
![original](https://user-images.githubusercontent.com/22957388/218466608-04a5a97e-b152-4639-9aab-eb50eb6ba391.png) | ![edited](https://user-images.githubusercontent.com/22957388/218466715-572d2814-2cd3-44b9-8672-993489850094.png)

Pix2Pix Zero was proposed in [Zero-shot Image-to-Image Translation](https://arxiv.org/abs/2302.03027). The `StableDiffusionPix2PixZeroPipeline` allows you to

1. Edit an image generated from an input prompt
2. Provide an input image and edit it

For the latter, it uses the newly introduced `DDIMInverseScheduler` to first obtain the inverted noise from the input image and use that in the subsequent generation process.

Both of the use cases leverage the idea of "edit directions", used for steering the generation toward the target concept gradually from the source concept. To know more, we recommend checking out the [official documentation](https://huggingface.co/docs/diffusers/main/en/api/pipelines/stable_diffusion/pix2pix_zero).
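
As a rough sketch of the first use case (editing an image generated from a prompt), assuming the pipeline's `get_embeds` helper and the `source_embeds`/`target_embeds`/`cross_attention_guidance_amount` arguments:

python
import torch
from diffusers import DDIMScheduler, StableDiffusionPix2PixZeroPipeline

pipe = StableDiffusionPix2PixZeroPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
).to("cuda")
pipe.scheduler = DDIMScheduler.from_config(pipe.scheduler.config)

# "Edit directions" are built from embeddings of the source and target concept captions.
source_embeds = pipe.get_embeds(["a photo of a horse"], batch_size=1)
target_embeds = pipe.get_embeds(["a photo of a zebra"], batch_size=1)

prompt = "a photo of a horse running in a field"
image = pipe(
    prompt,
    source_embeds=source_embeds,
    target_embeds=target_embeds,
    num_inference_steps=50,
    cross_attention_guidance_amount=0.15,
).images[0]
image.save("horse_to_zebra.png")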

<a name="attend-excite"></a>
Attend and excite
Attend-and-Excite was proposed in [Attend-and-Excite: Attention-Based Semantic Guidance for Text-to-Image Diffusion Models](https://arxiv.org/abs/2301.13826). It guides the generative model to modify the cross-attention values during image synthesis so that the generated images depict the input text prompt more faithfully. Thanks to community contributor evinpinar for leading the charge to add this pipeline!

- Attend and excite 2 by evinpinar yiyixuxu 2369

<a name="semantic-guidance"></a>
Semantic guidance

Semantic Guidance for Diffusion Models was proposed in [SEGA: Instructing Diffusion using Semantic Dimensions](https://arxiv.org/abs/2301.12247) and provides strong semantic control over image generation. Small changes to the text prompt usually result in entirely different output images. However, with SEGA, a variety of changes to the image can be controlled easily and intuitively while staying true to the original image composition. Thanks to the lead author of SEGA, Manuel (manuelbrack), who added the pipeline in #2223.

Here is a simple demo:

py
import torch
from diffusers import SemanticStableDiffusionPipeline

pipe = SemanticStableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16)
pipe = pipe.to("cuda")

out = pipe(
    prompt="a photo of the face of a woman",
    num_images_per_prompt=1,
    guidance_scale=7,
    editing_prompt=[
        "smiling, smile",  # Concepts to apply
        "glasses, wearing glasses",
        "curls, wavy hair, curly hair",
        "beard, full beard, mustache",
    ],
    reverse_editing_direction=[False, False, False, False],  # Direction of guidance, i.e. increase all concepts
    edit_warmup_steps=[10, 10, 10, 10],  # Warmup period for each concept
    edit_guidance_scale=[4, 5, 5, 5.4],  # Guidance scale for each concept
    edit_threshold=[
        0.99,
        0.975,
        0.925,
        0.96,
    ],  # Threshold for each concept: the percentile of the latent space that will be discarded, i.e. threshold=0.99 uses 1% of the latent dimensions
    edit_momentum_scale=0.3,  # Momentum scale that will be added to the latent guidance
    edit_mom_beta=0.6,  # Momentum beta
    edit_weights=[1, 1, 1, 1, 1],  # Weights of the individual concepts against each other
)



<a name="self-attention-guidance"></a>
Self-attention guidance

SAG was proposed in [Improving Sample Quality of Diffusion Models Using Self-Attention Guidance](https://arxiv.org/abs/2210.00939). SAG extracts the intermediate attention map from the diffusion model at every iteration, selects the tokens above a certain attention score, and masks and blurs them to obtain a partially blurred input. The dissimilarity between the noise predictions for the blurred and the original inputs is then used as additional guidance. With this guidance, the authors observe apparent improvements across a wide range of diffusion models.

python
import torch
from diffusers import StableDiffusionSAGPipeline
from accelerate.utils import set_seed

pipe = StableDiffusionSAGPipeline.from_pretrained("CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16)
pipe = pipe.to("cuda")

seed = 8978
prompt = "."
