Latent Consistency Models (LCM)


LCMs enable significantly faster inference for diffusion models. They require far fewer inference steps to produce high-resolution images without compromising image quality too much. Below is a usage example:

import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained("SimianLuo/LCM_Dreamshaper_v7", torch_dtype=torch.float32)

# To save GPU memory, torch.float16 can be used, but it may compromise image quality.
pipe.to(torch_device="cuda", torch_dtype=torch.float32)

prompt = "Self-portrait oil painting, a beautiful cyborg with golden hair, 8k"

# Can be set to 1~50 steps. LCM supports fast inference even with <= 4 steps. Recommended: 1~8 steps.
num_inference_steps = 4

images = pipe(prompt=prompt, num_inference_steps=num_inference_steps, guidance_scale=8.0).images

Refer to the [documentation](https://huggingface.co/docs/diffusers/main/en/api/pipelines/latent_consistency_models) to learn more.
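To give a rough feel for what "fewer inference steps" means, the sketch below picks evenly spaced timesteps from a training schedule. It is only an illustration: the 1000-step training schedule and the `pick_timesteps` helper are assumptions, and the LCM scheduler in Diffusers may select its timesteps differently.

```python
def pick_timesteps(num_train_timesteps: int, num_inference_steps: int) -> list[int]:
    """Return evenly spaced timesteps, ordered from noisiest to cleanest."""
    if not 1 <= num_inference_steps <= num_train_timesteps:
        raise ValueError("num_inference_steps must be between 1 and num_train_timesteps")
    stride = num_train_timesteps // num_inference_steps
    # Start at the last (noisiest) training timestep and walk down by `stride`.
    return [num_train_timesteps - 1 - i * stride for i in range(num_inference_steps)]


# Hypothetical 1000-step training schedule, sampled with 4 and with 50 steps.
few_steps = pick_timesteps(1000, 4)
many_steps = pick_timesteps(1000, 50)
print(few_steps)
print(len(many_steps))
```

Each entry corresponds to one forward pass of the model, so the 4-step schedule needs far fewer model evaluations than the 50-step one.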

LCM comes with both text-to-image and image-to-image pipelines and they were contributed by luosiallen, nagolinc, and dg845.

PixArt-Alpha

<img width="627" alt="header_collage" src="https://github.com/huggingface/diffusers/assets/22957388/25beb50f-d530-4d19-9430-b1c00ae3b1d9">

PixArt-Alpha is a Transformer-based text-to-image diffusion model that rivals the quality of the existing state-of-the-art ones, such as Stable Diffusion XL, Imagen, and DALL-E 2, while being more efficient.

It was trained with T5 text embeddings and has a maximum sequence length of 120. It therefore allows for more detailed prompts, which unlocks better-quality generations.
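As a rough illustration of what a fixed maximum sequence length means in practice, the sketch below truncates or pads a list of token IDs to 120 entries and builds the matching attention mask. The `pad_or_truncate` helper and the padding ID of 0 are assumptions for illustration; the real pipeline relies on the T5 tokenizer for this.

```python
MAX_SEQUENCE_LENGTH = 120  # maximum sequence length stated above
PAD_TOKEN_ID = 0  # assumed padding ID, for illustration only


def pad_or_truncate(token_ids, max_length=MAX_SEQUENCE_LENGTH, pad_id=PAD_TOKEN_ID):
    """Return (input_ids, attention_mask), both exactly `max_length` long."""
    kept = list(token_ids[:max_length])  # drop anything past the limit
    num_padding = max_length - len(kept)
    input_ids = kept + [pad_id] * num_padding
    attention_mask = [1] * len(kept) + [0] * num_padding  # 1 = real token, 0 = padding
    return input_ids, attention_mask


# Hypothetical token IDs for a short prompt and for an overly long one.
short_ids, short_mask = pad_or_truncate([101, 102, 103, 104, 105, 106, 107])
long_ids, long_mask = pad_or_truncate(list(range(1, 201)))
print(len(short_ids), sum(short_mask))
print(len(long_ids), sum(long_mask))
```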

Despite the large text encoder, with model offloading it takes a little under 11 GB of VRAM to run the `PixArtAlphaPipeline`:

from diffusers import PixArtAlphaPipeline
import torch

pipeline_id = "PixArt-alpha/PixArt-XL-2-1024-MS"
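pipeline = PixArtAlphaPipeline.from_pretrained(pipeline_id, torch_dtype=torch.float16)
# Offload model components to the CPU when idle to reduce VRAM usage.
pipeline.enable_model_cpu_offload()

prompt = "A small cactus with a happy face in the Sahara desert."
image = pipeline(prompt).images[0]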

Check out the [docs](https://huggingface.co/docs/diffusers/main/en/api/pipelines/pixart) to learn more.

AnimateDiff


AnimateDiff is a modelling framework that allows you to create videos using pre-existing Stable Diffusion text-to-image models. It achieves this by inserting motion module layers into a frozen text-to-image model and training it on video clips to extract a motion prior.

These motion modules are applied after the ResNet and Attention blocks in the Stable Diffusion UNet. Their purpose is to introduce coherent motion across image frames. To support these modules, we introduce the concepts of a `MotionAdapter` and a `UNetMotionModel`. These serve as a convenient way to use these motion modules with existing Stable Diffusion models.
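One way to picture "coherent motion across image frames" is through tensor shapes: the frozen image layers see each frame independently, while a motion module attends along the frame axis at every spatial location. The sketch below only computes those shapes; the helper names and the example dimensions are assumptions, not part of the Diffusers API.

```python
def spatial_view(batch, frames, channels, height, width):
    """Shape seen by the 2D image layers: frames are folded into the batch axis."""
    return (batch * frames, channels, height, width)


def temporal_view(batch, frames, channels, height, width):
    """Shape seen by a motion module: one sequence of `frames` items per spatial location."""
    return (batch * height * width, frames, channels)


# Hypothetical feature map for a single 16-frame clip.
video = dict(batch=1, frames=16, channels=320, height=64, width=64)
spatial_shape = spatial_view(**video)
temporal_shape = temporal_view(**video)
print(spatial_shape)
print(temporal_shape)
```

Both views hold the same number of values; only the axis along which information is mixed changes.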

The following example demonstrates how you can utilize the motion modules with an existing Stable Diffusion text-to-image model.

import torch
from diffusers import MotionAdapter, AnimateDiffPipeline, DDIMScheduler
from diffusers.utils import export_to_gif

# Load the motion adapter
adapter = MotionAdapter.from_pretrained("guoyww/animatediff-motion-adapter-v1-5-2")

# Load an SD 1.5 based fine-tuned model
model_id = "SG161222/Realistic_Vision_V5.1_noVAE"
pipe = AnimateDiffPipeline.from_pretrained(model_id, motion_adapter=adapter)
scheduler = DDIMScheduler.from_pretrained(
    model_id, subfolder="scheduler", clip_sample=False, timestep_spacing="linspace", steps_offset=1
)
pipe.scheduler = scheduler

# Enable memory savings
pipe.enable_vae_slicing()
pipe.enable_model_cpu_offload()

output = pipe(
    prompt=(
        "masterpiece, bestquality, highlydetailed, ultradetailed, sunset, "
        "orange sky, warm lighting, fishing boats, ocean waves seagulls, "
        "rippling water, wharf, silhouette, serene atmosphere, dusk, evening glow, "
        "golden hour, coastal landscape, seaside scenery"
    ),
    negative_prompt="bad quality, worse quality",
)
frames = output.frames[0]
export_to_gif(frames, "animation.gif")

You can convert an existing 2D UNet into a `UNetMotionModel`:

from diffusers import MotionAdapter, UNetMotionModel, UNet2DConditionModel

unet = UNetMotionModel()

# Load from an existing 2D UNet and MotionAdapter
unet2D = UNet2DConditionModel.from_pretrained("SG161222/Realistic_Vision_V5.1_noVAE", subfolder="unet")
motion_adapter = MotionAdapter.from_pretrained("guoyww/animatediff-motion-adapter-v1-5-2")

# Load the motion adapter here (the `motion_adapter` argument is optional and defaults to None)
unet_motion = UNetMotionModel.from_unet2d(unet2D, motion_adapter=motion_adapter)

# Or load motion modules after init
unet_motion.load_motion_modules(motion_adapter)

# Freeze all 2D UNet layers except for the motion modules for fine-tuning
unet_motion.freeze_unet2d_params()

# Save only the motion modules
save_directory = "<path to save model>"
unet_motion.save_motion_modules(save_directory, push_to_hub=True)
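The freezing step can be pictured as toggling a trainable flag by parameter name. The sketch below uses plain dictionaries rather than real model parameters; the parameter names and the `motion_modules` marker are assumptions for illustration, not a description of how Diffusers implements it.

```python
def freeze_all_but_motion(param_names, marker="motion_modules"):
    """Map each parameter name to True (trainable) only if it belongs to a motion module."""
    return {name: marker in name for name in param_names}


# Hypothetical parameter names from a UNet with motion modules inserted.
names = [
    "down_blocks.0.resnets.0.conv1.weight",
    "down_blocks.0.attentions.0.proj_in.weight",
    "down_blocks.0.motion_modules.0.proj_in.weight",
]
trainable = freeze_all_but_motion(names)
trainable_names = [name for name, flag in trainable.items() if flag]
print(trainable_names)
```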

AnimateDiff also comes with motion LoRA modules, letting you control subtleties:

import torch
from diffusers import MotionAdapter, AnimateDiffPipeline, DDIMScheduler
from diffusers.utils import export_to_gif

# Load the motion adapter
adapter = MotionAdapter.from_pretrained("guoyww/animatediff-motion-adapter-v1-5-2")

# Load an SD 1.5 based fine-tuned model
model_id = "SG161222/Realistic_Vision_V5.1_noVAE"
pipe = AnimateDiffPipeline.from_pretrained(model_id, motion_adapter=adapter)
pipe.load_lora_weights("guoyww/animatediff-motion-lora-zoom-out", adapter_name="zoom-out")

scheduler = DDIMScheduler.from_pretrained(
    model_id, subfolder="scheduler", clip_sample=False, timestep_spacing="linspace", steps_offset=1
)
pipe.scheduler = scheduler

# Enable memory savings
pipe.enable_vae_slicing()
pipe.enable_model_cpu_offload()

output = pipe(
    prompt=(
        "masterpiece, bestquality, highlydetailed, ultradetailed, sunset, "
        "orange sky, warm lighting, fishing boats, ocean waves seagulls, "
        "rippling water, wharf, silhouette, serene atmosphere, dusk, evening glow, "
        "golden hour, coastal landscape, seaside scenery"
    ),
    negative_prompt="bad quality, worse quality",
)
frames = output.frames[0]
export_to_gif(frames, "animation.gif")


Check out the [documentation](https://huggingface.co/docs/diffusers/main/en/api/pipelines/animatediff) to learn more.

PEFT 🤝 Diffusers

There are many adapters (LoRA, for example) trained in different styles to achieve different effects. You can even combine multiple adapters to create new and unique images. With the 🤗 **[PEFT](https://huggingface.co/docs/peft/index)** integration in 🤗 Diffusers, it is really easy to load and manage adapters for inference.

Here is an example of combining multiple LoRAs using this new integration:

from diffusers import DiffusionPipeline
import torch

pipe_id = "stabilityai/stable-diffusion-xl-base-1.0"
pipe = DiffusionPipeline.from_pretrained(pipe_id, torch_dtype=torch.float16).to("cuda")

# Load LoRA 1.
pipe.load_lora_weights("CiroN2022/toy-face", weight_name="toy_face_sdxl.safetensors", adapter_name="toy")
# Load LoRA 2.
pipe.load_lora_weights("nerijs/pixel-art-xl", weight_name="pixel-art-xl.safetensors", adapter_name="pixel")

# Combine the adapters (default weights are used when none are given).
pipe.set_adapters(["toy", "pixel"])


* [Lora] fix lora fuse unfuse by patrickvonplaten in 5003


* [LoRA, Xformers] Fix xformers lora by patrickvonplaten in https://github.com/huggingface/diffusers/pull/5201


* [Textual inversion] Refactor textual inversion to make it cleaner by patrickvonplaten in 5076
* t2i Adapter community member fix by williamberman in 5090
* remove unused adapter weights in constructor by williamberman in 5088
* [LoRA] don't break offloading for incompatible lora ckpts. by sayakpaul in 5085


* Fix model offload bug when key isn't present by DN6 in 5030
* [Import] Don't force transformers to be installed by patrickvonplaten in 5035
* allow loading of sd models from safetensors without online lookups using local config files by vladmandic in 5019
* [Import] Add missing settings / Correct some dummy imports by patrickvonplaten in 5036


Würstchen


[Würstchen](https://huggingface.co/papers/2306.00637) is a diffusion model, whose text-conditional model works in a highly compressed latent space of images, allowing cheaper and faster inference.
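To make "highly compressed" concrete, the sketch below compares latent grid sizes for one image under two spatial compression factors. The factor of roughly 42 for Würstchen, the factor of 8 for a typical latent diffusion autoencoder, and the 1024×1536 image size are assumptions used for illustration, not figures taken from this text.

```python
import math


def latent_grid(height, width, factor):
    """Latent height and width after dividing each side by `factor`, rounded up."""
    return math.ceil(height / factor), math.ceil(width / factor)


height, width = 1024, 1536  # hypothetical image size

typical = latent_grid(height, width, 8)  # assumed 8x spatial compression
compressed = latent_grid(height, width, 42.67)  # assumed ~42x spatial compression

# Ratio of spatial positions the text-conditional model has to process.
ratio = (typical[0] * typical[1]) / (compressed[0] * compressed[1])
print(typical, compressed, round(ratio, 1))
```

Under these assumptions the compressed latent has far fewer spatial positions, which is where the cheaper and faster inference would come from.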


Here is how to use Würstchen as a pipeline:

import torch
from diffusers import AutoPipelineForText2Image
from diffusers.pipelines.wuerstchen import DEFAULT_STAGE_C_TIMESTEPS

pipeline = AutoPipelineForText2Image.from_pretrained("warp-ai/wuerstchen", torch_dtype=torch.float16).to("cuda")

caption = "Anthropomorphic cat dressed as a firefighter"
images = pipeline(caption, prior_timesteps=DEFAULT_STAGE_C_TIMESTEPS).images


To learn more about the pipeline, check out the [official documentation](https://huggingface.co/docs/diffusers/main/en/api/pipelines/wuerstchen).

This pipeline was contributed by one of the authors of Würstchen, dome272, with help from kashif and patrickvonplaten.

👉 Try out the model here: https://huggingface.co/spaces/warp-ai/Wuerstchen

T2I Adapters for Stable Diffusion XL (SDXL)

[T2I-Adapter](https://huggingface.co/papers/2302.08453) is an efficient plug-and-play model that provides extra guidance to pre-trained text-to-image models while freezing the original large text-to-image models.

In collaboration with the Tencent ARC researchers, we trained T2I Adapters on various conditions: sketch, canny, lineart, depth, and openpose.

Below is an example of how to use the `StableDiffusionXLAdapterPipeline`.

First, ensure that `controlnet_aux` is installed:

pip install -U controlnet_aux==0.0.7

Then we can initialize the pipeline:

import torch
from controlnet_aux.lineart import LineartDetector
from diffusers import (AutoencoderKL, EulerAncestralDiscreteScheduler,
StableDiffusionXLAdapterPipeline, T2IAdapter)
from diffusers.utils import load_image, make_image_grid

# Load the adapter
adapter = T2IAdapter.from_pretrained(
    "TencentARC/t2i-adapter-lineart-sdxl-1.0", torch_dtype=torch.float16, variant="fp16"
).to("cuda")

# Load the pipeline
model_id = "stabilityai/stable-diffusion-xl-base-1.0"
euler_a = EulerAncestralDiscreteScheduler.from_pretrained(
    model_id, subfolder="scheduler"
)
vae = AutoencoderKL.from_pretrained(
    "madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.float16
)
pipe = StableDiffusionXLAdapterPipeline.from_pretrained(
    model_id, vae=vae, adapter=adapter, scheduler=euler_a, torch_dtype=torch.float16, variant="fp16"
).to("cuda")

# Load the lineart detector
line_detector = LineartDetector.from_pretrained("lllyasviel/Annotators").to("cuda")

We then load an image to compute the lineart conditionings:

url = "https://huggingface.co/Adapter/t2iadapter/resolve/main/figs_SDXLV1.0/org_lin.jpg"
image = load_image(url)
image = line_detector(image, detect_resolution=384, image_resolution=1024)

Then we generate:

prompt = "Ice dragon roar, 4k photo"
negative_prompt = "anime, cartoon, graphic, text, painting, crayon, graphite, abstract, glitch, deformed, mutated, ugly, disfigured"
gen_images = pipe(prompt=prompt, negative_prompt=negative_prompt, image=image).images


Refer to the [official documentation](https://huggingface.co/docs/diffusers/main/en/api/pipelines/stable_diffusion/adapter) to learn more about `StableDiffusionXLAdapterPipeline`.

[This blog post](https://huggingface.co/blog/t2i-sdxl-adapters) summarizes our experiences and provides all the resources (including the pre-trained T2I Adapter checkpoints) to get started using T2I Adapters for SDXL.

We’re also releasing a training script for training your custom T2I Adapters on SDXL. Check out the [documentation](https://huggingface.co/docs/diffusers/main/en/training/t2i_adapters) to learn more.

Thanks to MC-E (one of the authors of T2I Adapters) for contributing the `StableDiffusionXLAdapterPipeline` in 4696.

Faster imports

We introduced “lazy imports” (4829) to significantly improve the time it takes to import our modules (such as `pipelines`, `models`, and so on). Below is a comparison of the timings with and without lazy imports on `import diffusers`.

**With lazy imports**:

real 0m0.417s
user 0m0.714s
sys 0m0.499s

**Without lazy imports**:

real 0m5.391s
user 0m5.299s
sys 0m1.273s
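The general idea behind lazy imports can be sketched with the standard library alone: a stand-in module object defers the real import until one of its attributes is first accessed. This is a simplified illustration, not the implementation used in Diffusers; the `LazyModule` class and its `_real` attribute are hypothetical names.

```python
import importlib
import types


class LazyModule(types.ModuleType):
    """Module stand-in that imports the real module on first attribute access."""

    def __init__(self, name):
        super().__init__(name)
        self._real = None  # the real module, loaded on demand

    def __getattr__(self, attr):
        # Only called when normal lookup fails, i.e. for attributes of the real module.
        if self._real is None:
            self._real = importlib.import_module(self.__name__)
        return getattr(self._real, attr)


lazy_json = LazyModule("json")
loaded_before = lazy_json._real is not None  # nothing has been imported yet
result = lazy_json.dumps({"steps": 4})  # first access triggers the import
loaded_after = lazy_json._real is not None
print(loaded_before, result, loaded_after)
```

Creating the stand-in is cheap, so the cost of importing a heavy submodule is paid only by code that actually uses it.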

Faster LoRA loading

Previously, loading LoRA parameters with `load_lora_weights()` was time-consuming, as reported in 4975. To address this, we introduced a `low_cpu_mem_usage` argument to the `load_lora_weights()` method in 4994, which should speed up loading significantly. Just pass `low_cpu_mem_usage=True` to benefit from it.

LoRA fusing

LoRA weights can now be fused into the model weights, so models that have loaded LoRA weights run as fast as models without them. It is also possible to fuse multiple LoRAs into the same model.

For more information, have a look at [the documentation](https://huggingface.co/docs/diffusers/main/en/training/lora#fusing-lora-parameters) and the original PR: https://github.com/huggingface/diffusers/pull/4473.
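The arithmetic behind fusing can be sketched in a few lines: a LoRA adds a low-rank update `scale * B @ A` to a base weight `W`, and folding that update into `W` once gives the same output with a single matrix multiply. The tiny matrices, the `scale` value, and the helper functions below are assumptions for illustration and do not reflect the Diffusers implementation.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def add_scaled(a, b, scale=1.0):
    """Return a + scale * b, element-wise."""
    return [[x + scale * y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


W = [[1.0, 2.0], [3.0, 4.0]]  # base weight (2x2)
B = [[1.0], [0.0]]  # LoRA up-projection (2x1)
A = [[0.5, -1.0]]  # LoRA down-projection (1x2)
scale = 2.0
x = [[1.0], [1.0]]  # input column vector

# Unfused: base path plus a separate low-rank path on every forward pass.
unfused = add_scaled(matmul(W, x), matmul(B, matmul(A, x)), scale)

# Fused: fold the low-rank update into the weight once, then do one multiply.
W_fused = add_scaled(W, matmul(B, A), scale)
fused = matmul(W_fused, x)

print(unfused, fused)
```

Fusing a second LoRA would repeat the same update on `W_fused`, which is why several LoRAs can share one set of fused weights.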

More support for LoRAs

Almost all LoRA formats out there for SDXL are now supported. For more details, please check [the documentation](https://huggingface.co/docs/diffusers/main/en/training/lora#supporting-different-lora-checkpoints-from-diffusers).

