## Latent Consistency Models (LCM)
![Untitled](https://github.com/huggingface/diffusers/assets/22957388/cce05d6b-e5de-4be0-8416-36556b80176e)
LCMs enable significantly faster inference for diffusion models: they require far fewer inference steps to produce high-resolution images without compromising image quality too much. Below is a usage example:
```python
import torch
from diffusers import DiffusionPipeline
pipe = DiffusionPipeline.from_pretrained("SimianLuo/LCM_Dreamshaper_v7", torch_dtype=torch.float32)
# To save GPU memory, torch.float16 can be used, but it may compromise image quality.
pipe.to(torch_device="cuda", torch_dtype=torch.float32)
prompt = "Self-portrait oil painting, a beautiful cyborg with golden hair, 8k"
# Can be set to 1~50 steps. LCMs support fast inference even at <= 4 steps. Recommended: 1~8 steps.
num_inference_steps = 4
images = pipe(prompt=prompt, num_inference_steps=num_inference_steps, guidance_scale=8.0).images
```
Refer to the [documentation](https://huggingface.co/docs/diffusers/main/en/api/pipelines/latent_consistency_models) to learn more.
LCM comes with both text-to-image and image-to-image pipelines, contributed by luosiallen, nagolinc, and dg845.
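The image-to-image pipeline works the same way with the same checkpoint. Below is a minimal sketch; the `LatentConsistencyModelImg2ImgPipeline` class name and the input-image path are assumptions here, so double-check them against the documentation linked above.

```python
import torch
from diffusers import LatentConsistencyModelImg2ImgPipeline
from diffusers.utils import load_image

pipe = LatentConsistencyModelImg2ImgPipeline.from_pretrained(
    "SimianLuo/LCM_Dreamshaper_v7", torch_dtype=torch.float32
)
pipe.to("cuda")

# Placeholder path: replace with your own init image.
init_image = load_image("path/to/init_image.png")

prompt = "Self-portrait oil painting, a beautiful cyborg with golden hair, 8k"

# Very low step counts also work for image-to-image.
image = pipe(
    prompt=prompt, image=init_image, num_inference_steps=4, guidance_scale=8.0, strength=0.5
).images[0]
```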
## PixArt-Alpha
<img width="627" alt="header_collage" src="https://github.com/huggingface/diffusers/assets/22957388/25beb50f-d530-4d19-9430-b1c00ae3b1d9">
PixArt-Alpha is a Transformer-based text-to-image diffusion model that rivals the quality of the existing state-of-the-art ones, such as Stable Diffusion XL, Imagen, and DALL-E 2, while being more efficient.
It was trained with T5 text embeddings and has a maximum sequence length of 120 tokens. This allows for more detailed prompt inputs, unlocking better-quality generations.
Despite the large text encoder, with model offloading it takes a little under 11GB of VRAM to run the `PixArtAlphaPipeline`:
```python
from diffusers import PixArtAlphaPipeline
import torch
pipeline_id = "PixArt-alpha/PixArt-XL-2-1024-MS"
pipeline = PixArtAlphaPipeline.from_pretrained(pipeline_id, torch_dtype=torch.float16)
pipeline.enable_model_cpu_offload()
prompt = "A small cactus with a happy face in the Sahara desert."
image = pipeline(prompt).images[0]
image.save("sahara.png")
```
Check out the [docs](https://huggingface.co/docs/diffusers/main/en/api/pipelines/pixart) to learn more.
## AnimateDiff
![animatediff-doc](https://github.com/huggingface/diffusers/assets/22957388/1a5f416b-f272-444c-b8bc-1b470abf38d4)
AnimateDiff is a modelling framework that allows you to create videos using pre-existing Stable Diffusion text-to-image models. It achieves this by inserting motion module layers into a frozen text-to-image model and training it on video clips to extract a motion prior.
These motion modules are applied after the ResNet and Attention blocks in the Stable Diffusion UNet. Their purpose is to introduce coherent motion across image frames. To support these modules, we introduce the concepts of a `MotionAdapter` and a `UNetMotionModel`. These serve as a convenient way to use these motion modules with existing Stable Diffusion models.
The following example demonstrates how you can utilize the motion modules with an existing Stable Diffusion text-to-image model.
```python
import torch
from diffusers import MotionAdapter, AnimateDiffPipeline, DDIMScheduler
from diffusers.utils import export_to_gif
# Load the motion adapter
adapter = MotionAdapter.from_pretrained("guoyww/animatediff-motion-adapter-v1-5-2")
# Load SD 1.5-based finetuned model
model_id = "SG161222/Realistic_Vision_V5.1_noVAE"
pipe = AnimateDiffPipeline.from_pretrained(model_id, motion_adapter=adapter)
scheduler = DDIMScheduler.from_pretrained(
model_id, subfolder="scheduler", clip_sample=False, timestep_spacing="linspace", steps_offset=1
)
pipe.scheduler = scheduler
# Enable memory savings
pipe.enable_vae_slicing()
pipe.enable_model_cpu_offload()
output = pipe(
prompt=(
"masterpiece, bestquality, highlydetailed, ultradetailed, sunset, "
"orange sky, warm lighting, fishing boats, ocean waves seagulls, "
"rippling water, wharf, silhouette, serene atmosphere, dusk, evening glow, "
"golden hour, coastal landscape, seaside scenery"
),
negative_prompt="bad quality, worse quality",
num_frames=16,
guidance_scale=7.5,
num_inference_steps=25,
generator=torch.Generator("cpu").manual_seed(42),
)
frames = output.frames[0]
export_to_gif(frames, "animation.gif")
```
You can convert an existing 2D UNet into a `UNetMotionModel`:
```python
from diffusers import MotionAdapter, UNetMotionModel, UNet2DConditionModel
unet = UNetMotionModel()
# Load from an existing 2D UNet and MotionAdapter
unet2D = UNet2DConditionModel.from_pretrained("SG161222/Realistic_Vision_V5.1_noVAE", subfolder="unet")
motion_adapter = MotionAdapter.from_pretrained("guoyww/animatediff-motion-adapter-v1-5-2")
# Pass the motion adapter while converting the 2D UNet ...
unet_motion = UNetMotionModel.from_unet2d(unet2D, motion_adapter=motion_adapter)
# ... or load the motion modules after init
unet_motion.load_motion_modules(motion_adapter)
# Freeze all 2D UNet layers except for the motion modules, for finetuning
unet_motion.freeze_unet2d_params()
# Save only the motion modules
unet_motion.save_motion_modules("path/to/save/motion-modules", push_to_hub=True)
```
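If the finetuned motion modules were saved as above, they can in principle be loaded back as a `MotionAdapter` and reused with any compatible SD 1.5 model. This is a sketch under that assumption, reusing the placeholder path from the previous snippet:

```python
from diffusers import MotionAdapter, AnimateDiffPipeline

# Placeholder path: the directory the motion modules were saved to above.
finetuned_adapter = MotionAdapter.from_pretrained("path/to/save/motion-modules")

# Plug the finetuned motion modules into an AnimateDiff pipeline.
pipe = AnimateDiffPipeline.from_pretrained(
    "SG161222/Realistic_Vision_V5.1_noVAE", motion_adapter=finetuned_adapter
)
```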
AnimateDiff also comes with motion LoRA modules, letting you control subtleties of the generated motion, such as the camera zoom-out used in the example below:
```python
import torch
from diffusers import MotionAdapter, AnimateDiffPipeline, DDIMScheduler
from diffusers.utils import export_to_gif
# Load the motion adapter
adapter = MotionAdapter.from_pretrained("guoyww/animatediff-motion-adapter-v1-5-2")
# Load SD 1.5-based finetuned model
model_id = "SG161222/Realistic_Vision_V5.1_noVAE"
pipe = AnimateDiffPipeline.from_pretrained(model_id, motion_adapter=adapter)
pipe.load_lora_weights("guoyww/animatediff-motion-lora-zoom-out", adapter_name="zoom-out")
scheduler = DDIMScheduler.from_pretrained(
model_id, subfolder="scheduler", clip_sample=False, timestep_spacing="linspace", steps_offset=1
)
pipe.scheduler = scheduler
# Enable memory savings
pipe.enable_vae_slicing()
pipe.enable_model_cpu_offload()
output = pipe(
prompt=(
"masterpiece, bestquality, highlydetailed, ultradetailed, sunset, "
"orange sky, warm lighting, fishing boats, ocean waves seagulls, "
"rippling water, wharf, silhouette, serene atmosphere, dusk, evening glow, "
"golden hour, coastal landscape, seaside scenery"
),
negative_prompt="bad quality, worse quality",
num_frames=16,
guidance_scale=7.5,
num_inference_steps=25,
generator=torch.Generator("cpu").manual_seed(42),
)
frames = output.frames[0]
export_to_gif(frames, "animation.gif")
```
![animatediff-zoom-out-lora](https://github.com/huggingface/diffusers/assets/22957388/edfa27f3-a8fd-4ce7-809a-f8e3b39f4868)
Check out the [documentation](https://huggingface.co/docs/diffusers/main/en/api/pipelines/animatediff) to learn more.
## PEFT 🤝 Diffusers
There are many adapters (LoRA, for example) trained in different styles to achieve different effects. You can even combine multiple adapters to create new and unique images. With the 🤗 **[PEFT](https://huggingface.co/docs/peft/index)** integration in 🤗 Diffusers, it is really easy to load and manage adapters for inference.
Here is an example of combining multiple LoRAs using this new integration:
```python
from diffusers import DiffusionPipeline
import torch
pipe_id = "stabilityai/stable-diffusion-xl-base-1.0"
pipe = DiffusionPipeline.from_pretrained(pipe_id, torch_dtype=torch.float16).to("cuda")
# Load LoRA 1.
pipe.load_lora_weights("CiroN2022/toy-face", weight_name="toy_face_sdxl.safetensors", adapter_name="toy")
# Load LoRA 2.
pipe.load_lora_weights("nerijs/pixel-art-xl", weight_name="pixel-art-xl.safetensors", adapter_name="pixel")
# Combine the adapters.