Diffusers

Latest version: v0.29.1


0.24.0

Stable Video Diffusion, SDXL Turbo, IP Adapters, Kandinsky 3.0

Stable Video Diffusion

[Stable Video Diffusion](https://huggingface.co/papers/2311.15127) is a powerful image-to-video generation model that can generate 2-4 second, high-resolution (576x1024) videos conditioned on an input image.

Image to Video Generation

There are two variants of SVD: [SVD](https://huggingface.co/stabilityai/stable-video-diffusion-img2vid) and [SVD-XT](https://huggingface.co/stabilityai/stable-video-diffusion-img2vid-xt). The SVD checkpoint is trained to generate 14 frames, and the SVD-XT checkpoint is further fine-tuned to generate 25 frames.
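As a rough sanity check on the "2-4 seconds" figure, clip length is simply the frame count divided by the playback frame rate. This sketch assumes the `fps=7` passed to `export_to_video` in the example that follows; `clip_duration_seconds` is a hypothetical helper, not part of Diffusers.

```python
def clip_duration_seconds(num_frames: int, fps: float) -> float:
    """Playback length in seconds of `num_frames` frames shown at `fps` frames per second."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return num_frames / fps


# Frame counts of the two checkpoints, as stated above.
FRAMES_PER_CHECKPOINT = {"SVD": 14, "SVD-XT": 25}
FPS = 7  # assumption: same frame rate as the export_to_video call in the example

for name, num_frames in FRAMES_PER_CHECKPOINT.items():
    seconds = clip_duration_seconds(num_frames, FPS)
    print(f"{name}: {num_frames} frames at {FPS} fps -> {seconds:.2f} s")
```

At 7 fps this gives 2.00 s for SVD and about 3.57 s for SVD-XT; a different export frame rate changes the duration but not the number of generated frames.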

You need to condition the generation on an initial image, as follows:

```python
import torch

from diffusers import StableVideoDiffusionPipeline
from diffusers.utils import load_image, export_to_video

pipe = StableVideoDiffusionPipeline.from_pretrained(
    "stabilityai/stable-video-diffusion-img2vid-xt", torch_dtype=torch.float16, variant="fp16"
)
pipe.enable_model_cpu_offload()

# Load the conditioning image
image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/svd/rocket.png?download=true")
image = image.resize((1024, 576))

generator = torch.manual_seed(42)
frames = pipe(image, decode_chunk_size=8, generator=generator).frames[0]

export_to_video(frames, "generated.mp4", fps=7)
```


Since generating videos is more memory-intensive than generating images, we can use the `decode_chunk_size` argument to control how many frames are decoded at once, which reduces memory usage. Tune this value to your GPU memory. Setting `decode_chunk_size=1` decodes one frame at a time and uses the least memory, but the video might show some flickering.
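The idea behind `decode_chunk_size` can be sketched in plain Python: instead of decoding all frame latents in one batch, they are split into slices of at most `chunk_size` and decoded slice by slice, so the decoder's peak memory depends on the chunk size rather than on the full clip length. This is a simplified illustration, not the pipeline's source; `decode_in_chunks` and the `fake_decode` stand-in for the VAE decoder are hypothetical.

```python
from typing import Callable, List, Sequence, TypeVar

L = TypeVar("L")  # latent type
F = TypeVar("F")  # decoded frame type


def decode_in_chunks(
    latents: Sequence[L],
    decode: Callable[[Sequence[L]], List[F]],
    chunk_size: int,
) -> List[F]:
    """Decode `latents` in slices of at most `chunk_size` and concatenate the frames."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be >= 1")
    frames: List[F] = []
    for start in range(0, len(latents), chunk_size):
        frames.extend(decode(latents[start:start + chunk_size]))
    return frames


batch_sizes: List[int] = []  # records how many latents each decoder call receives


def fake_decode(chunk: Sequence[int]) -> List[str]:
    """Hypothetical stand-in for the VAE decoder: one 'frame' per latent."""
    batch_sizes.append(len(chunk))
    return [f"frame_{latent}" for latent in chunk]


latents = list(range(25))  # 25 latents, as for SVD-XT
frames = decode_in_chunks(latents, fake_decode, chunk_size=8)
print(len(frames), batch_sizes)  # 25 [8, 8, 8, 1]
```

With `chunk_size=1` every decoder call sees a single frame, which corresponds to the memory-minimal setting described above.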

We also use [model CPU offloading](https://huggingface.co/docs/diffusers/main/en/optimization/memory#model-offloading) to reduce memory usage.
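Model CPU offloading keeps whole sub-models on the CPU and moves each one to the GPU only while it runs, so peak GPU memory for weights is roughly that of the largest component rather than the sum of all of them. The toy simulation below illustrates that accounting only; the component names follow SVD's image encoder, UNet, and VAE, but the sizes are made-up numbers, activations are ignored, and `peak_gpu_memory` is a hypothetical helper (the real feature is implemented with hooks from the Accelerate library).

```python
from typing import Dict, List


def peak_gpu_memory(sizes_gb: Dict[str, float], call_order: List[str], offload: bool) -> float:
    """Peak GPU memory (GB, weights only) for a sequence of component calls.

    Without offloading, all components end up resident on the GPU together.
    With model offloading, the previously used component is moved back to
    the CPU before the next one is moved onto the GPU.
    """
    resident: Dict[str, float] = {}
    peak = 0.0
    for name in call_order:
        if offload:
            resident.clear()  # the previous component goes back to the CPU
        resident[name] = sizes_gb[name]
        peak = max(peak, sum(resident.values()))
    return peak


# Hypothetical weight sizes in GB, for illustration only.
SIZES = {"image_encoder": 1.0, "unet": 3.0, "vae": 0.5}
ORDER = ["image_encoder", "unet", "vae"]

print(peak_gpu_memory(SIZES, ORDER, offload=False))  # 4.5
print(peak_gpu_memory(SIZES, ORDER, offload=True))   # 3.0
```

The trade-off is extra CPU-GPU transfers between components, which is why offloading saves memory at some cost in speed.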

![rocket_generated](https://github.com/huggingface/diffusers/assets/23423619/b861e9cc-6550-4796-823f-a32ddf20b7c4)

SDXL Turbo

SDXL Turbo is an adversarial time-distilled [Stable Diffusion XL](https://huggingface.co/papers/2307.01952) (SDXL) model capable of running inference in as little as 1 step. It also does not use classifier-free guidance, which further increases its speed. On a good consumer GPU, you can now generate an image in just 100 ms.

Text-to-Image

For text-to-image, pass a text prompt. By default, SDXL Turbo generates a 512x512 image, and that resolution gives the best results. You can try setting the `height` and `width` parameters to 768x768 or 1024x1024, but you should expect some loss of quality when doing so.

Make sure to set `guidance_scale` to 0.0 to disable classifier-free guidance, since the model was trained without it. A single inference step is enough to generate high-quality images; increasing the number of steps to 2, 3, or 4 should further improve image quality.

```py
from diffusers import AutoPipelineForText2Image
import torch

pipeline_text2image = AutoPipelineForText2Image.from_pretrained("stabilityai/sdxl-turbo", torch_dtype=torch.float16, variant="fp16")
pipeline_text2image = pipeline_text2image.to("cuda")

prompt = "A cinematic shot of a baby racoon wearing an intricate italian priest robe."

image = pipeline_text2image(prompt=prompt, guidance_scale=0.0, num_inference_steps=1).images[0]
image
```


<div class="flex justify-center">
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/sdxl-turbo-text2img.png" alt="generated image of a raccoon in a robe"/>
</div>
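Part of SDXL Turbo's speed comes from skipping classifier-free guidance (CFG). With CFG, each denoising step evaluates the UNet on both an unconditional and a text-conditioned copy of the latents and blends the two predictions; with `guidance_scale=0.0` only the conditional prediction is needed. The sketch below counts UNet sample evaluations under that assumption and shows the standard CFG blend. The helper names are hypothetical, and the rule that guidance is active only for `guidance_scale > 1` is an assumption meant to mirror the convention of Diffusers pipelines.

```python
from typing import List


def uses_cfg(guidance_scale: float) -> bool:
    # Assumption: guidance is active only for guidance_scale > 1, mirroring the
    # do_classifier_free_guidance convention of Diffusers pipelines.
    return guidance_scale > 1.0


def unet_sample_evaluations(num_inference_steps: int, guidance_scale: float) -> int:
    """Samples the UNet processes: two per step with CFG (uncond + cond), else one."""
    return num_inference_steps * (2 if uses_cfg(guidance_scale) else 1)


def cfg_blend(noise_uncond: List[float], noise_cond: List[float], guidance_scale: float) -> List[float]:
    """Standard CFG combination, element-wise: uncond + scale * (cond - uncond)."""
    return [u + guidance_scale * (c - u) for u, c in zip(noise_uncond, noise_cond)]


print(unet_sample_evaluations(1, 0.0))         # SDXL Turbo settings from the example: 1
print(unet_sample_evaluations(50, 7.5))        # illustrative guided run: 100
print(cfg_blend([0.0, 1.0], [1.0, 1.0], 7.5))  # [7.5, 1.0]
```

In this accounting, one unguided step costs half as much UNet work as one guided step, on top of the savings from using so few steps.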

Image-to-image

For image-to-image generation, make sure that `num_inference_steps * strength` is greater than or equal to 1. The image-to-image pipeline runs for `int(num_inference_steps * strength)` steps, for example `int(2 * 0.5) = 1` step in our example below.

```py
from diffusers import AutoPipelineForImage2Image
from diffusers.utils import load_image, make_image_grid

# use from_pipe to avoid consuming additional memory when loading a checkpoint
pipeline = AutoPipelineForImage2Image.from_pipe(pipeline_text2image).to("cuda")

init_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/cat.png")
init_image = init_image.resize((512, 512))

prompt = "cat wizard, gandalf, lord of the rings, detailed, fantasy, cute, adorable, Pixar, Disney, 8k"

image = pipeline(prompt, image=init_image, strength=0.5, guidance_scale=0.0, num_inference_steps=2).images[0]
make_image_grid([init_image, image], rows=1, cols=2)
```


<div class="flex justify-center">
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/sdxl-turbo-img2img.png" alt="Image-to-image generation sample using SDXL Turbo"/>
</div>
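The image-to-image step-count rule can be written out as a small helper. This is a sketch that assumes the pipeline runs `min(int(num_inference_steps * strength), num_inference_steps)` denoising steps, consistent with the `int(num_inference_steps * strength)` rule stated above; `img2img_steps` is a hypothetical name, not a Diffusers function.

```python
def img2img_steps(num_inference_steps: int, strength: float) -> int:
    """Denoising steps an image-to-image run performs (sketch of the rule above)."""
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be in [0, 1]")
    return min(int(num_inference_steps * strength), num_inference_steps)


for steps, strength in [(2, 0.5), (1, 0.5), (4, 0.5), (4, 1.0)]:
    n = img2img_steps(steps, strength)
    status = "ok" if n >= 1 else "too few: num_inference_steps * strength < 1"
    print(f"num_inference_steps={steps}, strength={strength}: {n} step(s) ({status})")
```

The `(1, 0.5)` case shows why the product must reach at least 1: truncation leaves zero denoising steps.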

IP Adapters

[IP Adapters](https://github.com/tencent-ailab/IP-Adapter) have proven remarkably powerful at generating images conditioned on other images.

Thanks to okotaku, we have added IP Adapters to the most important pipelines, allowing you to combine them for a variety of workflows; for example, they work with Img2Img, ControlNet, and LCM-LoRA out of the box.

LCM-LoRA

```python
from diffusers import DiffusionPipeline, LCMScheduler
import torch
from diffusers.utils import load_image

model_id = "sd-dreambooth-library/herge-style"
lcm_lora_id = "latent-consistency/lcm-lora-sdv1-5"

pipe = DiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16)

pipe.load_ip_adapter("h94/IP-Adapter", subfolder="models", weight_name="ip-adapter_sd15.bin")
pipe.load_lora_weights(lcm_lora_id)
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)
pipe.enable_model_cpu_offload()

prompt = "best quality, high quality"
image = load_image("https://user-images.githubusercontent.com/24734142/266492875-2d50d223-8475-44f0-a7c6-08b51cb53572.png")
images = pipe(
    prompt=prompt,
    ip_adapter_image=image,
    num_inference_steps=4,
    guidance_scale=1,
).images[0]
```

![yiyi_test_2_out](https://github.com/huggingface/diffusers/assets/12631849/9dc239c3-4483-46b9-a62b-b0c49c0c3b42)

ControlNet

```py
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel
import torch
from diffusers.utils import load_image

controlnet_model_path = "lllyasviel/control_v11f1p_sd15_depth"
controlnet = ControlNetModel.from_pretrained(controlnet_model_path, torch_dtype=torch.float16)

pipeline = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
)
pipeline.to("cuda")

image = load_image("https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/statue.png")
depth_map = load_image("https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/depth.png")

pipeline.load_ip_adapter("h94/IP-Adapter", subfolder="models", weight_name="ip-adapter_sd15.bin")

generator = torch.Generator(device="cpu").manual_seed(33)
images = pipeline(
    prompt="best quality, high quality",
    image=depth_map,
    ip_adapter_image=image,
    negative_prompt="monochrome, lowres, bad anatomy, worst quality, low quality",
    num_inference_steps=50,
    generator=generator,
).images
images[0].save("yiyi_test_2_out.png")
```


| ip_image | condition | output |
| :-------------------------: | :-------------------------: | :-------------------------: |
| ![statue](https://github.com/huggingface/diffusers/assets/12631849/c680ea27-de29-44b8-b9bf-d4d51c56c6a8) | ![depth](https://github.com/huggingface/diffusers/assets/12631849/0a476a2a-ec20-4ea3-9cc8-b390e8eb5685) | ![yiyi_test_2_out](https://github.com/huggingface/diffusers/assets/12631849/10b0af13-09ca-4de0-a72a-e43ba51dd706) |

For more information:
- 👉 https://huggingface.co/docs/diffusers/main/en/using-diffusers/loading_adapters#ip-adapter

0.23.1

Small patch release to make sure the correct PEFT version is installed.

All commits
- Improve setup.py and add dependency check by patrickvonplaten in 5826

0.23.0

LCM LoRA, LCM SDXL, Consistency Decoder

LCM LoRA

[Latent Consistency Models](https://huggingface.co/docs/diffusers/main/en/api/pipelines/latent_consistency_models) (LCM) made quite the mark in the Stable Diffusion community by enabling ultra-fast inference. LCM author luosiallen, alongside patil-suraj and dg845, extended LCM support to Stable Diffusion XL (SDXL) and packed everything into a LoRA.

The approach is called LCM LoRA.

Below is an example of using LCM LoRA, taking just **4 inference steps**:

```python
from diffusers import DiffusionPipeline, LCMScheduler
import torch

model_id = "stabilityai/stable-diffusion-xl-base-1.0"
lcm_lora_id = "latent-consistency/lcm-lora-sdxl"

pipe = DiffusionPipeline.from_pretrained(model_id, variant="fp16", torch_dtype=torch.float16).to("cuda")

pipe.load_lora_weights(lcm_lora_id)
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)

prompt = "close-up photography of old man standing in the rain at night, in a street lit by lamps, leica 35mm summilux"
image = pipe(
    prompt=prompt,
    num_inference_steps=4,
    guidance_scale=1,
).images[0]
```

You can combine the LoRA with Img2Img, Inpaint, ControlNet, and more, as well as with other LoRAs 🤯

![image (31)](https://github.com/huggingface/diffusers/assets/23423619/a245be57-2c53-4ddf-b5a8-9df68a785c20)
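Conceptually, a LoRA stores a low-rank update `B @ A` for a weight matrix `W`, and stacking several LoRAs adds their scaled updates: `W' = W + s1 * (B1 @ A1) + s2 * (B2 @ A2)`. The pure-Python sketch below shows only that arithmetic on toy 2x2 matrices. The adapter values and scales are made up, `merge_loras` is a hypothetical helper, and Diffusers' actual LoRA loading and fusing go through its own PEFT-backed integration rather than this code.

```python
from typing import List, Tuple

Matrix = List[List[float]]
Adapter = Tuple[Matrix, Matrix, float]  # (B, A, scale)


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Matrix product of a (n x k) and b (k x m)."""
    inner = len(b)
    cols = len(b[0])
    return [[sum(row[i] * b[i][j] for i in range(inner)) for j in range(cols)] for row in a]


def merge_loras(weight: Matrix, adapters: List[Adapter]) -> Matrix:
    """Return weight + sum(scale * (B @ A)) over adapters; `weight` is left unchanged."""
    merged = [row[:] for row in weight]
    for lora_b, lora_a, scale in adapters:
        delta = matmul(lora_b, lora_a)  # low-rank update with the same shape as weight
        for r, delta_row in enumerate(delta):
            for c, value in enumerate(delta_row):
                merged[r][c] += scale * value
    return merged


W = [[1.0, 0.0], [0.0, 1.0]]  # toy base weight
speed_lora: Adapter = ([[1.0], [0.0]], [[0.0, 2.0]], 1.0)  # hypothetical rank-1 adapter
style_lora: Adapter = ([[0.0], [1.0]], [[1.0, 0.0]], 0.5)  # hypothetical rank-1 adapter

merged = merge_loras(W, [speed_lora, style_lora])
print(merged)  # [[1.0, 2.0], [0.5, 1.0]]
```

Because each update is additive, the order in which adapters are merged does not matter, and scaling an adapter down weakens its effect.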

👉 [Checkpoints](https://huggingface.co/collections/latent-consistency/latent-consistency-models-loras-654cdd24e111e16f0865fba6)
📜 [Docs](https://huggingface.co/docs/diffusers/main/en/using-diffusers/lcm)

If you want to learn more about the approach, please have a look at the following:

- [Paper](https://huggingface.co/latent-consistency/lcm-lora-sdxl/blob/main/LCM-LoRA-Technical-Report.pdf)
- [Blog](https://hf.co/blog/lcm_lora)

LCM SDXL

Continuing the work on [Latent Consistency Models](https://huggingface.co/docs/diffusers/main/en/api/pipelines/latent_consistency_models) (LCM), we've applied the approach to SDXL as well and are releasing fine-tuned SSD-1B and SDXL checkpoints.

```python
from diffusers import DiffusionPipeline, UNet2DConditionModel, LCMScheduler
import torch

unet = UNet2DConditionModel.from_pretrained(
    "latent-consistency/lcm-sdxl",
    torch_dtype=torch.float16,
    variant="fp16",
)
pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", unet=unet, torch_dtype=torch.float16
).to("cuda")
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)

prompt = "Self-portrait oil painting, a beautiful cyborg with golden hair, 8k"

generator = torch.manual_seed(0)
image = pipe(
    prompt=prompt, num_inference_steps=4, generator=generator, guidance_scale=1.0
).images[0]
```


👉 [Checkpoints](https://huggingface.co/collections/latent-consistency/latent-consistency-models-weights-654ce61a95edd6dffccef6a8)
📜 [Docs](https://huggingface.co/docs/diffusers/main/en/using-diffusers/lcm)

Consistency Decoder

OpenAI open-sourced the consistency decoder used in [DALL-E 3](https://openai.com/dall-e-3). It improves the decoding step (turning latents back into images) in the Stable Diffusion v1 family of models.

```python
import torch
from diffusers import StableDiffusionPipeline, ConsistencyDecoderVAE

# Load the consistency decoder as a drop-in VAE replacement
vae = ConsistencyDecoderVAE.from_pretrained("openai/consistency-decoder", torch_dtype=torch.float16)
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", vae=vae, torch_dtype=torch.float16
).to("cuda")

pipe("horse", generator=torch.manual_seed(0)).images
```


See the [ConsistencyDecoderVAE documentation](https://huggingface.co/docs/diffusers/main/en/api/models/consistency_decoder_vae) to learn more.

All commits

* [Custom Pipelines] Make sure that community pipelines can use repo revision by patrickvonplaten in 5659
* post release (v0.22.0) by sayakpaul in 5658
* Add Pixart to AUTO_TEXT2IMAGE_PIPELINES_MAPPING by Beinsezii in 5664
* Update custom diffusion attn processor by DN6 in 5663
* Model tests xformers fixes by DN6 in 5679
* Update free model hooks by DN6 in 5680
* Fix Basic Transformer Block by DN6 in 5683
* Explicit torch/flax dependency check by DN6 in 5673
* [PixArt-Alpha] fix `mask_feature` so that precomputed embeddings work with a batch size > 1 by sayakpaul in 5677
* Make sure DDPM and `diffusers` can be used without Transformers by sayakpaul in 5668
* [PixArt-Alpha] Support non-square images by sayakpaul in 5672
* Improve LCMScheduler by dg845 in 5681
* [`Docs`] Fix typos, improve, update at Using Diffusers' Task page by standardAI in 5611
* Replacing the nn.Mish activation function with a get_activation function. by hi-sushanta in 5651
* speed up Shap-E fast test by yiyixuxu in 5686
* Fix the misaligned pipeline usage in dreamshaper docstrings by kirill-fedyanin in 5700
* Fixed is_safetensors_compatible() handling of windows path separators by PhilLab in 5650
* [LCM] Fix img2img by patrickvonplaten in 5698
* [PixArt-Alpha] fix mask feature condition. by sayakpaul in 5695
* Fix styling issues by patrickvonplaten in 5699
* Add adapter fusing + PEFT to the docs by apolinario in 5662
* Fix prompt bug in AnimateDiff by DN6 in 5702
* [Bugfix] fix error of peft lora when xformers enabled by okotaku in 5697
* Install accelerate from PyPI in PR test runner by DN6 in 5721
* consistency decoder by williamberman in 5694
* Correct consist dec by patrickvonplaten in 5722
* LCM Add Tests by patrickvonplaten in 5707
* [LCM] add: locm docs. by sayakpaul in 5723
* Add LCM Scripts by patil-suraj in 5727

0.22.3

🐛 There were some sneaky bugs in the PixArt-Alpha and LCM Image-to-Image pipelines which have been fixed in this release.

All commits

* [LCM] Fix img2img by patrickvonplaten in 5698
* [PixArt-Alpha] fix mask feature condition. by sayakpaul in 5695

0.22.2

* Fix Basic Transformer Block by DN6 in 5683
* [PixArt-Alpha] fix `mask_feature` so that precomputed embeddings work with a batch size > 1 by sayakpaul in 5677
* Make sure DDPM and `diffusers` can be used without Transformers by sayakpaul in 5668
* [PixArt-Alpha] Support non-square images by sayakpaul in 5672

0.22.1

- [Custom Pipelines] Make sure that community pipelines can use repo revision by patrickvonplaten
