Diffusers

Latest version: v0.32.2

0.26.1

In the v0.26.0 release, we slipped in the `torchvision` library as a required library, which shouldn't have been the case. This is now fixed.

All commits

* add is_torchvision_available by yiyixuxu in 6800
* fix torchvision import by patrickvonplaten in 6796

0.26.0

Significant community contributions

The following contributors have made significant changes to the library over the last release:

* a-r-r-o-w
  * [Community] Experimental AnimateDiff Image to Video (open to improvements) (6509)
  * AnimateDiff Video to Video (6328)
  * [docs] AnimateDiff Video-to-Video (6712)
  * fix community README (6645)
* ultranity
  * refactor: extract init/forward function in UNet2DConditionModel (6478)
* lawrence-cj
  * add Sa-Solver (5975)
* ayushtues
  * [WIP][Community Pipeline] InstaFlow! One-Step Stable Diffusion with Rectified Flow (6057)
* haofanwang
  * Add InstantID Pipeline (6673)
  * [Fix bugs] pipeline_controlnet_sd_xl.py (6653)
* brandostrong
  * SD 1.5 Support For Advanced Lora Training (train_dreambooth_lora_sdxl_advanced.py) (6449)
* dg845
  * Add Community Example Consistency Training Script (6717)
  * Add UFOGenScheduler to Community Examples (6650)
  * Fix bug in ResnetBlock2D.forward where LoRA Scale gets Overwritten (6736)

0.25.1

Make sure `diffusers` can correctly be used in offline mode again: https://github.com/huggingface/diffusers/pull/1767#issuecomment-1896194917

* Respect offline mode when loading pipeline by Wauplin in 6456
* Fix offline mode import by Wauplin in 6467

0.25.0

Significant community contributions

The following contributors have made significant changes to the library over the last release:

* hako-mikan
  * [Community Pipeline] Regional Prompting Pipeline (6015)
  * [Fix] Fix Regional Prompting Pipeline (6188)
* TonyLianLong
  * LLMGroundedDiffusionPipeline: inherit from DiffusionPipeline and fix peft (6023)
* okotaku
  * [Feature] Support IP-Adapter Plus (5915)
* RuoyiDu
  * [Community Pipeline] DemoFusion: Democratising High-Resolution Image Generation With No $$$ (6022)
* UmerHA
  * Add ControlNet-XS support (5827)
* a-r-r-o-w
  * [Community] AnimateDiff + Controlnet Pipeline (5928)
  * IP adapter support for most pipelines (5900)
  * Add missing subclass docs, Fix broken example in SD_safe (6116)
  * Support img2img and inpaint in lpw-xl (6114)
* Monohydroxides
  * [Community] Add SDE Drag pipeline (6105)
* dg845
  * Clean Up Comments in LCM(-LoRA) Distillation Scripts. (6145)
  * Change LCM-LoRA README Script Example Learning Rates to 1e-4 (6304)
  * Add rescale_betas_zero_snr Argument to DDPMScheduler (6305)
  * Fix LCM distillation bug when creating the guidance scale embeddings using multiple GPUs. (6279)
* markkua
  * [Community Pipeline] Add Marigold Monocular Depth Estimation (6249)

0.24.0

Stable Video Diffusion, SDXL Turbo, IP Adapters, Kandinsky 3.0

Stable Video Diffusion

[Stable Video Diffusion](https://huggingface.co/papers/2311.15127) is a powerful image-to-video generation model that can generate high-resolution (576x1024) videos of 2-4 seconds, conditioned on an input image.

Image to Video Generation

There are two variants of SVD: [SVD](https://huggingface.co/stabilityai/stable-video-diffusion-img2vid) and [SVD-XT](https://huggingface.co/stabilityai/stable-video-diffusion-img2vid-xt). The SVD checkpoint is trained to generate 14 frames, and the SVD-XT checkpoint is further finetuned to generate 25 frames.
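As a rough sanity check on the quoted 2-4 second range, clip length is the frame count divided by the playback rate. The sketch below assumes playback at 7 frames per second, the `fps` value passed to `export_to_video` in the example that follows. `clip_seconds` is a hypothetical helper written for illustration, not part of `diffusers`.

```python
def clip_seconds(num_frames: int, fps: int) -> float:
    """Return the playback duration in seconds of `num_frames` shown at `fps`."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return num_frames / fps


# Frame counts are the ones stated above; fps=7 is an assumption
# borrowed from the export_to_video call in the example.
svd_seconds = clip_seconds(14, fps=7)     # SVD checkpoint: 14 frames
svd_xt_seconds = clip_seconds(25, fps=7)  # SVD-XT checkpoint: 25 frames

print(f"SVD: {svd_seconds:.2f} s, SVD-XT: {svd_xt_seconds:.2f} s")
```

Under that assumption, both checkpoints land inside the stated range; a different playback rate would change the durations accordingly.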

You need to condition the generation on an initial image, as follows:

```python
import torch

from diffusers import StableVideoDiffusionPipeline
from diffusers.utils import load_image, export_to_video

pipe = StableVideoDiffusionPipeline.from_pretrained(
    "stabilityai/stable-video-diffusion-img2vid-xt", torch_dtype=torch.float16, variant="fp16"
)
pipe.enable_model_cpu_offload()

# Load the conditioning image
image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/svd/rocket.png?download=true")
image = image.resize((1024, 576))

generator = torch.manual_seed(42)
frames = pipe(image, decode_chunk_size=8, generator=generator).frames[0]

export_to_video(frames, "generated.mp4", fps=7)
```

Since generating videos is more memory intensive, we can use the `decode_chunk_size` argument to control how many frames are decoded at once. This will reduce the memory usage. It's recommended to tweak this value based on your GPU memory. Setting `decode_chunk_size=1` will decode one frame at a time and will use the least amount of memory, but the video might have some flickering.

We also use [model CPU offloading](https://huggingface.co/docs/diffusers/main/en/optimization/memory#model-offloading) to reduce memory usage further.

![rocket_generated](https://github.com/huggingface/diffusers/assets/23423619/b861e9cc-6550-4796-823f-a32ddf20b7c4)
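
To make the trade-off behind `decode_chunk_size` concrete, here is a minimal pure-Python sketch of chunked decoding. It is not the pipeline's actual implementation: `decode_in_chunks` and `fake_decode` are hypothetical stand-ins, and a real decoder works on latent tensors rather than lists. The point is only that the decoder never sees more than `chunk_size` frames at once, so peak memory tracks the chunk size rather than the clip length.

```python
from typing import Callable, List, Sequence


def decode_in_chunks(
    latents: Sequence,
    decode_fn: Callable[[Sequence], List],
    chunk_size: int,
) -> List:
    """Decode `latents` in slices of at most `chunk_size` and concatenate the results."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    frames: List = []
    for start in range(0, len(latents), chunk_size):
        chunk = latents[start:start + chunk_size]
        frames.extend(decode_fn(chunk))
    return frames


# Hypothetical stand-in for a decoder: it records how many
# frames it was asked to hold at once.
batch_sizes: List[int] = []


def fake_decode(chunk: Sequence) -> List[str]:
    batch_sizes.append(len(chunk))
    return [f"frame-{latent}" for latent in chunk]


frames = decode_in_chunks(list(range(25)), fake_decode, chunk_size=8)
print(len(frames), batch_sizes)
```

With 25 latents and `chunk_size=8`, the stand-in decoder is called four times and never holds more than 8 frames, while the caller still receives all 25 frames in order.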

SDXL Turbo

SDXL Turbo is an adversarial time-distilled [Stable Diffusion XL](https://huggingface.co/papers/2307.01952) (SDXL) model capable of running inference in as little as 1 step. Also, it does not use classifier-free guidance, further increasing its speed. On a good consumer GPU, you can now generate an image in just 100ms.

Text-to-Image

For text-to-image, pass a text prompt. By default, SDXL Turbo generates a 512x512 image, and that resolution gives the best results. You can try setting the `height` and `width` parameters to 768x768 or 1024x1024, but you should expect quality degradations when doing so.

Make sure to set `guidance_scale` to 0.0 to disable classifier-free guidance, as the model was trained without it. A single inference step is enough to generate high-quality images.
Increasing the number of steps to 2, 3 or 4 should improve image quality.

```py
from diffusers import AutoPipelineForText2Image
import torch

pipeline_text2image = AutoPipelineForText2Image.from_pretrained("stabilityai/sdxl-turbo", torch_dtype=torch.float16, variant="fp16")
pipeline_text2image = pipeline_text2image.to("cuda")

prompt = "A cinematic shot of a baby racoon wearing an intricate italian priest robe."

image = pipeline_text2image(prompt=prompt, guidance_scale=0.0, num_inference_steps=1).images[0]
image
```

<div class="flex justify-center">
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/sdxl-turbo-text2img.png" alt="generated image of a racoon in a robe"/>
</div>
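
The speed benefit of skipping classifier-free guidance can be sketched as follows. This is an illustration under stated assumptions, not the pipeline's code: it assumes the common convention that guidance is applied only when `guidance_scale` is greater than 1, in which case every denoising step needs both an unconditional and a text-conditioned model pass. `model_passes_per_step` and `apply_guidance` are hypothetical helpers operating on plain lists.

```python
from typing import List


def model_passes_per_step(guidance_scale: float) -> int:
    """Assumed convention: guidance is active only for scales above 1."""
    return 2 if guidance_scale > 1.0 else 1


def apply_guidance(uncond: List[float], cond: List[float], guidance_scale: float) -> List[float]:
    """Standard guidance mix: move from the unconditional prediction toward the conditional one."""
    return [u + guidance_scale * (c - u) for u, c in zip(uncond, cond)]


# One model pass per step when guidance is disabled, two when it is enabled.
passes_disabled = model_passes_per_step(0.0)
passes_enabled = model_passes_per_step(7.5)

# Toy noise predictions to show how the two passes are combined when guidance is on.
mixed = apply_guidance([1.0, 2.0], [3.0, 5.0], guidance_scale=2.0)
print(passes_disabled, passes_enabled, mixed)
```

Under this assumption, `guidance_scale=0.0` halves the number of model evaluations per step compared with a guided run, which compounds with the low step count.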

Image-to-image

For image-to-image generation, make sure that `num_inference_steps * strength` is greater than or equal to 1.
The image-to-image pipeline will run for `int(num_inference_steps * strength)` steps, for example `int(2 * 0.5) = 1` step in
the example below.

```py
from diffusers import AutoPipelineForImage2Image
from diffusers.utils import load_image, make_image_grid

# use from_pipe to avoid consuming additional memory when loading a checkpoint
pipeline = AutoPipelineForImage2Image.from_pipe(pipeline_text2image).to("cuda")

init_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/cat.png")
init_image = init_image.resize((512, 512))

prompt = "cat wizard, gandalf, lord of the rings, detailed, fantasy, cute, adorable, Pixar, Disney, 8k"

image = pipeline(prompt, image=init_image, strength=0.5, guidance_scale=0.0, num_inference_steps=2).images[0]
make_image_grid([init_image, image], rows=1, cols=2)
```

<div class="flex justify-center">
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/sdxl-turbo-img2img.png" alt="Image-to-image generation sample using SDXL Turbo"/>
</div>
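
The step arithmetic above can be checked with a small helper. `effective_img2img_steps` is a hypothetical function written for illustration; it follows the `int(num_inference_steps * strength)` rule stated above and, as an extra assumption, caps the result at `num_inference_steps`.

```python
def effective_img2img_steps(num_inference_steps: int, strength: float) -> int:
    """Return how many denoising steps actually run for a given strength."""
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be between 0 and 1")
    return min(int(num_inference_steps * strength), num_inference_steps)


steps_in_example = effective_img2img_steps(2, 0.5)  # settings used in the example above
steps_too_few = effective_img2img_steps(1, 0.5)     # product below 1 rounds down to zero steps

print(steps_in_example, steps_too_few)
```

A result of zero means no denoising step would run, which is why the product needs to be at least 1.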

IP Adapters

[IP Adapters](https://github.com/tencent-ailab/IP-Adapter) have been shown to be remarkably powerful at generating images conditioned on other images.

Thanks to okotaku, we have added IP Adapters to the most important pipelines, allowing you to combine them in a variety of workflows, *e.g.* they work with Img2Img, ControlNet, and LCM-LoRA out of the box.

LCM-LoRA

```python
from diffusers import DiffusionPipeline, LCMScheduler
import torch
from diffusers.utils import load_image

model_id = "sd-dreambooth-library/herge-style"
lcm_lora_id = "latent-consistency/lcm-lora-sdv1-5"

pipe = DiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16)

pipe.load_ip_adapter("h94/IP-Adapter", subfolder="models", weight_name="ip-adapter_sd15.bin")
pipe.load_lora_weights(lcm_lora_id)
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)
pipe.enable_model_cpu_offload()

prompt = "best quality, high quality"
image = load_image("https://user-images.githubusercontent.com/24734142/266492875-2d50d223-8475-44f0-a7c6-08b51cb53572.png")
images = pipe(
    prompt=prompt,
    ip_adapter_image=image,
    num_inference_steps=4,
    guidance_scale=1,
).images[0]
```

![yiyi_test_2_out](https://github.com/huggingface/diffusers/assets/12631849/9dc239c3-4483-46b9-a62b-b0c49c0c3b42)

ControlNet

```py
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel
import torch
from diffusers.utils import load_image

controlnet_model_path = "lllyasviel/control_v11f1p_sd15_depth"
controlnet = ControlNetModel.from_pretrained(controlnet_model_path, torch_dtype=torch.float16)

pipeline = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16)
pipeline.to("cuda")

image = load_image("https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/statue.png")
depth_map = load_image("https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/depth.png")

pipeline.load_ip_adapter("h94/IP-Adapter", subfolder="models", weight_name="ip-adapter_sd15.bin")

generator = torch.Generator(device="cpu").manual_seed(33)
images = pipeline(
    prompt='best quality, high quality',
    image=depth_map,
    ip_adapter_image=image,
    negative_prompt="monochrome, lowres, bad anatomy, worst quality, low quality",
    num_inference_steps=50,
    generator=generator,
).images
images[0].save("yiyi_test_2_out.png")
```

| ip_image | condition | output |
|:-------------------------:|:-------------------------:|:-------------------------:|
| ![statue](https://github.com/huggingface/diffusers/assets/12631849/c680ea27-de29-44b8-b9bf-d4d51c56c6a8) | ![depth](https://github.com/huggingface/diffusers/assets/12631849/0a476a2a-ec20-4ea3-9cc8-b390e8eb5685) | ![yiyi_test_2_out](https://github.com/huggingface/diffusers/assets/12631849/10b0af13-09ca-4de0-a72a-e43ba51dd706) |

For more information:
- :point_right: https://huggingface.co/docs/diffusers/main/en/using-diffusers/loading_adapters#ip-adapter

0.23.1

Small patch release to make sure the correct PEFT version is installed.

All commits
- Improve setup.py and add dependency check by patrickvonplaten in 5826
