Diffusers

Latest version: v0.32.2


0.12.0

🪄 Instruct-Pix2Pix
Instruct-Pix2Pix is a Stable Diffusion model fine-tuned for editing images from human instructions. Given an input image and a written instruction that tells the model what to do, the model follows these instructions to edit the image.

![image](https://huggingface.co/datasets/diffusers/diffusers-images-docs/resolve/main/pix2pix.jpeg)

The model was released with the paper [InstructPix2Pix: Learning to Follow Image Editing Instructions](https://arxiv.org/abs/2211.09800). More information about the model can be found in the paper.


```
pip install diffusers transformers safetensors accelerate
```

```python
import PIL.Image
import PIL.ImageOps
import requests
import torch
from diffusers import StableDiffusionInstructPix2PixPipeline

model_id = "timbrooks/instruct-pix2pix"
pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained(model_id, torch_dtype=torch.float16).to("cuda")

url = "https://huggingface.co/datasets/diffusers/diffusers-images-docs/resolve/main/mountain.png"


def download_image(url):
    # Fetch the image, apply its EXIF orientation, and normalize to RGB
    image = PIL.Image.open(requests.get(url, stream=True).raw)
    image = PIL.ImageOps.exif_transpose(image)
    image = image.convert("RGB")
    return image


image = download_image(url)

prompt = "make the mountains snowy"
images = pipe(prompt, image=image, num_inference_steps=20, image_guidance_scale=1.5, guidance_scale=7).images
images[0].save("snowy_mountains.png")
```

* Add InstructPix2Pix pipeline by patil-suraj in #2040
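
The example call passes two guidance arguments, `guidance_scale` and `image_guidance_scale`. The InstructPix2Pix paper describes combining three noise predictions with a text weight and an image weight. The sketch below illustrates that combination in plain Python; the `combine_guidance` helper and the toy values are assumptions made for illustration, not the pipeline's actual code.

```python
def combine_guidance(eps_uncond, eps_image, eps_full, text_scale, image_scale):
    """Combine three noise predictions with two guidance scales.

    eps_uncond: prediction with neither image nor instruction
    eps_image:  prediction conditioned on the input image only
    eps_full:   prediction conditioned on image and instruction
    """
    return [
        u + image_scale * (i - u) + text_scale * (f - i)
        for u, i, f in zip(eps_uncond, eps_image, eps_full)
    ]


# Toy 3-element "noise predictions" (made-up values for illustration).
eps_uncond = [0.0, 0.0, 0.0]
eps_image = [1.0, 2.0, 3.0]
eps_full = [2.0, 2.0, 5.0]

# With both scales at 1.0 the result equals the fully conditioned prediction.
neutral = combine_guidance(eps_uncond, eps_image, eps_full, 1.0, 1.0)

# Larger scales push the result further along each conditioning direction.
guided = combine_guidance(eps_uncond, eps_image, eps_full, text_scale=7.0, image_scale=1.5)
print(neutral, guided)
```

Roughly speaking, raising the image scale keeps the result closer to the input image, while raising the text scale makes the edit follow the instruction more strongly.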


🤖 DiT

Diffusion Transformer (DiT) is a class-conditional latent diffusion model that replaces the commonly used U-Net backbone with a transformer operating on latent patches. The pretrained model is trained on the ImageNet-1K dataset and can generate class-conditional images of 256x256 or 512x512 pixels.

![dit](https://user-images.githubusercontent.com/8100/214593099-3b478e53-64ca-4265-925c-50eb0ea5da3e.png)

The model was released with the paper [Scalable Diffusion Models with Transformers](https://www.wpeebles.com/DiT).

```python
import torch
from diffusers import DiTPipeline

model_id = "facebook/DiT-XL-2-256"
pipe = DiTPipeline.from_pretrained(model_id, torch_dtype=torch.float16).to("cuda")

# pick words that exist in ImageNet
words = ["white shark", "umbrella"]
class_ids = pipe.get_label_ids(words)

output = pipe(class_labels=class_ids)
image = output.images[0]  # label 'white shark'
```
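
To make "a transformer operating on latent patches" more concrete, here is a minimal sketch of how a latent grid can be cut into non-overlapping patches that serve as tokens. The `patchify` helper is hypothetical and works on a single-channel list of rows. The 8x VAE downsampling factor used in the final calculation is an assumption, and the patch size of 2 is taken from the "2" in the `DiT-XL-2` model name.

```python
def patchify(latent, patch_size):
    """Split a 2D grid (list of rows) into non-overlapping square patches.

    Returns the patches in row-major order; each patch is flattened into a
    single list, which plays the role of one transformer token.
    """
    height, width = len(latent), len(latent[0])
    if height % patch_size or width % patch_size:
        raise ValueError("grid size must be divisible by patch_size")
    tokens = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            tokens.append([
                latent[r][c]
                for r in range(top, top + patch_size)
                for c in range(left, left + patch_size)
            ])
    return tokens


# A 4x4 single-channel "latent" with patch size 2 gives 4 tokens of 4 values each.
latent = [[r * 4 + c for c in range(4)] for r in range(4)]
tokens = patchify(latent, patch_size=2)
print(len(tokens), tokens[0])

# Assuming 8x VAE downsampling, a 256x256 image becomes a 32x32 latent,
# so a patch size of 2 yields (32 // 2) ** 2 tokens.
num_tokens = (256 // 8 // 2) ** 2
print(num_tokens)
```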

⚡ LoRA

LoRA is a technique for performing parameter-efficient fine-tuning for large models. LoRA works by adding so-called "update matrices" to specific blocks of a pre-trained model. During fine-tuning, only these update matrices are updated while the pre-trained model parameters are kept frozen. This allows us to achieve greater memory efficiency as well as easier portability during fine-tuning.
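
As a rough illustration of the "update matrices" idea, the sketch below adds the product of two small matrices to a frozen weight. It uses plain Python lists; the helper names, the sizes, and the zero initialization of one factor (as proposed in the LoRA paper) are choices made for this example and do not reflect the Diffusers implementation.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_weight(frozen, down, up, scale=1.0):
    """Effective weight: frozen W plus the scaled low-rank update up @ down."""
    delta = matmul(up, down)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(frozen, delta)
    ]


random.seed(0)
d_out, d_in, rank = 8, 8, 2

# Frozen pre-trained weight: never modified during fine-tuning.
frozen = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]
# Trainable update matrices: `down` is rank x d_in, `up` is d_out x rank.
down = [[random.gauss(0, 0.01) for _ in range(d_in)] for _ in range(rank)]
up = [[0.0] * rank for _ in range(d_out)]  # zero init: the update starts as a no-op

effective = lora_weight(frozen, down, up)

# Only the two small matrices need to be trained and saved.
full_params = d_out * d_in
lora_params = rank * (d_in + d_out)
print(full_params, lora_params)
```

The saving grows with layer size: for a fixed rank, the number of trainable values scales with `d_in + d_out` rather than `d_in * d_out`, which is why the resulting checkpoints can be so small.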

LoRA was proposed in [LoRA: Low-Rank Adaptation of Large Language Models](https://arxiv.org/abs/2106.09685). In the original paper, the authors investigated LoRA for fine-tuning large language models like GPT-3. [cloneofsimo](https://github.com/cloneofsimo) was the first to try out LoRA training for Stable Diffusion in the popular [lora](https://github.com/cloneofsimo/lora) GitHub repository.

Diffusers now supports [LoRA](https://arxiv.org/abs/2212.06727)! This means you can now fine-tune a model like Stable Diffusion on consumer GPUs such as a Tesla T4 or an RTX 2080 Ti. LoRA support was added to [`UNet2DConditionModel`](https://huggingface.co/docs/diffusers/main/en/api/models#diffusers.UNet2DConditionModel) and the DreamBooth training script by patrickvonplaten in #1884.

With LoRA, the fine-tuned checkpoints are **just 3 MB in size**. After fine-tuning, you can use the LoRA checkpoints like so:

```py
from diffusers import StableDiffusionPipeline
import torch

model_path = "sayakpaul/sd-model-finetuned-lora-t4"
pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16)
pipe.unet.load_attn_procs(model_path)
pipe.to("cuda")

prompt = "A pokemon with blue eyes."
image = pipe(prompt, num_inference_steps=30, guidance_scale=7.5).images[0]
image.save("pokemon.png")
```


![pokemon-image](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/pokemon-collage.png)

See these resources to learn more about using LoRA in diffusers:

* [text2image fine-tuning script](https://github.com/huggingface/diffusers/tree/main/examples/text_to_image#training-with-lora) (by sayakpaul in #2031).
* [Official documentation discussing how LoRA is supported](https://huggingface.co/docs/diffusers/main/en/training/lora) (by sayakpaul in #2086).

πŸ“ Customizable Cross Attention

LoRA leverages a new method to customize the cross attention layers deep in the UNet. This can be useful for other creative approaches such as [Prompt-to-Prompt](https://arxiv.org/abs/2208.01626), and it makes it easier to apply optimizers like [xFormers](https://github.com/facebookresearch/xformers). This new "attention processor" abstraction was created by patrickvonplaten in #1639 after discussing the design with the community, and we have used it to rewrite our xFormers and attention slicing implementations!
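
The general idea of a swappable attention processor can be sketched in plain Python: the layer owns its inputs, and the actual attention computation is delegated to an object that can be replaced. Every name below (`ToyAttention`, `DefaultProcessor`, `RecordingProcessor`, `set_processor`) is hypothetical and chosen for this illustration only; this is a simplified sketch of the pattern, not the Diffusers API.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class DefaultProcessor:
    """Plain scaled dot-product attention for a single query vector."""

    def attention_weights(self, query, keys):
        scale = 1.0 / math.sqrt(len(query))
        return softmax([scale * sum(q * k for q, k in zip(query, key)) for key in keys])

    def __call__(self, query, keys, values):
        weights = self.attention_weights(query, keys)
        return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(len(values[0]))]


class RecordingProcessor(DefaultProcessor):
    """Variant that also stores the attention weights, e.g. for inspection."""

    def __init__(self):
        self.last_weights = None

    def attention_weights(self, query, keys):
        self.last_weights = super().attention_weights(query, keys)
        return self.last_weights


class ToyAttention:
    """Attention layer that delegates the computation to a swappable processor."""

    def __init__(self):
        self.processor = DefaultProcessor()

    def set_processor(self, processor):
        self.processor = processor

    def __call__(self, query, keys, values):
        return self.processor(query, keys, values)


layer = ToyAttention()
query = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0]]

baseline = layer(query, keys, values)

# Swap in a different processor without touching the layer itself.
recorder = RecordingProcessor()
layer.set_processor(recorder)
recorded = layer(query, keys, values)
print(baseline, recorder.last_weights)
```

Because the layer only calls its processor, techniques that need to read or modify attention (such as Prompt-to-Prompt-style editing) or that swap in a faster kernel can be added as new processor classes.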

🌿 Flax => PyTorch

Converting Flax weights to PyTorch has been a long-requested feature. Prolific community member camenduru took up the gauntlet in #1900 and created a way to convert Flax model weights to PyTorch. This means that you can train or fine-tune models super fast using Google TPUs, and then convert the weights to PyTorch for everybody to use. Thanks camenduru!

🌀 Flax Img2Img

Another community member, dhruvrnaik, ported the image-to-image pipeline to Flax in #1355! Using a TPU v2-8 (available in Colab's free tier), you can generate 8 images at once in a few seconds!

🎲 DEIS Scheduler
DEIS (Diffusion Exponential Integrator Sampler) is a new fast multistep scheduler that can generate high-quality samples in fewer steps.
The scheduler was introduced in the paper [Fast Sampling of Diffusion Models with Exponential Integrator](https://arxiv.org/abs/2204.13902). More information about the scheduler can be found in the paper.

```python
from diffusers import StableDiffusionPipeline, DEISMultistepScheduler
import torch

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16)
pipe.scheduler = DEISMultistepScheduler.from_config(pipe.scheduler.config)
pipe = pipe.to("cuda")

prompt = "a photo of an astronaut riding a horse on mars"
generator = torch.Generator(device="cuda").manual_seed(0)
image = pipe(prompt, generator=generator, num_inference_steps=25).images[0]
```

* feat : add log-rho deis multistep scheduler by qsh-zh in #1432

Reproducibility

One can now pass CPU generators to all pipelines even if the pipeline is on GPU. This ensures
much better reproducibility across GPU hardware:

```python
import torch
from diffusers import DDIMPipeline
import numpy as np

model_id = "google/ddpm-cifar10-32"

# load model and scheduler
ddim = DDIMPipeline.from_pretrained(model_id)
ddim.to("cuda")

# create a (CPU) generator for reproducibility
generator = torch.manual_seed(0)

# run pipeline for just two steps and return numpy tensor
image = ddim(num_inference_steps=2, output_type="np", generator=generator).images
print(np.abs(image).sum())
```


See #1902 and https://huggingface.co/docs/diffusers/using-diffusers/reproducibility

Important New Guides

- Stable Diffusion 101: https://huggingface.co/docs/diffusers/stable_diffusion
- Reproducibility: https://huggingface.co/docs/diffusers/using-diffusers/reproducibility
- LoRA: https://huggingface.co/docs/diffusers/training/lora

Important Bug Fixes

- Don't download safetensors if library is not installed: 2057
- Make sure that `save_pretrained(...)` doesn't accidentally delete files: 2038
- Fix CPU offload docs for maximum memory gain: 1968
- Fix conversion for exotically sorted weight names: 1959
- Fix intermediate checkpointing for textual inversion, thanks lstein 2072

All commits

* update composable diffusion for an updated diffuser library by nanlliu in 1697
* [Tests] Fix UnCLIP cpu offload tests by anton-l in 1769
* Bump to 0.12.0.dev0 by anton-l in 1771
* [Dreambooth] flax fixes by pcuenca in 1765
* update train_unconditional_ort.py by prathikr in 1775
* Only test for xformers when enabling them 1773 by kig in 1776
* expose polynomial:power and cosine_with_restarts:num_cycles params by zetyquickly in 1737
* [Flax] Stateless schedulers, fixes and refactors by skirsten in 1661
* Correct hf hub download by patrickvonplaten in 1767
* Dreambooth docs: minor fixes by pcuenca in 1758
* Fix num images per prompt unclip by patil-suraj in 1787
* Add Flax stable diffusion img2img pipeline by dhruvrnaik in 1355
* Refactor cross attention and allow mechanism to tweak cross attention function by patrickvonplaten in 1639
* Fix OOM when using PyTorch with JAX installed. by pcuenca in 1795
* reorder model wrap + bug fix by prathikr in 1799
* Remove hardcoded names from PT scripts by patrickvonplaten in 1778
* [textual_inversion] unwrap_model text encoder before accessing weights by patil-suraj in 1816
* fix small mistake in annotation: 32 -> 64 by Line290 in 1780
* Make safety_checker optional in more pipelines by pcuenca in 1796
* Device to use (e.g. cpu, cuda:0, cuda:1, etc.) by camenduru in 1844
* Avoid duplicating PyTorch + safetensors downloads. by pcuenca in 1836
* Width was typod as weight by Helw150 in 1800
* fix: resize transform now preserves aspect ratio by parlance-zz in 1804
* Make xformers optional even if it is available by kn in 1753
* Allow selecting precision to make Dreambooth class images by kabachuha in 1832
* unCLIP image variation by williamberman in 1781
* [Community Pipeline] MagicMix by daspartho in 1839
* [Versatile Diffusion] Fix cross_attention_kwargs by patrickvonplaten in 1849
* [Dtype] Align dtype casting behavior with Transformers and Accelerate by patrickvonplaten in 1725
* [StableDiffusionInpaint] Correct test by patrickvonplaten in 1859
* [textual inversion] add gradient checkpointing and small fixes. by patil-suraj in 1848
* Flax: Fix img2img and align with other pipeline by skirsten in 1824
* Make repo structure consistent by patrickvonplaten in 1862
* [Unclip] Make sure text_embeddings & image_embeddings can directly be passed to enable interpolation tasks. by patrickvonplaten in 1858
* Fix ema decay by pcuenca in 1868
* [Docs] Improve docs by patrickvonplaten in 1870
* [examples] update loss computation by patil-suraj in 1861
* [train_text_to_image] allow using non-ema weights for training by patil-suraj in 1834
* [Attention] Finish refactor attention file by patrickvonplaten in 1879
* Fix typo in train_dreambooth_inpaint by pcuenca in 1885
* Update ONNX Pipelines to use np.float64 instead of np.float by agizmo in 1789
* [examples] misc fixes by patil-suraj in 1886
* Fixes to the help for `report_to` in training scripts by pcuenca in 1888
* updated doc for stable diffusion pipelines by yiyixuxu in 1770
* Add UnCLIPImageVariationPipeline to dummy imports by anton-l in 1897
* Add accelerate and xformers versions to `diffusers-cli env` by anton-l in 1898
* [addresses issue 1642] add add_noise to scheduling-sde-ve by aengusng8 in 1827
* Add condtional generation to AudioDiffusionPipeline by teticio in 1826
* Fixes in comments in SD2 D2I by neverix in 1903
* [Deterministic torch randn] Allow tensors to be generated on CPU by patrickvonplaten in 1902
* [Docs] Remove duplicated API doc string by patrickvonplaten in 1901
* fix: DDPMScheduler.set_timesteps() by Joqsan in 1912
* Fix --resume_from_checkpoint step in train_text_to_image.py by merfnad in 1914
* Support training SD V2 with Flax by yasyf in 1783
* Fix lr-scaling store_true & default=True cli argument for textual_inversion training. by aredden in 1090
* Various Fixes for Flax Dreambooth by yasyf in 1782
* Test ResnetBlock2D by hchings in 1850
* Init for korean docs by seriousran in 1910
* New Pipeline: Tiled-upscaling with depth perception to avoid blurry spots by peterwilli in 1615
* Improve reproduceability 2/3 by patrickvonplaten in 1906
* feat : add log-rho deis multistep scheduler by qsh-zh in 1432
* Feature/colossalai by Fazziekey in 1793
* [Docs] Add TRANSLATING.md file by seriousran in 1920
* [StableDiffusionimg2img] validating input type by Shubhamai in 1913
* [dreambooth] low precision guard by williamberman in 1916
* [Stable Diffusion Guide] 101 Stable Diffusion Guide directly into the docs by patrickvonplaten in 1927
* [Conversion] Make sure ema weights are extracted correctly by patrickvonplaten in 1937
* fix path to logo by vvssttkk in 1939
* Add automatic doc sorting by patrickvonplaten in 1940
* update to latest colossalai by Fazziekey in 1951
* fix typo in imagic_stable_diffusion.py by andreemic in 1956
* [Conversion SD] Make sure weirdly sorted keys work as well by patrickvonplaten in 1959
* allow loading ddpm models into ddim by patrickvonplaten in 1932
* [Community] Correct checkpoint merger by patrickvonplaten in 1965
* Update CLIPGuidedStableDiffusion.feature_extractor.size to fix TypeError by oxidase in 1938
* [CPU offload] correct cpu offload by patrickvonplaten in 1968
* [Docs] Update README.md by haofanwang in 1960
* Research project multi subject dreambooth by klopsahlong in 1948
* Example tests by patrickvonplaten in 1982
* Fix slow tests by patrickvonplaten in 1983
* Fix unused upcast_attn flag in convert_original_stable_diffusion_to_diffusers script by kn in 1942
* Allow converting Flax to PyTorch by adding a "from_flax" keyword by camenduru in 1900
* Update docstring by Warvito in 1971
* [SD Img2Img] resize source images to multiple of 8 instead of 32 by vvsotnikov in 1571
* Update README.md to include our blog post by sayakpaul in 1998
* Fix a couple typos in Dreambooth readme by pcuenca in 2004
* Add tests for 2D UNet blocks by hchings in 1945
* [Conversion] Support convert diffusers to safetensors by hua1995116 in 1996
* [Community] Fix merger by patrickvonplaten in 2006
* [Conversion] Improve safetensors by patrickvonplaten in 1989
* [Black] Update black library by patrickvonplaten in 2007
* Fix typos in ColossalAI example by haofanwang in 2001
* Use pipeline tests mixin for UnCLIP pipeline tests + unCLIP MPS fixes by williamberman in 1908
* Change PNDMPipeline to use PNDMScheduler by willdalh in 2003
* [train_unconditional] fix LR scheduler init by patil-suraj in 2010
* [Docs] No more autocast by patrickvonplaten in 2021
* [Flax] Add Flax inpainting impl by xvjiarui in 1966
* Check k-diffusion version is at least 0.0.12 by pcuenca in 2022
* DiT Pipeline by kashif in 1806
* fix dit doc header by patil-suraj in 2027
* [LoRA] Add LoRA training script by patrickvonplaten in 1884
* [Dit] Fix dit tests by patrickvonplaten in 2034
* Fix typos and minor redundancies by Joqsan in 2029
* [Lora] Model card by patrickvonplaten in 2032
* [Save Pretrained] Remove dead code lines that can accidentally remove pytorch files by patrickvonplaten in 2038
* Fix EMA for multi-gpu training in the unconditional example by anton-l in 1930
* Minor fix in the documentation of LoRA by hysts in 2045
* Add InstructPix2Pix pipeline by patil-suraj in 2040
* Create repo before cloning in examples by Wauplin in 2047
* Remove modelcards dependency by Wauplin in 2050
* Module-ise "original stable diffusion to diffusers" conversion script by damian0815 in 2019
* [StableDiffusionInstructPix2Pix] use cpu generator in slow tests by patil-suraj in 2051
* [From pretrained] Don't download .safetensors files if safetensors is… by patrickvonplaten in 2057
* Correct Pix2Pix example by patrickvonplaten in 2056
* add community pipeline: StableUnCLIPPipeline by budui in 2037
* [LoRA] Adds example on text2image fine-tuning with LoRA by sayakpaul in 2031
* Safetensors loading in "convert_diffusers_to_original_stable_diffusion" by cafeai in 2054
* [examples] add dataloader_num_workers argument by patil-suraj in 2070
* Dreambooth: reduce VRAM usage by gleb-akhmerov in 2039
* [Paint by example] Fix cpu offload for paint by example by patrickvonplaten in 2062
* [textual_inversion] Fix resuming state when using gradient checkpointing by pcuenca in 2072
* [lora] Log images when using tensorboard by pcuenca in 2078
* Fix resume epoch for all training scripts except textual_inversion by pcuenca in 2079
* [dreambooth] fix multi on gpu. by patil-suraj in 2088
* Run inference on a specific condition and fix call of manual_seed() by shirayu in 2074
* [Feat] checkpoint_merger works on local models as well as ones that use safetensors by lstein in 2060
* xFormers attention op arg by takuma104 in 2049
* [docs] [dreambooth] note random crop by williamberman in 2085
* Remove wandb from text_to_image requirements.txt by pcuenca in 2092
* [doc] update example for pix2pix by patil-suraj in 2101
* Add `lora` tag to the model tags by apolinario in 2103
* [docs] Adds a doc on LoRA support for diffusers by sayakpaul in 2086
* Allow directly passing text embeddings to Stable Diffusion Pipeline for prompt weighting by patrickvonplaten in 2071
* Improve transformers versions handling by patrickvonplaten in 2104
* Reproducibility 3/3 by patrickvonplaten in 1924

🙌 Significant community contributions 🙌

The following contributors have made significant changes to the library over the last release:

* nanlliu
    * update composable diffusion for an updated diffuser library (1697)
* skirsten
    * [Flax] Stateless schedulers, fixes and refactors (1661)
    * Flax: Fix img2img and align with other pipeline (1824)
* hchings
    * Test ResnetBlock2D (1850)
    * Add tests for 2D UNet blocks (1945)
* seriousran
    * Init for korean docs (1910)
    * [Docs] Add TRANSLATING.md file (1920)
* qsh-zh
    * feat : add log-rho deis multistep scheduler (1432)
* Fazziekey
    * Feature/colossalai (1793)
    * update to latest colossalai (1951)
* klopsahlong
    * Research project multi subject dreambooth (1948)
* xvjiarui
    * [Flax] Add Flax inpainting impl (1966)
* damian0815
    * Module-ise "original stable diffusion to diffusers" conversion script (2019)
* camenduru
    * Allow converting Flax to PyTorch by adding a "from_flax" keyword (1900)

0.11.1

This patch release fixes a bug with `num_images_per_prompt` in the `UnCLIPPipeline`:
* Fix num images per prompt unclip by patil-suraj in 1787

0.11.0

:magic_wand: Karlo UnCLIP by Kakao Brain

Karlo is a text-conditional image generation model based on OpenAI's unCLIP architecture. It improves on the standard super-resolution model, upscaling from 64px to 256px and recovering high-frequency details in only a small number of denoising steps.

This alpha version of Karlo is trained on 115M image-text pairs, including [COYO](https://github.com/kakaobrain/coyo-dataset)-100M high-quality subset, CC3M, and CC12M.
For more information about the architecture, see the Karlo repository: https://github.com/kakaobrain/karlo
![image](https://user-images.githubusercontent.com/26864830/208464171-a46be794-ca3c-4d39-80ab-cee71402f0f0.png)


```
pip install diffusers transformers safetensors accelerate
```

```python
import torch
from diffusers import UnCLIPPipeline

pipe = UnCLIPPipeline.from_pretrained("kakaobrain/karlo-v1-alpha", torch_dtype=torch.float16)
pipe = pipe.to("cuda")

prompt = "a high-resolution photograph of a big red frog on a green leaf."
image = pipe(prompt).images[0]
```


![img](https://huggingface.co/datasets/patrickvonplaten/images/resolve/main/frog.png)

:octocat: Community pipeline versioning

The community pipelines hosted in `diffusers/examples/community` will now follow the installed version of the library.

E.g. if you have `diffusers==0.9.0` installed, the pipelines from the `v0.9.0` branch will be used: https://github.com/huggingface/diffusers/tree/v0.9.0/examples/community

If you've installed diffusers from source, e.g. with `pip install git+https://github.com/huggingface/diffusers` then the latest versions of the pipelines will be fetched from the `main` branch.
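
The version-to-branch rule described above can be sketched as a small function. Both helpers are hypothetical, and so is the exact matching rule: plain `X.Y.Z` versions map to a `vX.Y.Z` branch and anything else, such as a `.dev0` source install, falls back to `main`. The library's real resolution logic may differ.

```python
import re


def default_custom_revision(installed_version):
    """Pick the branch to fetch community pipelines from.

    Assumption for this sketch: plain release versions such as "0.9.0" map to
    "v0.9.0", and anything else (for example a ".dev0" source install) falls
    back to "main".
    """
    if re.fullmatch(r"\d+\.\d+\.\d+", installed_version):
        return f"v{installed_version}"
    return "main"


def community_url(installed_version):
    """Build the GitHub URL of the community folder for a given install."""
    revision = default_custom_revision(installed_version)
    return f"https://github.com/huggingface/diffusers/tree/{revision}/examples/community"


print(community_url("0.9.0"))
print(community_url("0.12.0.dev0"))
```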

To change the custom pipeline version, set the `custom_revision` variable like so:
```python
from diffusers import DiffusionPipeline

pipeline = DiffusionPipeline.from_pretrained(
    "google/ddpm-cifar10-32", custom_pipeline="one_step_unet", custom_revision="0.10.2"
)
```


:safety_vest: safetensors

Many of the most important checkpoints now have [safetensors](https://github.com/huggingface/safetensors) weights available. After installing `safetensors` with:

```
pip install safetensors
```

you will see a nice speed-up when loading your model :rocket:

Some of the most important checkpoints have safetensors weights added now:
- https://huggingface.co/stabilityai/stable-diffusion-2
- https://huggingface.co/stabilityai/stable-diffusion-2-1
- https://huggingface.co/stabilityai/stable-diffusion-2-depth
- https://huggingface.co/stabilityai/stable-diffusion-2-inpainting

Batched generation bug fixes :bug:

* Make sure all pipelines can run with batched input by patrickvonplaten in 1669

We fixed a lot of bugs for batched generation. All pipelines should now correctly process batches of prompts and images :hugs:
We also made it much easier to tweak images with reproducible seeds:
https://huggingface.co/docs/diffusers/using-diffusers/reusing_seeds

:memo: Changelog
* Remove spurious arg in training scripts by pcuenca in 1644
* dreambooth: fix 1566: maintain fp32 wrapper when saving a checkpoint to avoid crash when running fp16 by timh in 1618
* Allow k pipeline to generate > 1 images by pcuenca in 1645
* Remove unnecessary offset in img2img by patrickvonplaten in 1653
* Remove unnecessary kwargs in depth2img by maruel in 1648
* Add text encoder conversion by lawfordp2017 in 1559
* VersatileDiffusion: fix input processing by LukasStruppek in 1568
* tensor format ort bug fix by prathikr in 1557
* Deprecate init image correctly by patrickvonplaten in 1649
* fix bug if we don't do_classifier_free_guidance by MKFMIKU in 1601
* Handle missing global_step key in scripts/convert_original_stable_diffusion_to_diffusers.py by Cyberes in 1612
* [SD] Make sure scheduler is correct when converting by patrickvonplaten in 1667
* [Textual Inversion] Do not update other embeddings by patrickvonplaten in 1665
* Added Community pipeline for comparing Stable Diffusion v1.1-4 checkpoints by suvadityamuk in 1584
* Fix wrong type checking in `convert_diffusers_to_original_stable_diffusion.py` by apolinario in 1681
* [Version] Bump to 0.11.0.dev0 by patrickvonplaten in 1682
* Dreambooth: save / restore training state by pcuenca in 1668
* Disable telemetry when DISABLE_TELEMETRY is set by w4ffl35 in 1686
* Change one-step dummy pipeline for testing by patrickvonplaten in 1690
* [Community pipeline] Add github mechanism by patrickvonplaten in 1680
* Dreambooth: use warnings instead of logger in parse_args() by pcuenca in 1688
* manually update train_unconditional_ort by prathikr in 1694
* Remove all local telemetry by anton-l in 1702
* Update main docs by patrickvonplaten in 1706
* [Readme] Clarify package owners by anton-l in 1707
* Fix the bug that torch version less than 1.12 throws TypeError by chinoll in 1671
* RePaint fast tests and API conforming by anton-l in 1701
* Add state checkpointing to other training scripts by pcuenca in 1687
* Improve pipeline_stable_diffusion_inpaint_legacy.py by cyber-meow in 1585
* apply amp bf16 on textual inversion by jiqing-feng in 1465
* Add examples with Intel optimizations by hshen14 in 1579
* Added a README page for docs and a "schedulers" page by yiyixuxu in 1710
* Accept latents as optional input in Latent Diffusion pipeline by daspartho in 1723
* Fix ONNX img2img preprocessing and add fast tests coverage by anton-l in 1727
* Fix ldm tests on master by not running the CPU tests on GPU by patrickvonplaten in 1729
* Docs: recommend xformers by pcuenca in 1724
* Nightly integration tests by anton-l in 1664
* [Batched Generators] This PR adds generators that are useful to make batched generation fully reproducible by patrickvonplaten in 1718
* Fix ONNX img2img preprocessing by peterto in 1736
* Fix MPS fast test warnings by anton-l in 1744
* Fix/update the LDM pipeline and tests by anton-l in 1743
* kakaobrain unCLIP by williamberman in 1428
* [fix] pipeline_unclip generator by williamberman in 1751
* unCLIP docs by williamberman in 1754
* Correct help text for scheduler_type flag in scripts. by msiedlarek in 1749
* Add resnet_time_scale_shift to VD layers by anton-l in 1757
* Add attention mask to uclip by patrickvonplaten in 1756
* Support attn2==None for xformers by anton-l in 1759
* [UnCLIPPipeline] fix num_images_per_prompt by patil-suraj in 1762
* Add CPU offloading to UnCLIP by anton-l in 1761
* [Versatile] fix attention mask by patrickvonplaten in 1763
* [Revision] Don't recommend using revision by patrickvonplaten in 1764
* [Examples] Update train_unconditional.py to include logging argument for Wandb by ash0ts in 1719
* Transformers version req for UnCLIP by anton-l in 1766

0.10.2

This patch removes the hard requirement for `transformers>=4.25.1` in case external libraries were downgrading the library upon startup in a non-controllable way.

* do not automatically enable xformers by patrickvonplaten in 1640
* Adapt to forced transformers version in some dependent libraries by anton-l in 1638
* Re-add xformers enable to UNet2DCondition by patrickvonplaten in 1627

🚨🚨🚨 **Note that xformers is not automatically enabled anymore** 🚨🚨🚨

The reasons for this are given here: https://github.com/huggingface/diffusers/pull/1640#discussion_r1044651551:

> We should not automatically enable xformers for three reasons:
>
> - It's not PyTorch-like API. PyTorch doesn't by default enable all the fastest options available
> - We allocate GPU memory before the user even does .to("cuda")
> - This behavior is not consistent with cases where xformers is not installed

**=> This means**: If you relied on xformers being enabled automatically, please make sure to add the following now:

```python
import logging

from diffusers.utils.import_utils import is_xformers_available

logger = logging.getLogger(__name__)

unet = ...  # load unet

if is_xformers_available():
    try:
        unet.enable_xformers_memory_efficient_attention(True)
    except Exception as e:
        logger.warning(
            "Could not enable memory efficient attention. Make sure xformers is installed"
            f" correctly and a GPU is available: {e}"
        )
```


The snippet above applies to a standalone UNet (e.g. in DreamBooth). For a full pipeline, use:

```py
import logging

from diffusers.utils.import_utils import is_xformers_available

logger = logging.getLogger(__name__)

pipe = ...  # load pipeline

if is_xformers_available():
    try:
        pipe.enable_xformers_memory_efficient_attention(True)
    except Exception as e:
        logger.warning(
            "Could not enable memory efficient attention. Make sure xformers is installed"
            f" correctly and a GPU is available: {e}"
        )
```

0.10.1

This patch returns `enable_xformers_memory_efficient_attention()` to `UNet2DCondition` to restore backward compatibility.

* Re-add xformers enable to UNet2DCondition by patrickvonplaten in 1627

0.10.0

🐳 Depth-Guided Stable Diffusion and 2.1 checkpoints

The new depth-guided stable diffusion model is fully supported in this release. The model is conditioned on monocular depth estimates inferred via [MiDaS](https://github.com/isl-org/MiDaS) and can be used for structure-preserving img2img and shape-conditional synthesis.

![image](https://user-images.githubusercontent.com/26864830/206480602-d0b0969b-3e4a-4c33-a1d0-40fe5b877656.png)

Installing the `transformers` library from source is required for the MiDaS model:

```bash
pip install --upgrade git+https://github.com/huggingface/transformers/
```

```python
import torch
import requests
from PIL import Image
from diffusers import StableDiffusionDepth2ImgPipeline

pipe = StableDiffusionDepth2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-depth",
    torch_dtype=torch.float16,
).to("cuda")

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
init_image = Image.open(requests.get(url, stream=True).raw)

prompt = "two tigers"
n_prompt = "bad, deformed, ugly, bad anatomy"
image = pipe(prompt=prompt, image=init_image, negative_prompt=n_prompt, strength=0.7).images[0]
```


The updated Stable Diffusion 2.1 checkpoints are also released and fully supported:
* https://huggingface.co/stabilityai/stable-diffusion-2-1
* https://huggingface.co/stabilityai/stable-diffusion-2-1-base

:safety_vest: Safe Tensors
We now support [SafeTensors](https://github.com/huggingface/safetensors/): a new simple format for storing tensors safely (as opposed to pickle) that is still fast (zero-copy).
* [Proposal] Support loading from safetensors if file is present. by Narsil in 1357
* [Proposal] Support saving to safetensors by MatthieuBizien in 1494

| Format | Safe | Zero-copy | Lazy loading | No file size limit | Layout control | Flexibility | Bfloat16 |
| ----------------------- | --- | --- | --- | --- | --- | --- | --- |
| pickle (PyTorch) | ✗ | ✗ | ✗ | ✓ | ✗ | ✓ | ✓ |
| H5 (Tensorflow) | ✓ | ✗ | ✓ | ✓ | ~ | ~ | ✗ |
| SavedModel (Tensorflow) | ✓ | ✗ | ✗ | ✓ | ✓ | ✗ | ✓ |
| MsgPack (flax) | ✓ | ✓ | ✗ | ✓ | ✗ | ✗ | ✓ |
| SafeTensors | ✓ | ✓ | ✓ | ✓ | ✓ | ✗ | ✓ |

More details about the comparison here: https://github.com/huggingface/safetensors#yet-another-format-
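
To give a feel for why a format of this kind can be both safe and lazily loadable, here is a simplified sketch using only the standard library. It follows the general layout documented in the safetensors README: an 8-byte little-endian header length, a JSON header with dtype, shape, and byte offsets per tensor, then one raw byte buffer. The helper names are hypothetical, only float32 is handled, and the output is not guaranteed to be byte-compatible with the real library.

```python
import json
import struct


def pack(tensors):
    """Serialize {name: (shape, list_of_floats)} as header length + JSON header + raw bytes."""
    header, buffer = {}, b""
    for name, (shape, values) in tensors.items():
        raw = struct.pack(f"<{len(values)}f", *values)  # little-endian float32
        header[name] = {
            "dtype": "F32",
            "shape": shape,
            "data_offsets": [len(buffer), len(buffer) + len(raw)],
        }
        buffer += raw
    header_bytes = json.dumps(header).encode("utf-8")
    return struct.pack("<Q", len(header_bytes)) + header_bytes + buffer


def read_header(blob):
    """Parse only the JSON header; no tensor data is decoded and no code is executed."""
    (header_len,) = struct.unpack("<Q", blob[:8])
    return json.loads(blob[8:8 + header_len]), 8 + header_len


def load_tensor(blob, name):
    """Decode a single tensor by slicing just its byte range."""
    header, data_start = read_header(blob)
    begin, end = header[name]["data_offsets"]
    raw = blob[data_start + begin:data_start + end]
    return list(struct.unpack(f"<{len(raw) // 4}f", raw))


blob = pack({"bias": ([2], [0.5, -1.0]), "weight": ([2, 2], [1.0, 2.0, 3.0, 4.0])})
header, _ = read_header(blob)
print(header["weight"], load_tensor(blob, "bias"))
```

Loading involves only JSON parsing and byte slicing, in contrast to pickle, which can run arbitrary code during deserialization. The per-tensor offsets are what make it possible to read one tensor without touching the rest of the file.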

```
pip install safetensors
```

```python
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-2-1")
pipe.save_pretrained("./safe-stable-diffusion-2-1", safe_serialization=True)

# you can also push this checkpoint to the HF Hub and load from there
safe_pipe = StableDiffusionPipeline.from_pretrained("./safe-stable-diffusion-2-1")
```


New Pipelines
:paintbrush: Paint-by-example
An implementation of [Paint by Example: Exemplar-based Image Editing with Diffusion Models](https://arxiv.org/abs/2211.13227) by Binxin Yang, Shuyang Gu, Bo Zhang, Ting Zhang, Xuejin Chen, Xiaoyan Sun, Dong Chen, Fang Wen
* Add paint by example by patrickvonplaten in 1533

![image](https://user-images.githubusercontent.com/26864830/206482481-ce91ca9d-1cf3-441a-9dd5-f012f3160431.png)

```python
import PIL.Image
import requests
import torch
from io import BytesIO
from diffusers import DiffusionPipeline


def download_image(url):
    # Download the image into memory and normalize to RGB
    response = requests.get(url)
    return PIL.Image.open(BytesIO(response.content)).convert("RGB")


img_url = "https://raw.githubusercontent.com/Fantasy-Studio/Paint-by-Example/main/examples/image/example_1.png"
mask_url = "https://raw.githubusercontent.com/Fantasy-Studio/Paint-by-Example/main/examples/mask/example_1.png"
example_url = "https://raw.githubusercontent.com/Fantasy-Studio/Paint-by-Example/main/examples/reference/example_1.jpg"

init_image = download_image(img_url).resize((512, 512))
mask_image = download_image(mask_url).resize((512, 512))
example_image = download_image(example_url).resize((512, 512))

pipe = DiffusionPipeline.from_pretrained("Fantasy-Studio/Paint-by-Example", torch_dtype=torch.float16)
pipe = pipe.to("cuda")

image = pipe(image=init_image, mask_image=mask_image, example_image=example_image).images[0]
```


Audio Diffusion and Latent Audio Diffusion
Audio Diffusion leverages the recent advances in image generation using diffusion models by converting audio samples to and from mel spectrogram images.
* add AudioDiffusionPipeline and LatentAudioDiffusionPipeline 1334 by teticio in 1426
```python
from IPython.display import Audio, display
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained("teticio/audio-diffusion-ddim-256").to("cuda")

output = pipe()
display(output.images[0])
display(Audio(output.audios[0], rate=pipe.mel.get_sample_rate()))
```
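
A mel spectrogram spaces its frequency bands on the mel scale rather than linearly in Hz. As a small illustration of that mapping, the sketch below uses the HTK-style formula, which is one common convention. Which convention the Audio Diffusion pipeline uses internally is not stated here, so treat the formula and the helper names as assumptions.

```python
import math


def hz_to_mel(frequency_hz):
    """Convert a frequency in Hz to mels (HTK-style formula)."""
    return 2595.0 * math.log10(1.0 + frequency_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_edges(num_bands, min_hz, max_hz):
    """Band edges equally spaced in mels, returned in Hz."""
    low, high = hz_to_mel(min_hz), hz_to_mel(max_hz)
    step = (high - low) / (num_bands + 1)
    return [mel_to_hz(low + i * step) for i in range(num_bands + 2)]


edges = mel_band_edges(num_bands=4, min_hz=0.0, max_hz=8000.0)
print([round(e) for e in edges])
```

The edges are evenly spaced in mels but get progressively wider in Hz, so low frequencies receive finer resolution than high ones.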


[Experimental] K-Diffusion pipeline for Stable Diffusion
This pipeline is added to support the latest schedulers from crowsonkb's [k-diffusion](https://github.com/crowsonkb/k-diffusion).
The purpose of this pipeline is to compare scheduler implementations and updates, so new features from other pipelines are unlikely to be supported!

* [K Diffusion] Add k diffusion sampler natively by patrickvonplaten in 1603

```
pip install k-diffusion
```

```python
from diffusers import StableDiffusionKDiffusionPipeline
import torch

pipe = StableDiffusionKDiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-2-1-base")
pipe = pipe.to("cuda")

pipe.set_scheduler("sample_heun")
image = pipe("astronaut riding horse", num_inference_steps=25).images[0]
```



New Schedulers
Heun scheduler inspired by Karras et al.
Based on Algorithm 1 of [Karras et al.](https://arxiv.org/abs/2206.00364). The scheduler is ported from crowsonkb's [k-diffusion](https://github.com/crowsonkb/k-diffusion).

* Add 2nd order heun scheduler by patrickvonplaten in 1336
python
from diffusers import HeunDiscreteScheduler, StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-2-1")
pipe.scheduler = HeunDiscreteScheduler.from_config(pipe.scheduler.config)
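
To give a sense of what "2nd order" means here, the sketch below compares a plain Euler step with a Heun (predictor-corrector) step on a toy ODE, dy/dt = -y. It is only an illustration of the numerical method under that toy assumption, not the `HeunDiscreteScheduler` implementation; every function name in it is hypothetical.

```python
import math

def euler_step(f, t, y, h):
    """First-order step: follow the slope at the start of the interval."""
    return y + h * f(t, y)

def heun_step(f, t, y, h):
    """Second-order step: average the slopes at the start and the predicted end."""
    k1 = f(t, y)
    y_pred = y + h * k1          # Euler predictor
    k2 = f(t + h, y_pred)        # slope at the predicted endpoint
    return y + 0.5 * h * (k1 + k2)

def integrate(step, f, y0, t0, t1, n_steps):
    """Apply `step` n_steps times to move from t0 to t1."""
    h = (t1 - t0) / n_steps
    t, y = t0, y0
    for _ in range(n_steps):
        y = step(f, t, y, h)
        t += h
    return y

def decay(t, y):
    """Toy ODE dy/dt = -y, whose exact solution is y(t) = exp(-t)."""
    return -y

exact = math.exp(-1.0)
euler_err = abs(integrate(euler_step, decay, 1.0, 0.0, 1.0, 10) - exact)
heun_err = abs(integrate(heun_step, decay, 1.0, 0.0, 1.0, 10) - exact)
```

With the same ten steps, the Heun result lands much closer to the exact value than the Euler result, at the cost of two function evaluations per step. In a diffusion sampler, each evaluation corresponds to a model forward pass, so that trade-off is between steps and accuracy per step.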


Single step DPM-Solver
The original paper can be found [here](https://arxiv.org/abs/2206.00927), and the improved version [here](https://arxiv.org/abs/2211.01095). The original implementation can be found [here](https://github.com/LuChengTHU/dpm-solver).
* Add Singlestep DPM-Solver (singlestep high-order schedulers) by LuChengTHU in 1442
python
from diffusers import DPMSolverSinglestepScheduler, StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-2-1")
pipe.scheduler = DPMSolverSinglestepScheduler.from_config(pipe.scheduler.config)



:memo: Changelog
* [Proposal] Support loading from safetensors if file is present. by Narsil in 1357
* Hotfix for AttributeErrors in OnnxStableDiffusionInpaintPipelineLegacy by anton-l in 1448
* Speed up test and remove kwargs from call by patrickvonplaten in 1446
* v-prediction training support by patil-suraj in 1455
* Fix Flax `from_pt` by pcuenca in 1436
* Ensure Flax pipeline always returns numpy array by pcuenca in 1435
* Add 2nd order heun scheduler by patrickvonplaten in 1336
* fix slow tests by patrickvonplaten in 1467
* Flax support for Stable Diffusion 2 by pcuenca in 1423
* Updates Image to Image Inpainting community pipeline README by vvvm23 in 1370
* StableDiffusion: Decode latents separately to run larger batches by kig in 1150
* Fix bug in half precision for DPMSolverMultistepScheduler by rtaori in 1349
* [Train unconditional] Unwrap model before EMA by anton-l in 1469
* Add `ort_nightly_directml` to the `onnxruntime` candidates by anton-l in 1458
* Allow saving trained betas by patrickvonplaten in 1468
* Fix dtype model loading by patrickvonplaten in 1449
* [Dreambooth] Make compatible with alt diffusion by patrickvonplaten in 1470
* Add better docs xformers by patrickvonplaten in 1487
* Remove reminder comment by pcuenca in 1489
* Bump to 0.10.0.dev0 + deprecations by anton-l in 1490
* Add doc for Stable Diffusion on Habana Gaudi by regisss in 1496
* Replace deprecated hub utils in `train_unconditional_ort` by anton-l in 1504
* [Deprecate] Correct stacklevel by patrickvonplaten in 1483
* simplyfy AttentionBlock by patil-suraj in 1492
* Standardize on using `image` argument in all pipelines by fboulnois in 1361
* support v prediction in other schedulers by patil-suraj in 1505
* Fix Flax flip_sin_to_cos by akashgokul in 1369
* Add an explicit `--image_size` to the conversion script by anton-l in 1509
* fix heun scheduler by patil-suraj in 1512
* [docs] [dreambooth training] accelerate.utils.write_basic_config by williamberman in 1513
* [docs] [dreambooth training] num_class_images clarification by williamberman in 1508
* [From pretrained] Allow returning local path by patrickvonplaten in 1450
* Update conversion script to correctly handle SD 2 by patrickvonplaten in 1511
* [refactor] Making the xformers mem-efficient attention activation recursive by blefaudeux in 1493
* Do not use torch.long in mps by pcuenca in 1488
* Fix Imagic example by dhruvrnaik in 1520
* Fix training docs to install datasets by pedrogengo in 1476
* Finalize 2nd order schedulers by patrickvonplaten in 1503
* Fixed mask+masked_image in sd inpaint pipeline by antoche in 1516
* Create train_dreambooth_inpaint.py by thedarkzeno in 1091
* Update FlaxLMSDiscreteScheduler by dzlab in 1474
* [Proposal] Support saving to safetensors by MatthieuBizien in 1494
* Add xformers attention to VAE by kig in 1507
* [CI] Add slow MPS tests by anton-l in 1104
* [Stable Diffusion Inpaint] Allow tensor as input image & mask by patrickvonplaten in 1527
* Compute embedding distances with torch.cdist by blefaudeux in 1459
* [Upscaling] Fix batch size by patrickvonplaten in 1525
* Update bug-report.yml by patrickvonplaten in 1548
* [Community Pipeline] Checkpoint Merger based on Automatic1111 by Abhinay1997 in 1472
* [textual_inversion] Add an option for only saving the embeddings by allo- in 781
* [examples] use from_pretrained to load scheduler by patil-suraj in 1549
* fix mask discrepancies in train_dreambooth_inpaint by thedarkzeno in 1529
* [refactor] make set_attention_slice recursive by patil-suraj in 1532
* Research folder by patrickvonplaten in 1553
* add AudioDiffusionPipeline and LatentAudioDiffusionPipeline 1334 by teticio in 1426
* [Community download] Fix cache dir by patrickvonplaten in 1555
* [Docs] Correct docs by patrickvonplaten in 1554
* Fix typo by pcuenca in 1558
* [docs] [dreambooth training] default accelerate config by williamberman in 1564
* Mega community pipeline by patrickvonplaten in 1561
* [examples] add check_min_version by patil-suraj in 1550
* [dreambooth] make collate_fn global by patil-suraj in 1547
* Standardize fast pipeline tests with PipelineTestMixin by anton-l in 1526
* Add paint by example by patrickvonplaten in 1533
* [Community Pipeline] fix lpw_stable_diffusion by SkyTNT in 1570
* [Paint by Example] Better default for image width by patrickvonplaten in 1587
* Add from_pretrained telemetry by anton-l in 1461
* Correct order height & width in pipeline_paint_by_example.py by Fantasy-Studio in 1589
* Fix common tests for FP16 by anton-l in 1588
* [UNet2DConditionModel] add an option to upcast attention to fp32 by patil-suraj in 1590
* Flax: avoid recompilation when params change by pcuenca in 1096
* Add Singlestep DPM-Solver (singlestep high-order schedulers) by LuChengTHU in 1442
* fix upcast in slice attention by patil-suraj in 1591
* Update scheduling_repaint.py by Randolph-zeng in 1582
* Update RL docs for better sharing / adding models by natolambert in 1563
* Make cross-attention check more robust by pcuenca in 1560
* [ONNX] Fix flaky tests by anton-l in 1593
* Trivial fix for undefined symbol in train_dreambooth.py by bcsherma in 1598
* [K Diffusion] Add k diffusion sampler natively by patrickvonplaten in 1603
* [Versatile Diffusion] add upcast_attention by patil-suraj in 1605
* Fix PyCharm/VSCode static type checking for dummy objects by anton-l in 1596


© 2025 Safety CLI Cybersecurity Inc. All Rights Reserved.