Taking Diffusers Beyond Image Generation
We are very excited about this release! It brings new pipelines for video and audio to `diffusers`, showing that diffusion is a great choice for all sorts of generative tasks. The modular, pluggable approach of `diffusers` was crucial to integrating the new models intuitively and cohesively with the rest of the library. We hope you appreciate the consistency of the APIs and implementations, as our ultimate goal is to provide the best toolbox to help you solve the tasks you're interested in. Don't hesitate to get in touch if you use `diffusers` for other projects!
In addition to that, `diffusers 0.15` includes a lot of new features and improvements, ranging from performance and deployment improvements (faster pipeline loading) to increased flexibility for creative tasks (Karras sigmas, weight prompting, support for Automatic1111 textual inversion embeddings), additional customization options (Multi-ControlNet), and training utilities (ControlNet, Min-SNR weighting). Read on for the details!
Text-to-Video
Text-guided video generation is not a fantasy anymore: it's as simple as spinning up a Colab and running either of the two powerful open-source video generation models below.
Text-to-Video
[Alibaba's DAMO Vision Intelligence Lab](https://huggingface.co/damo-vilab) has open-sourced a research-only video generation model that can generate impressive video clips of up to a minute. The example below generates Spider-Man surfing (swap the prompt for, say, Darth Vader riding a wave); simply copy-paste the following lines into your favorite Python interpreter:
```python
import torch
from diffusers import DiffusionPipeline, DPMSolverMultistepScheduler
from diffusers.utils import export_to_video

pipe = DiffusionPipeline.from_pretrained("damo-vilab/text-to-video-ms-1.7b", torch_dtype=torch.float16, variant="fp16")
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)
pipe.enable_model_cpu_offload()

prompt = "Spiderman is surfing"
video_frames = pipe(prompt, num_inference_steps=25).frames
video_path = export_to_video(video_frames)
```
![vader](https://user-images.githubusercontent.com/23423619/231514766-55a5921f-28de-4e30-b238-33d62086f083.gif)
For more information, you can have a look at [damo-vilab/text-to-video-ms-1.7b](https://huggingface.co/damo-vilab/text-to-video-ms-1.7b).
Text-to-Video Zero
Text2Video-Zero is a zero-shot text-to-video synthesis method that enables low-cost yet consistent video generation using only a pre-trained text-to-image diffusion model, such as Stable Diffusion v1-5, with no video-specific training. Text2Video-Zero also composes naturally with extensions of pre-trained text-to-image models such as Instruct Pix2Pix, ControlNet and DreamBooth, enabling applications like Video Instruct Pix2Pix, pose-conditional and edge-conditional generation, and edge-conditional generation with DreamBooth-specialized models.
https://user-images.githubusercontent.com/23423619/231516176-813133f9-1216-4845-8b49-4e062610f12c.mp4
For more information, please have a look at the [PAIR/Text2Video-Zero](https://huggingface.co/spaces/PAIR/Text2Video-Zero) Space.
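A minimal usage sketch with the new `TextToVideoZeroPipeline` follows; the base checkpoint and the `imageio` export step are illustrative choices rather than requirements of the pipeline:

```python
import torch
import imageio
from diffusers import TextToVideoZeroPipeline

# Text2Video-Zero reuses a plain text-to-image checkpoint such as Stable Diffusion v1-5
pipe = TextToVideoZeroPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

prompt = "A panda is playing guitar on times square"
frames = pipe(prompt=prompt).images  # frames as float arrays in [0, 1]

# Convert to 8-bit frames and save as a short video clip
frames = [(frame * 255).astype("uint8") for frame in frames]
imageio.mimsave("video.mp4", frames, fps=4)
```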
Audio Generation
Text-guided audio generation has made great progress over the last few months, with many advances based on diffusion models.
The 0.15.0 release includes two powerful audio diffusion models.
AudioLDM
Inspired by [Stable Diffusion](https://huggingface.co/CompVis/stable-diffusion-v1-4), AudioLDM
is a text-to-audio _latent diffusion model (LDM)_ that learns continuous audio representations from [CLAP](https://huggingface.co/laion/clap-htsat-unfused)
latents. AudioLDM takes a text prompt as input and predicts the corresponding audio. It can generate text-conditional
sound effects, human speech and music.
```python
from diffusers import AudioLDMPipeline
import torch

repo_id = "cvssp/audioldm"
pipe = AudioLDMPipeline.from_pretrained(repo_id, torch_dtype=torch.float16)
pipe = pipe.to("cuda")

prompt = "Techno music with a strong, upbeat tempo and high melodic riffs"
audio = pipe(prompt, num_inference_steps=10, audio_length_in_s=5.0).audios[0]
```
The resulting audio output can be saved as a .wav file:
```python
import scipy

scipy.io.wavfile.write("techno.wav", rate=16000, data=audio)
```
For more information, see [cvssp/audioldm](https://huggingface.co/cvssp/audioldm).
Spectrogram Diffusion
This model from the [Magenta team](https://github.com/magenta) is a MIDI-to-audio generator. The pipeline takes a MIDI file as input and autoregressively generates 5-second spectrogram segments, which are finally concatenated together and decoded to audio via a spectrogram decoder.
```python
from diffusers import SpectrogramDiffusionPipeline, MidiProcessor

pipe = SpectrogramDiffusionPipeline.from_pretrained("google/music-spectrogram-diffusion")
pipe = pipe.to("cuda")
processor = MidiProcessor()

# Download the MIDI file first, e.g.:
# wget http://www.piano-midi.de/midis/beethoven/beethoven_hammerklavier_2.mid
output = pipe(processor("beethoven_hammerklavier_2.mid"))
audio = output.audios[0]
```
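As with AudioLDM, the generated waveform can be written to a `.wav` file with `scipy`. The 16 kHz sample rate below is an assumption; please verify it against the [google/music-spectrogram-diffusion](https://huggingface.co/google/music-spectrogram-diffusion) model card.

```python
import scipy

# Sample rate assumed to be 16 kHz; check the model card for the exact value
scipy.io.wavfile.write("beethoven_hammerklavier_2.wav", rate=16000, data=audio)
```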
New Docs
Documentation is crucially important for `diffusers`, as it's one of the first resources people turn to when they want to understand how everything works or fix an issue they're observing. We have spent a lot of time in this release reviewing all documents, adding new ones, reorganizing sections and bringing code examples up to date with the latest APIs. This effort has been led by stevhliu (thanks a lot!) and yiyixuxu, but many others have chimed in and contributed.
Check it out: https://huggingface.co/docs/diffusers/index
Don't hesitate to open PRs with fixes to the documentation; they are greatly appreciated, as discussed in our (revised, of course) [contribution guide](https://huggingface.co/docs/diffusers/main/en/conceptual/contribution).
![Screenshot from 2023-04-12 18-08-35](https://user-images.githubusercontent.com/23423619/231517756-b1ea7f4e-24a1-4d6c-ad03-39db97089eb4.png)
Stable UnCLIP
Stable UnCLIP is one of the best open-source image variation models out there. Pass an initial image and, optionally, a prompt to generate variations of it:
```python
from diffusers import DiffusionPipeline
from diffusers.utils import load_image
import torch

pipe = DiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-2-1-unclip-small", torch_dtype=torch.float16)
pipe.to("cuda")

# get image
url = "https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/stable_unclip/tarsila_do_amaral.png"
image = load_image(url)

# run image variation
image = pipe(image).images[0]
```
For more information, you can have a look at [stabilityai/stable-diffusion-2-1-unclip](https://huggingface.co/stabilityai/stable-diffusion-2-1-unclip).
https://user-images.githubusercontent.com/23423619/231513081-ace66d77-39d4-4064-bb20-2db2ce6b000a.mp4
More ControlNet
ControlNet was first released in `diffusers` version 0.14.0, and this release brings some exciting developments: Multi-ControlNet, a training script, an upcoming community event, and a community image-to-image pipeline contributed by mikegarts!
Multi-ControlNet
Thanks to community member takuma104, it's now possible to use several ControlNet conditioning models at once! It works with the same API as before; you simply supply a list of ControlNets instead of a single one:
```python
import torch
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel

controlnet_canny = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
).to("cuda")
controlnet_pose = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-openpose", torch_dtype=torch.float16
).to("cuda")

pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "example/a-sd15-variant-model", torch_dtype=torch.float16,
    controlnet=[controlnet_pose, controlnet_canny]
).to("cuda")

pose_image = ...
canny_image = ...
prompt = ...
image = pipe(prompt=prompt, image=[pose_image, canny_image]).images[0]
```
And this is an example of how this affects generation:
|Control Image 1|Control Image 2|Generated|
|---|---|---|
|<img width="200" src="https://huggingface.co/takuma104/controlnet_dev/resolve/main/multi_controlnet/pac_pose_512x512.png">|<img width="200" src="https://huggingface.co/takuma104/controlnet_dev/resolve/main/multi_controlnet/pac_canny_512x512.png">|<img width="200" src="https://huggingface.co/takuma104/controlnet_dev/resolve/main/multi_controlnet/mc_pose_and_canny_result_19.png">|
|<img width="200" src="https://huggingface.co/takuma104/controlnet_dev/resolve/main/multi_controlnet/pac_pose_512x512.png">|(none)|<img width="200" src="https://huggingface.co/takuma104/controlnet_dev/resolve/main/multi_controlnet/mc_pose_only_result_19.png">|
|<img width="200" src="https://huggingface.co/takuma104/controlnet_dev/resolve/main/multi_controlnet/pac_canny_512x512.png">|(none)|<img width="200" src="https://huggingface.co/takuma104/controlnet_dev/resolve/main/multi_controlnet/mc_canny_only_result_19.png">|
ControlNet Training
We have created a training script for ControlNet and can't wait to see what new ideas the community comes up with! In fact, we are so pumped about it that we are organizing a JAX Diffusers sprint with a special focus on ControlNet, where participating teams will be assigned TPU v4-8s to work on their projects :exploding_head:. Those are some mean machines, so make sure you join our Discord to follow the event: https://discord.com/channels/879548962464493619/897387888663232554/1092751149217615902.
Textual Inversion, Revisited
Several great contributors have been working on textual inversion to get the most out of it. isamu-isozaki made it possible to perform multi-token training, and piEsposito & GuiyeC created an easy way to load textual inversion embeddings. These contributors are always a pleasure to work with; we feel honored and proud of this community.
Loading textual inversion embeddings is compatible with the Automatic1111 format, so you can download embeddings from other services (such as civitai) and easily apply them in `diffusers`, as sketched below. Please check [the updated documentation](https://huggingface.co/docs/diffusers/main/en/training/text_inversion#inference) for details.
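As a rough sketch of what this looks like in practice (the concept repository and token below are illustrative; see the documentation linked above for the exact API):

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Load a textual inversion embedding from the Hub (or a local A1111-style file)
pipe.load_textual_inversion("sd-concepts-library/cat-toy")

# Use the newly learned token in a prompt
image = pipe("a <cat-toy> sitting on a park bench", num_inference_steps=30).images[0]
```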
Faster loading of cached pipelines
We conducted a thorough investigation of the pipeline loading process to make it as fast as possible. This is the before and after:
- Previous: 2.27 sec
- Now: 1.1 sec
Instead of performing 3 HTTP operations, we now get everything we need with just one. That single call checks whether any of the components in the pipeline were updated; if so, the new files are downloaded. This improvement also applies when you load individual models instead of pre-trained pipelines.
This may not sound like much, but many people use `diffusers` for user-facing services where models and pipelines have to be reused on demand. By minimizing latency, they can provide a better service to their users and reduce operating costs.
Loading time can be reduced even further by forcing `diffusers` to only use the files already on disk and never check for updates, as sketched below. This is not recommended for most users, but can be interesting in production environments.
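A minimal sketch of an offline load, assuming the pipeline is already in the local cache; `local_files_only=True` skips the Hub check entirely and raises an error if any file is missing:

```python
import torch
from diffusers import DiffusionPipeline

# Never contact the Hub: use only the files already cached on disk
pipe = DiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
    local_files_only=True,
)
```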
Weight prompting using `compel`
Weight prompting is a popular method to increase the importance of some of the elements that appear in a text prompt, as a way to force image generation to obey those concepts. Because `diffusers` is used in a multitude of services and projects, we wanted to provide a very flexible way to adopt prompt weighting, so users can ultimately build the system they prefer. Our approach was to:
- Make the Stable Diffusion pipelines accept raw prompt embeddings. You are free to create the embeddings however you see fit, so you can come up with new ways to express weighting in your projects.
- At the same time, we adopted [`compel`](https://github.com/damian0815/compel), by damian0815, as a higher-level library to create the weighted embeddings.
You don't have to use `compel` to create the embeddings, but if you do, this is an example of how it looks in practice:
```python
from diffusers import StableDiffusionPipeline, UniPCMultistepScheduler
from compel import Compel

pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4")
pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)

compel_proc = Compel(tokenizer=pipe.tokenizer, text_encoder=pipe.text_encoder)

prompt = "a red cat playing with a ball++"
prompt_embeds = compel_proc(prompt)

image = pipe(prompt_embeds=prompt_embeds, num_inference_steps=20).images[0]
```
![img](https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/compel/forest_1.png)
As you can see, we assign more weight to the word `ball` using compel-specific syntax (`ball++`). You can use other libraries (or your own code) to create appropriate embeddings to pass to the pipeline.
You can read more details in [the documentation](https://huggingface.co/docs/diffusers/main/en/using-diffusers/weighted_prompts).
Karras Sigmas for schedulers
Some `diffusers` schedulers now support Karras sigmas! Thanks, nipunjindal!
See *Add Karras pattern to discrete euler* in 2956 for more information.
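As a sketch of how this looks in practice, assuming the `use_karras_sigmas` flag on a scheduler that supports it (such as `DPMSolverMultistepScheduler`):

```python
import torch
from diffusers import StableDiffusionPipeline, DPMSolverMultistepScheduler

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Recreate the scheduler from its config with the Karras sigma schedule enabled
pipe.scheduler = DPMSolverMultistepScheduler.from_config(
    pipe.scheduler.config, use_karras_sigmas=True
)

image = pipe("a photo of an astronaut riding a horse", num_inference_steps=25).images[0]
```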
All commits
* Adding support for `safetensors` and LoRa. by Narsil in 2448
* [Post release] Push post release by patrickvonplaten in 2546
* Correct section docs by patrickvonplaten in 2540
* adds `xformers` support to `train_unconditional.py` by vvvm23 in 2520
* Bug Fix: Remove explicit message argument in deprecate by alvanli in 2421
* Update pipeline_stable_diffusion_inpaint_legacy.py resize to integer multiple of 8 instead of 32 for init image and mask by Laveraaa in 2350
* move test num_images_per_prompt to pipeline mixin by williamberman in 2488
* Training tutorial by stevhliu in 2473
* Fix regression introduced in 2448 by Narsil in 2551
* Fix for InstructPix2PixPipeline to allow for prompt embeds to be passed in without prompts. by DN6 in 2456
* [PipelineTesterMixin] Handle non-image outputs for attn slicing test by sanchit-gandhi in 2504
* [Community Pipeline] Unclip Image Interpolation by Abhinay1997 in 2400
* Fix: controlnet docs format by vicoooo26 in 2559
* ema step, don't empty cuda cache by williamberman in 2563
* Add custom vae (diffusers type) to onnx converter by ForserX in 2325
* add OnnxStableDiffusionUpscalePipeline pipeline by ssube in 2158
* Support convert LoRA safetensors into diffusers format by haofanwang in 2403
* [Unet1d] correct docs by patrickvonplaten in 2565
* [Training] Fix tensorboard typo by patrickvonplaten in 2566
* allow Attend-and-excite pipeline work with different image sizes by yiyixuxu in 2476
* Allow textual_inversion_flax script to use save_steps and revision flag by haixinxu in 2075
* add intermediate logging for dreambooth training script by yiyixuxu in 2557
* community controlnet inpainting pipelines by williamberman in 2561
* [docs] Move relevant code for text2image to docs by stevhliu in 2537
* [docs] Move DreamBooth training materials to docs by stevhliu in 2547
* [docs] Move text-to-image LoRA training from blog to docs by stevhliu in 2527
* Update quicktour by stevhliu in 2463
* Support revision in Flax text-to-image training by pcuenca in 2567
* fix the default value of doc by xiaohu2015 in 2539
* Added multitoken training for textual inversion. Issue 369 by isamu-isozaki in 661
* [Docs]Fix invalid link to Pokemons dataset by zxypro1 in 2583
* [Docs] Weight prompting using compel by patrickvonplaten in 2574
* community stablediffusion controlnet img2img pipeline by mikegarts in 2584
* Improve dynamic thresholding and extend to DDPM and DDIM Schedulers by clarencechen in 2528
* [docs] Move Textual Inversion training examples to docs by stevhliu in 2576
* add deps table check updated to ci by williamberman in 2590
* Add notebook doc img2img by yiyixuxu in 2472
* [docs] Build notebooks from Markdown by stevhliu in 2570
* [Docs] Fix link to colab by patrickvonplaten in 2604
* [docs] Update unconditional image generation docs by stevhliu in 2592
* Add OpenVINO documentation by echarlaix in 2569
* Support LoRA for text encoder by haofanwang in 2588
* fix: un-existing tmp config file in linux, avoid unnecessary disk IO by knoopx in 2591
* Fixed incorrect width/height assignment in StableDiffusionDepth2ImgPi… by antoche in 2558
* add flax pipelines to api doc + doc string examples by yiyixuxu in 2600
* Fix typos by standardAI in 2608
* Migrate blog content to docs by stevhliu in 2477
* Add cache_dir to docs by patrickvonplaten in 2624
* Make sure that DEIS, DPM and UniPC can correctly be switched in & out by patrickvonplaten in 2595
* Revert "[docs] Build notebooks from Markdown" by patrickvonplaten in 2625
* Up vesion at which we deprecate "revision='fp16'" since `transformers` is not released yet by patrickvonplaten in 2623
* [Tests] Split scheduler tests by patrickvonplaten in 2630
* Improve ddim scheduler and fix bug when prediction type is "sample" by PeterL1n in 2094
* update paint by example docs by williamberman in 2598
* [From pretrained] Speed-up loading from cache by patrickvonplaten in 2515
* add translated docs by LolitaSian in 2587
* [Dreambooth] Editable number of class images by Mr-Philo in 2251
* Update quicktour.mdx by standardAI in 2637
* Update basic_training.mdx by standardAI in 2639
* controlnet sd 2.1 checkpoint conversions by williamberman in 2593
* [docs] Update readme by stevhliu in 2612
* [Pipeline loading] Remove send_telemetry by patrickvonplaten in 2640
* [docs] Build Jax notebooks for real by stevhliu in 2641
* Update loading.mdx by standardAI in 2642
* Support non square image generation for StableDiffusionSAGPipeline by AkiSakurai in 2629
* Update schedulers.mdx by standardAI in 2647
* [attention] Fix attention by patrickvonplaten in 2656
* Add support for Multi-ControlNet to StableDiffusionControlNetPipeline by takuma104 in 2627
* [Tests] Adds a test suite for `EMAModel` by sayakpaul in 2530
* fix the in-place modification in unet condition when using controlnet by andrehuang in 2586
* image generation main process checks by williamberman in 2631
* [Hub] Upgrade to 0.13.2 by patrickvonplaten in 2670
* AutoencoderKL: clamp indices of blend_h and blend_v to input size by kig in 2660
* Update README.md by qwjaskzxl in 2653
* [Lora] correct lora saving & loading by patrickvonplaten in 2655
* Add ddim noise comparative analysis pipeline by aengusng8 in 2665
* Add support for different model prediction types in DDIMInverseScheduler by clarencechen in 2619
* controlnet integration tests num_inference_steps=3 by williamberman in 2672
* Controlnet training by Ttl in 2545
* [Docs] Adds a documentation page for evaluating diffusion models by sayakpaul in 2516
* [Tests] fix: slow serialization test by sayakpaul in 2678
* Update Dockerfile CUDA by patrickvonplaten in 2682
* T5Attention support for cross-attention by kashif in 2654
* Update custom_pipeline_overview.mdx by standardAI in 2684
* Update kerascv.mdx by standardAI in 2685
* Update img2img.mdx by standardAI in 2688
* Update conditional_image_generation.mdx by standardAI in 2687
* Update controlling_generation.mdx by standardAI in 2690
* Update unconditional_image_generation.mdx by standardAI in 2686
* Add image_processor by yiyixuxu in 2617
* [docs] Add overviews to each section by stevhliu in 2657
* [docs] Create better navigation on index by stevhliu in 2658
* [docs] Reorganize table of contents by stevhliu in 2671
* Rename attention by patrickvonplaten in 2691
* Adding `use_safetensors` argument to give more control to users by Narsil in 2123
* [docs] Add safety checker to ethical guidelines by stevhliu in 2699
* train_unconditional save restore unet parameters by williamberman in 2706
* Improve deprecation error message when using cross_attention import by patrickvonplaten in 2710
* fix image link in inpaint doc by yiyixuxu in 2693
* [docs] Update ONNX doc to use `optimum` by sayakpaul in 2702
* Enabling gradient checkpointing for VAE by Pie31415 in 2536
* [Tests] Correct PT2 by patrickvonplaten in 2724
* Update mps.mdx by standardAI in 2749
* Update torch2.0.mdx by standardAI in 2748
* Update fp16.mdx by standardAI in 2746
* Update dreambooth.mdx by standardAI in 2742
* Update philosophy.mdx by standardAI in 2752
* Update text_inversion.mdx by standardAI in 2751
* add: controlnet entry to training section in the docs. by sayakpaul in 2677
* Update numbers for Habana Gaudi in documentation by regisss in 2734
* Improve Contribution Doc by patrickvonplaten in 2043
* Fix typos by apivovarov in 2715
* [1929]: Add CLIP guidance for Img2Img stable diffusion pipeline by nipunjindal in 2723
* Add guidance start/end parameters to StableDiffusionControlNetImg2ImgPipeline by hyowon-ha in 2731
* Fix mps tests on torch 2.0 by pcuenca in 2766
* Add option to set dtype in pipeline.to() method by 1lint in 2317
* stable diffusion depth batching fix by williamberman in 2757
* [docs] update torch 2 benchmark by pcuenca in 2764
* [docs] Clarify purpose of reproducibility docs by stevhliu in 2756
* [MS Text To Video] Add first text to video by patrickvonplaten in 2738
* `mps`: remove warmup passes by pcuenca in 2771
* Support for Offset Noise in examples by haofanwang in 2753
* add: section on multiple controlnets. by sayakpaul in 2762
* [Examples] InstructPix2Pix instruct training script by sayakpaul in 2478
* deduplicate training section in the docs. by sayakpaul in 2788
* [UNet3DModel] Fix with attn processor by patrickvonplaten in 2790
* [doc wip] literalinclude by mishig25 in 2718
* Rename 'CLIPFeatureExtractor' class to 'CLIPImageProcessor' by ainoya in 2732
* Music Spectrogram diffusion pipeline by kashif in 1044
* [2737]: Add DPMSolverMultistepScheduler to CLIP guided community pipeline by nipunjindal in 2779
* [Docs] small fixes to the text to video doc. by sayakpaul in 2787
* Update train_text_to_image_lora.py by haofanwang in 2767
* Skip `mps` in text-to-video tests by pcuenca in 2792
* Flax controlnet by yiyixuxu in 2727
* [docs] Add Colab notebooks and Spaces by stevhliu in 2713
* Add AudioLDM by sanchit-gandhi in 2232
* Update train_text_to_image_lora.py by haofanwang in 2795
* Add ModelEditing pipeline by bahjat-kawar in 2721
* Relax DiT test by kashif in 2808
* Update onnxruntime package candidates by PeixuanZuo in 2666
* [Stable UnCLIP] Finish Stable UnCLIP by patrickvonplaten in 2814
* [Docs] update docs (Stable unCLIP) to reflect the updated ckpts. by sayakpaul in 2815
* StableDiffusionModelEditingPipeline documentation by bahjat-kawar in 2810
* Update `examples` README.md to include the latest examples by sayakpaul in 2839
* Ruff: apply same rules as in transformers by pcuenca in 2827
* [Tests] Fix slow tests by patrickvonplaten in 2846
* Fix StableUnCLIPImg2ImgPipeline handling of explicitly passed image embeddings by unishift in 2845
* Helper function to disable custom attention processors by pcuenca in 2791
* improve stable unclip doc. by sayakpaul in 2823
* add: better warning messages when handling multiple conditionings. by sayakpaul in 2804
* [WIP]Flax training script for controlnet by yiyixuxu in 2818
* Make dynamo wrapped modules work with save_pretrained by pcuenca in 2726
* [Init] Make sure shape mismatches are caught early by patrickvonplaten in 2847
* updated onnx pndm test by kashif in 2811
* [Stable Diffusion] Allow users to disable Safety checker if loading model from checkpoint by Stax124 in 2768
* fix KarrasVePipeline bug by junhsss in 2828
* StableDiffusionLongPromptWeightingPipeline: Do not hardcode pad token by AkiSakurai in 2832
* Remove suggestion to use cuDNN benchmark in docs by d1g1t in 2793
* Remove duplicate sentence in docstrings by qqaatw in 2834
* Update the legacy inpainting SD pipeline, to allow calling it with only prompt_embeds (instead of always requiring a prompt) by cmdr2 in 2842
* Fix link to LoRA training guide in DreamBooth training guide by ushuz in 2836
* [WIP][Docs] Use DiffusionPipeline Instead of Child Classes when Loading Pipeline by dg845 in 2809
* Add `last_epoch` argument to `optimization.get_scheduler` by felixblanke in 2850
* [WIP] Check UNet shapes in StableDiffusionInpaintPipeline __init__ by dg845 in 2853
* [2761]: Add documentation for extra_in_channels UNet1DModel by nipunjindal in 2817
* [Tests] Adds a test to check if `image_embeds` None case is handled properly in `StableUnCLIPImg2ImgPipeline` by sayakpaul in 2861
* Update evaluation.mdx by standardAI in 2862
* Update overview.mdx by standardAI in 2864
* Update alt_diffusion.mdx by standardAI in 2865
* Update paint_by_example.mdx by standardAI in 2869
* Update stable_diffusion_safe.mdx by standardAI in 2870
* [Docs] Correct phrasing by patrickvonplaten in 2873
* [Examples] Add streaming support to the ControlNet training example in JAX by sayakpaul in 2859
* feat: allow offset_noise in dreambooth training example by yamanahlawat in 2826
* [docs] Performance tutorial by stevhliu in 2773
* [Docs] add an example use for `StableUnCLIPPipeline` in the pipeline docs by sayakpaul in 2897
* add flax requirement by yiyixuxu in 2894
* Support fp16 in conversion from original ckpt by burgalon in 2733
* img2img.multiple.controlnets.pipeline by mikegarts in 2833
* add load textual inversion embeddings to stable diffusion by piEsposito in 2009
* [docs] add the Stable diffusion with Jax/Flax Guide into the docs by yiyixuxu in 2487
* Add support `Karras sigmas` for StableDiffusionKDiffusionPipeline by takuma104 in 2874
* Fix textual inversion loading by GuiyeC in 2914
* Fix slow tests text inv by patrickvonplaten in 2915
* Fix check_inputs in upscaler pipeline to allow embeds by d1g1t in 2892
* Modify example with intel optimization by mengfei25 in 2896
* [2884]: Fix cross_attention_kwargs in StableDiffusionImg2ImgPipeline by nipunjindal in 2902
* [Tests] Speed up test by patrickvonplaten in 2919
* Have fix current pipeline link by guspan-tanadi in 2910
* Update image_variation.mdx by standardAI in 2911
* Update controlnet.mdx by standardAI in 2912
* Update pipeline_stable_diffusion_controlnet.py by patrickvonplaten in 2917
* Check for all different packages of opencv by wfng92 in 2901
* fix: norm group test for UNet3D. by sayakpaul in 2959
* Update euler_ancestral.mdx by standardAI in 2932
* Update unipc.mdx by standardAI in 2936
* Update score_sde_ve.mdx by standardAI in 2937
* Update score_sde_vp.mdx by standardAI in 2938
* Update ddim.mdx by standardAI in 2926
* Update ddpm.mdx by standardAI in 2929
* Removing explicit markdown extension by guspan-tanadi in 2944
* Ensure validation image RGB not RGBA by ernestchu in 2945
* Use `upload_folder` in training scripts by Wauplin in 2934
* allow use custom local dataset for controlnet training scripts by yiyixuxu in 2928
* fix post-processing by yiyixuxu in 2968
* [docs] Simplify loading guide by stevhliu in 2694
* update flax controlnet training script by yiyixuxu in 2951
* [Pipeline download] Improve pipeline download for index and passed co… by patrickvonplaten in 2980
* The variable name has been updated. by kadirnar in 2970
* Update the K-Diffusion SD pipeline, to allow calling it with only prompt_embeds (instead of always requiring a prompt) by cmdr2 in 2962
* [Examples] Add support for Min-SNR weighting strategy for better convergence by sayakpaul in 2899
* [scheduler] fix some scheduler dtype error by furry-potato-maker in 2992
* minor fix in controlnet flax example by yiyixuxu in 2986
* Explain how to install test dependencies by pcuenca in 2983
* docs: Link Navigation Path API Pipelines by guspan-tanadi in 2976
* add Min-SNR loss to Controlnet flax train script by yiyixuxu in 3016
* dynamic threshold sampling bug fixes and docs by williamberman in 3003
* Initial draft of Core ML docs by pcuenca in 2987
* [Pipeline] Add TextToVideoZeroPipeline by 19and99 in 2954
* Small typo correction in comments by rogerioagjr in 3012
* mps: skip unstable test by pcuenca in 3037
* Update contribution.mdx by mishig25 in 3054
* fix report tool by patrickvonplaten in 3047
* Fix config prints and save, load of pipelines by patrickvonplaten in 2849
* [docs] Reusing components by stevhliu in 3000
* Fix imports for composable_stable_diffusion pipeline by nthh in 3002
* config fixes by williamberman in 3060
* accelerate min version for ProjectConfiguration import by williamberman in 3042
* `AttentionProcessor.group_norm` num_channels should be `query_dim` by williamberman in 3046
* Update documentation by George-Ogden in 2996
* Fix scheduler type mismatch by pcuenca in 3041
* Fix invocation of some slow Flax tests by pcuenca in 3058
* add only cross attention to simple attention blocks by williamberman in 3011
* Fix typo and format BasicTransformerBlock attributes by off99555 in 2953
* unet time embedding activation function by williamberman in 3048
* Attention processor cross attention norm group norm by williamberman in 3021
* Attn added kv processor torch 2.0 block by williamberman in 3023
* [Examples] Fix type-casting issue in the ControlNet training script by sayakpaul in 2994
* [LoRA] Enabling limited LoRA support for text encoder by sayakpaul in 2918
* fix slow tsets by patrickvonplaten in 3066
* Fix InstructPix2Pix training in multi-GPU mode by sayakpaul in 2978
* [Docs] update Self-Attention Guidance docs by SusungHong in 2952
* Flax memory efficient attention by pcuenca in 2889
* [WIP] implement rest of the test cases (LoRA tests) by Pie31415 in 2824
* fix pipeline __setattr__ value == None by williamberman in 3063
* add support for pre-calculated prompt embeds to Stable Diffusion ONNX pipelines by ssube in 2597
* [2064]: Add Karras to DPMSolverMultistepScheduler by nipunjindal in 3001
* Finish docs textual inversion by patrickvonplaten in 3068
* [Docs] refactor text-to-video zero by sayakpaul in 3049
* Update Flax TPU tests by pcuenca in 3069
* Fix a bug of pano when not doing CFG by ernestchu in 3030
* Text2video zero refinements by 19and99 in 3070
Significant community contributions
The following contributors have made significant changes to the library over the last release:
* Abhinay1997
* [Community Pipeline] Unclip Image Interpolation (2400)
* ssube
* add OnnxStableDiffusionUpscalePipeline pipeline (2158)
* add support for pre-calculated prompt embeds to Stable Diffusion ONNX pipelines (2597)
* haofanwang
* Support convert LoRA safetensors into diffusers format (2403)
* Support LoRA for text encoder (2588)
* Support for Offset Noise in examples (2753)
* Update train_text_to_image_lora.py (2767)
* Update train_text_to_image_lora.py (2795)
* isamu-isozaki
* Added multitoken training for textual inversion. Issue 369 (661)
* mikegarts
* community stablediffusion controlnet img2img pipeline (2584)
* img2img.multiple.controlnets.pipeline (2833)
* LolitaSian
* add translated docs (2587)
* Ttl
* Controlnet training (2545)
* nipunjindal
* [1929]: Add CLIP guidance for Img2Img stable diffusion pipeline (2723)
* [2737]: Add DPMSolverMultistepScheduler to CLIP guided community pipeline (2779)
* [2761]: Add documentation for extra_in_channels UNet1DModel (2817)
* [2884]: Fix cross_attention_kwargs in StableDiffusionImg2ImgPipeline (2902)
* [2905]: Add Karras pattern to discrete euler (2956)
* [2064]: Add Karras to DPMSolverMultistepScheduler (3001)
* bahjat-kawar
* Add ModelEditing pipeline (2721)
* StableDiffusionModelEditingPipeline documentation (2810)
* piEsposito
* add load textual inversion embeddings to stable diffusion (2009)
* 19and99
* [Pipeline] Add TextToVideoZeroPipeline (2954)
* Text2video zero refinements (3070)
* MuhHanif
* Flax memory efficient attention (2889)