phrases = ["a waterfall", "a modern high speed train running through the tunnel"]
images = pipe(
    prompt=prompt,
    gligen_phrases=phrases,
    gligen_boxes=boxes,
    gligen_scheduled_sampling_beta=1,
    output_type="pil",
    num_inference_steps=50,
).images
images[0].save("./gligen-1-4-generation-text-box.jpg")
Refer to the [documentation](https://huggingface.co/docs/diffusers/main/en/api/pipelines/stable_diffusion/gligen) to learn more.
Thanks to nikhil-masterful for contributing GLIGEN in 4441.
Tiny Autoencoder
madebyollin trained two Autoencoders (for [Stable Diffusion](https://huggingface.co/docs/diffusers/main/en/api/pipelines/stable_diffusion/overview) and [Stable Diffusion XL](https://huggingface.co/docs/diffusers/main/en/api/pipelines/stable_diffusion/stable_diffusion_xl), respectively) to dramatically cut image decoding time. The effect is especially pronounced with larger-resolution images. You can use `AutoencoderTiny` to take advantage of them.
Here’s an example for Stable Diffusion:
```python
import torch
from diffusers import DiffusionPipeline, AutoencoderTiny

pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1-base", torch_dtype=torch.float16
)
pipe.vae = AutoencoderTiny.from_pretrained("madebyollin/taesd", torch_dtype=torch.float16)
pipe = pipe.to("cuda")

prompt = "slice of delicious New York-style berry cheesecake"
image = pipe(prompt, num_inference_steps=25).images[0]
image.save("cheesecake.png")
```
Refer to the [documentation](https://huggingface.co/docs/diffusers/main/en/api/models/autoencoder_tiny) to learn more. Refer to [this material](https://gist.github.com/sayakpaul/a57a86ee7419ac3e7a7879fd100e8d06) to understand the implications of using this Autoencoder in terms of inference latency and memory footprint.
Fine-tuning Stable Diffusion XL with DreamBooth and LoRA on a free-tier Colab Notebook
Stable Diffusion XL’s (SDXL) high memory requirements often seem restrictive when it comes to using it for downstream applications. Even if one uses parameter-efficient fine-tuning techniques like [LoRA](https://huggingface.co/docs/diffusers/main/en/training/lora), fine-tuning just the UNet component of SDXL can be quite memory-intensive. So, running it on a free-tier Colab Notebook (that usually has a 16 GB T4 GPU attached) seems impossible.
Now, with better support for gradient checkpointing and other recipes like 8-bit Adam (via `bitsandbytes`), it is possible to fine-tune the UNet of SDXL with DreamBooth and LoRA on a free-tier Colab Notebook.
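As a rough illustration of why 8-bit Adam helps, the sketch below estimates the memory taken by Adam's optimizer state alone. It assumes two moment buffers per trainable parameter and ignores the small overhead of quantization constants; the parameter count is a hypothetical figure, not a measurement from the Colab Notebook, and `adam_state_bytes` is an illustrative helper rather than part of `bitsandbytes`.

```python
def adam_state_bytes(num_trainable_params: int, bits_per_state: int) -> int:
    """Approximate bytes used by Adam's two per-parameter moment buffers."""
    # Adam tracks a first- and a second-moment estimate for every trainable parameter.
    states_per_param = 2
    return num_trainable_params * states_per_param * bits_per_state // 8


# Hypothetical number of trainable parameters, chosen only for illustration.
num_params = 100_000_000

full = adam_state_bytes(num_params, 32)
eight_bit = adam_state_bytes(num_params, 8)
print(f"32-bit Adam state: {full / 1e6:.0f} MB")
print(f"8-bit Adam state:  {eight_bit / 1e6:.0f} MB")
```

The estimate covers optimizer state only. Activation memory, which gradient checkpointing targets, is a separate cost and is not modeled here.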
Check out the [Colab Notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/SDXL_DreamBooth_LoRA_.ipynb) to learn more.
Thanks to ethansmith2000 for improving the gradient checkpointing support in 4474.
Support of `push_to_hub` for models, schedulers, and pipelines
Models, schedulers, and pipelines now accept a `push_to_hub` option in `save_pretrained()` and also provide a standalone `push_to_hub()` method. Below are some usage examples.
**Models**
```python
from diffusers import ControlNetModel

controlnet = ControlNetModel(
    block_out_channels=(32, 64),
    layers_per_block=2,
    in_channels=4,
    down_block_types=("DownBlock2D", "CrossAttnDownBlock2D"),
    cross_attention_dim=32,
    conditioning_embedding_out_channels=(16, 32),
)
controlnet.push_to_hub("my-controlnet-model")
# or controlnet.save_pretrained("my-controlnet-model", push_to_hub=True)
```
**Schedulers**
```python
from diffusers import DDIMScheduler

scheduler = DDIMScheduler(
    beta_start=0.00085,
    beta_end=0.012,
    beta_schedule="scaled_linear",
    clip_sample=False,
    set_alpha_to_one=False,
)
scheduler.push_to_hub("my-controlnet-scheduler")
```
**Pipelines**
```python
from diffusers import (
    UNet2DConditionModel,
    AutoencoderKL,
    DDIMScheduler,
    StableDiffusionPipeline,
)
from transformers import CLIPTextModel, CLIPTextConfig, CLIPTokenizer

unet = UNet2DConditionModel(
    block_out_channels=(32, 64),
    layers_per_block=2,
    sample_size=32,
    in_channels=4,
    out_channels=4,
    down_block_types=("DownBlock2D", "CrossAttnDownBlock2D"),
    up_block_types=("CrossAttnUpBlock2D", "UpBlock2D"),
    cross_attention_dim=32,
)

scheduler = DDIMScheduler(
    beta_start=0.00085,
    beta_end=0.012,
    beta_schedule="scaled_linear",
    clip_sample=False,
    set_alpha_to_one=False,
)

vae = AutoencoderKL(
    block_out_channels=[32, 64],
    in_channels=3,
    out_channels=3,
    down_block_types=["DownEncoderBlock2D", "DownEncoderBlock2D"],
    up_block_types=["UpDecoderBlock2D", "UpDecoderBlock2D"],
    latent_channels=4,
)

text_encoder_config = CLIPTextConfig(
    bos_token_id=0,
    eos_token_id=2,
    hidden_size=32,
    intermediate_size=37,
    layer_norm_eps=1e-05,
    num_attention_heads=4,
    num_hidden_layers=5,
    pad_token_id=1,
    vocab_size=1000,
)
text_encoder = CLIPTextModel(text_encoder_config)
tokenizer = CLIPTokenizer.from_pretrained("hf-internal-testing/tiny-random-clip")

components = {
    "unet": unet,
    "scheduler": scheduler,
    "vae": vae,
    "text_encoder": text_encoder,
    "tokenizer": tokenizer,
    "safety_checker": None,
    "feature_extractor": None,
}
pipeline = StableDiffusionPipeline(**components)
pipeline.push_to_hub("my-pipeline")
```
Refer to the [documentation](https://huggingface.co/docs/diffusers/main/en/using-diffusers/push_to_hub) to learn more.
Thanks to Wauplin for his generous and constructive feedback on this feature (see 4218).
Better support for loading Kohya-trained LoRA checkpoints
Providing seamless support for loading Kohya-trained LoRA checkpoints from `diffusers` is important for us. This is why we continue to improve our `load_lora_weights()` method. Check out the [documentation](https://huggingface.co/docs/diffusers/main/en/training/lora#supporting-a1111-themed-lora-checkpoints-from-diffusers) to know more about what’s currently supported and the current limitations.
Thanks to isidentical for extending their help in improving this support.
Better documentation for prompt weighting
Prompt weighting provides a way to emphasize or de-emphasize certain parts of a prompt, allowing for more control over the generated image. `compel` provides an easy way to do prompt weighting compatible with `diffusers`. To this end, we have worked on an improved guide. Check it out [here](https://huggingface.co/docs/diffusers/main/en/using-diffusers/weighted_prompts).
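To make the idea concrete, here is a toy sketch of how trailing `+` and `-` markers could be turned into per-word weights. It assumes each `+` multiplies the weight by 1.1 and each `-` by 0.9; `parse_weighted_prompt` is a hypothetical helper written for this illustration and is not `compel`'s parser, which supports a richer syntax and applies the weights to text embeddings rather than to words.

```python
def parse_weighted_prompt(prompt, up=1.1, down=0.9):
    """Map each whitespace-separated word to a weight based on trailing + or - marks."""
    weighted = []
    for token in prompt.split():
        word = token.rstrip("+")
        if word != token:
            # One or more trailing "+" marks: emphasize the word.
            weight = up ** (len(token) - len(word))
        else:
            # Zero or more trailing "-" marks: de-emphasize (or leave at 1.0).
            word = token.rstrip("-")
            weight = down ** (len(token) - len(word))
        weighted.append((word, weight))
    return weighted


for word, weight in parse_weighted_prompt("a red cat playing with a ball++"):
    print(f"{word}: {weight:.2f}")
```

In an actual pipeline, you would let `compel` build the weighted prompt embeddings and pass them to the pipeline instead of a raw prompt string, as the linked guide describes.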
Defaulting to serialize with `.safetensors`
Starting with this release, we will default to using `.safetensors` as our preferred serialization method. This change is reflected in all the training examples that we officially support.
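For readers curious about what a `.safetensors` file looks like on disk, the sketch below writes and reads a minimal one using only the standard library. It assumes the publicly documented layout: an 8-byte little-endian header length, a JSON header, then raw tensor bytes. The helper names and the tensor name `weight` are hypothetical, and this is not the `safetensors` library's API; `diffusers` handles serialization for you.

```python
import json
import os
import struct
import tempfile


def write_minimal_safetensors(path):
    """Write one float32 tensor of shape [2] in the assumed safetensors layout."""
    data = struct.pack("<2f", 1.0, 2.0)  # 8 bytes of raw little-endian float32 values
    header = {"weight": {"dtype": "F32", "shape": [2], "data_offsets": [0, len(data)]}}
    header_bytes = json.dumps(header).encode("utf-8")
    with open(path, "wb") as f:
        f.write(struct.pack("<Q", len(header_bytes)))  # 8-byte header length
        f.write(header_bytes)
        f.write(data)


def read_safetensors_header(path):
    """Read only the JSON header; no tensor data is loaded."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        return json.loads(f.read(header_len))


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy.safetensors")
    write_minimal_safetensors(path)
    print(read_safetensors_header(path))
```

Because loading only parses JSON and reads raw bytes, it does not involve unpickling, which is one commonly cited reason for preferring this format over pickle-based checkpoints.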
All commits
* 0.20.0dev0 by patrickvonplaten in 4299
* update Kandinsky doc by yiyixuxu in 4301
* [Torch.compile] Fixes torch compile graph break by patrickvonplaten in 4315
* Fix SDXL conversion from original to diffusers by duongna21 in 4280
* fix a bug in StableDiffusionUpscalePipeline when `prompt` is `None` by yiyixuxu in 4278
* [Local loading] Correct bug with local files only by patrickvonplaten in 4318
* Fix typo documentation by echarlaix in 4320
* fix validation option for dreambooth training example by xinyangli in 4317
* [Tests] add test for pipeline import. by sayakpaul in 4276
* Honor the SDXL 1.0 licensing from the training scripts. by sayakpaul in 4319
* Update README_sdxl.md to correct the header by sayakpaul in 4330
* [SDXL Refiner] Fix refiner forward pass for batched input by patrickvonplaten in 4327
* correct doc string for default value of guidance_scale by Tanupriya-Singh in 4339
* [ONNX] Don't download ONNX model by default by patrickvonplaten in 4338
* Fix repeat of negative prompt by kathath in 4335
* [SDXL] Make watermarker optional under certain circumstances to improve usability of SDXL 1.0 by patrickvonplaten in 4346
* [Feat] Support SDXL Kohya-style LoRA by sayakpaul in 4287
* fix fp type in t2i adapter docs by williamberman in 4350
* Update README.md to have PyPI-friendly path by sayakpaul in 4351
* [SDXL-IP2P] Add gif for demonstrating training processes by harutatsuakiyama in 4342
* [SDXL] Fix dummy imports incorrect naming by patrickvonplaten in 4370
* Clean up duplicate lines in encode_prompt by avoroshilov in 4369
* minor doc fixes. by sayakpaul in 4380
* Update docs of unet_1d.py by nishant42491 in 4394
* [AutoPipeline] Correct naming by patrickvonplaten in 4420
* [ldm3d] documentation fixing typos by estelleafl in 4284
* Cleanup pass for flaky Slow Tests for Stable diffusion by DN6 in 4415
* support from_single_file for SDXL inpainting by yiyixuxu in 4408
* fix test_float16_inference by yiyixuxu in 4412
* train dreambooth fix pre encode class prompt by williamberman in 4395
* [docs] Fix SDXL docstring by stevhliu in 4397
* Update documentation by echarlaix in 4422
* remove mentions of textual inversion from sdxl. by sayakpaul in 4404
* [LoRA] Fix SDXL text encoder LoRAs by sayakpaul in 4371
* [docs] AutoPipeline tutorial by stevhliu in 4273
* [Pipelines] Add community pipeline for Zero123 by kxhit in 4295
* [Feat] add tiny Autoencoder for (almost) instant decoding by sayakpaul in 4384
* can call encode_prompt with out setting a text encoder instance variable by williamberman in 4396
* Accept pooled_prompt_embeds in the SDXL Controlnet pipeline. Fixes an error if prompt_embeds are passed. by cmdr2 in 4309
* Prevent online access when desired when using download_from_original_stable_diffusion_ckpt by w4ffl35 in 4271
* move tests to nightly by DN6 in 4451
* auto type conversion by isNeil in 4270
* Fix typerror in pipeline handling for MultiControlNets which only contain a single ControlNet by Georgehe4 in 4454
* Add rank argument to train_dreambooth_lora_sdxl.py by levi in 4343
* [docs] Distilled SD by stevhliu in 4442
* Allow controlnets to be loaded (from ckpt) in a parallel thread with a SD model (ckpt), and speed it up slightly by cmdr2 in 4298
* fix typo to ensure `make test-examples` work correctly by statelesshz in 4329
* Fix bug caused by typo by HeliosZhao in 4357
* Delete the duplicate code for the contolnet img 2 img by VV-A-VV in 4411
* Support different strength for Stable Diffusion TensorRT Inpainting pipeline by jinwonkim93 in 4216
* add sdxl to prompt weighting by patrickvonplaten in 4439
* a few fix for kandinsky combined pipeline by yiyixuxu in 4352
* fix-format by yiyixuxu in 4458
* Cleanup Pass on flaky slow tests for Stable Diffusion by DN6 in 4455
* Fixed multi-token textual inversion training by manosplitsis in 4452
* TensorRT Inpaint pipeline: minor fixes by asfiyab-nvidia in 4457
* [Tests] Adds integration tests for SDXL LoRAs by sayakpaul in 4462
* Update README_sdxl.md by patrickvonplaten in 4472
* [SDXL] Allow SDXL LoRA to be run with less than 16GB of VRAM by patrickvonplaten in 4470
* Add a data_dir parameter to the load_dataset method. by AisingioroHao0 in 4482
* [Examples] Support train_text_to_image_lora_sdxl.py by okotaku in 4365
* Log global_step instead of epoch to tensorboard by mrlzla in 4493
* Update lora.md to clarify SDXL support by sayakpaul in 4503
* [SDXL LoRA] fix batch size lora by patrickvonplaten in 4509
* Make sure fp16-fix is used as default by patrickvonplaten in 4510
* grad checkpointing by ethansmith2000 in 4474
* move pipeline only when running validation by patrickvonplaten in 4515
* Moving certain pipelines slow tests to nightly by DN6 in 4469
* add pipeline_class_name argument to Stable Diffusion conversion script by yiyixuxu in 4461
* Fix misc typos by Georgehe4 in 4479
* fix indexing issue in sd reference pipeline by DN6 in 4531
* Copy lora functions to XLPipelines by wooyeolBaek in 4512
* introduce minimalistic reimplementation of SDXL on the SDXL doc by cloneofsimo in 4532
* Fix push_to_hub in train_text_to_image_lora_sdxl.py example by ra100 in 4535
* Update README_sdxl.md to include the free-tier Colab Notebook by sayakpaul in 4540
* Changed code that converts tensors to PIL images in the write_your_own_pipeline notebook by jere357 in 4489
* Move slow tests to nightly by DN6 in 4526
* pin ruff version for quality checks by DN6 in 4539
* [docs] Clean scheduler api by stevhliu in 4204
* Move controlnet load local tests to nightly by DN6 in 4543
* Revert "introduce minimalistic reimplementation of SDXL on the SDXL doc" by patrickvonplaten in 4548
* fix some typo error by VV-A-VV in 4546
* improve controlnet sdxl docs now that we have a good checkpoint. by sayakpaul in 4556
* [Doc] update sdxl-controlnet repo name by yiyixuxu in 4564
* [docs] Expand prompt weighting by stevhliu in 4516
* [docs] Remove attention slicing by stevhliu in 4518
* [docs] Add safetensors flag by stevhliu in 4245
* Convert Stable Diffusion ControlNet to TensorRT by dotieuthien in 4465
* Remove code snippets containing `is_safetensors_available()` by chiral-carbon in 4521
* Fixing repo_id regex validation error on windows platforms by Mystfit in 4358
* [Examples] fix: network_alpha -> network_alphas by sayakpaul in 4572
* [docs] Fix ControlNet SDXL docstring by stevhliu in 4582
* [Utility] adds an image grid utility by sayakpaul in 4576
* Fixed invalid pipeline_class_name parameter. by AisingioroHao0 in 4590
* Fix git-lfs command typo in docs by clairefro in 4586
* [Examples] Update InstructPix2Pix README_sdxl.md to fix mentions by sayakpaul in 4574
* [Pipeline utils] feat: implement push_to_hub for standalone models, schedulers as well as pipelines by sayakpaul in 4128
* An invalid clerical error in sdxl finetune by XDUWQ in 4608
* [Docs] fix links in the controlling generation doc. by sayakpaul in 4612
* add: pushtohubmixin to pipelines and schedulers docs overview. by sayakpaul in 4607
* add: train to text image with sdxl script. by sayakpaul in 4505
* Add GLIGEN implementation by nikhil-masterful in 4441
* Update text2image.md to fix the links by sayakpaul in 4626
* Fix unipc use_karras_sigmas exception - fixes huggingface/diffusers#4580 by reimager in 4581
* [research_projects] SDXL controlnet script by patil-suraj in 4633
* [Core] feat: MultiControlNet support for SDXL ControlNet pipeline by sayakpaul in 4597
* [docs] PushToHubMixin by stevhliu in 4622
* [docs] MultiControlNet by stevhliu in 4635
* fix loading custom text encoder when using `from_single_file` by DN6 in 4571
* make things clear in the controlnet sdxl doc. by sayakpaul in 4644
* Fix `UnboundLocalError` during LoRA loading by slessans in 4523
* Support higher dimension LoRAs by isidentical in 4625
* [Safetensors] Make safetensors the default way of saving weights by patrickvonplaten in 4235
Significant community contributions
The following contributors have made significant changes to the library over the last release:
* kxhit
    * [Pipelines] Add community pipeline for Zero123 (4295)
* okotaku
    * [Examples] Support train_text_to_image_lora_sdxl.py (4365)
* dotieuthien
    * Convert Stable Diffusion ControlNet to TensorRT (4465)
* nikhil-masterful
    * Add GLIGEN implementation (4441)