Würstchen
![](https://github.com/dome272/Wuerstchen/assets/61938694/0617c863-165a-43ee-9303-2a17299a0cf9)
[Würstchen](https://huggingface.co/papers/2306.00637) is a diffusion model whose text-conditional component works in a highly compressed latent space of images, which makes inference cheaper and faster.
![](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/wuerstchen/inference_speed_v2_light.jpg)
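To get a feel for what "highly compressed" means, the sketch below compares the number of latent positions a diffusion model has to process at two spatial compression factors. The 8x factor is the one used by the Stable Diffusion VAE. The 42.67 factor is an assumption based on the roughly 42x spatial compression described in the Würstchen paper. `latent_grid` and `latent_positions` are hypothetical helpers written for this illustration and are not part of diffusers.

```python
import math


def latent_grid(height, width, compression):
    """Return (rows, cols) of the latent grid for a spatial compression factor."""
    return math.ceil(height / compression), math.ceil(width / compression)


def latent_positions(height, width, compression):
    """Return the number of spatial positions in the latent grid."""
    rows, cols = latent_grid(height, width, compression)
    return rows * cols


# Same output size as the pipeline example in this section: 1024 x 1536 pixels.
sd_positions = latent_positions(1024, 1536, 8)             # 8x compression
wuerstchen_positions = latent_positions(1024, 1536, 42.67)  # assumed ~42x compression

print(latent_grid(1024, 1536, 8), sd_positions)
print(latent_grid(1024, 1536, 42.67), wuerstchen_positions)
print(f"ratio: {sd_positions / wuerstchen_positions:.1f}x fewer positions")
```

Under these assumptions the text-conditional stage works on a 24 x 36 grid instead of 128 x 192, which gives a rough intuition for the inference savings shown in the figure above.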
Here is how to use Würstchen as a pipeline:
```python
import torch
from diffusers import AutoPipelineForText2Image
from diffusers.pipelines.wuerstchen import DEFAULT_STAGE_C_TIMESTEPS

# Load the combined Würstchen pipeline in half precision on the GPU
pipeline = AutoPipelineForText2Image.from_pretrained("warp-ai/wuerstchen", torch_dtype=torch.float16).to("cuda")

caption = "Anthropomorphic cat dressed as a firefighter"
images = pipeline(
    caption,
    height=1024,
    width=1536,
    prior_timesteps=DEFAULT_STAGE_C_TIMESTEPS,
    prior_guidance_scale=4.0,
    num_images_per_prompt=4,
).images
```
![](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/blog/wuertschen/Anthropomorphic_cat_dressed_as_a_fire_fighter.jpg)
To learn more about the pipeline, check out the [official documentation](https://huggingface.co/docs/diffusers/main/en/api/pipelines/wuerstchen).
This pipeline was contributed by one of the authors of Würstchen, dome272, with help from kashif and patrickvonplaten.
👉 Try out the model here: https://huggingface.co/spaces/warp-ai/Wuerstchen
T2I Adapters for Stable Diffusion XL (SDXL)
[T2I-Adapter](https://huggingface.co/papers/2302.08453) is an efficient plug-and-play model that provides extra guidance to pre-trained text-to-image models while freezing the original large text-to-image models.
In collaboration with the Tencent ARC researchers, we trained T2I Adapters on various conditions: sketch, canny, lineart, depth, and openpose.
Below is an example of how to use the `StableDiffusionXLAdapterPipeline`.
First, ensure that `controlnet_aux` is installed:
```bash
pip install -U controlnet_aux==0.0.7
```
Then we can initialize the pipeline:
```python
import torch
from controlnet_aux.lineart import LineartDetector
from diffusers import (AutoencoderKL, EulerAncestralDiscreteScheduler,
                       StableDiffusionXLAdapterPipeline, T2IAdapter)
from diffusers.utils import load_image, make_image_grid

# load adapter
adapter = T2IAdapter.from_pretrained(
    "TencentARC/t2i-adapter-lineart-sdxl-1.0", torch_dtype=torch.float16, variant="fp16"
).to("cuda")

# load pipeline
model_id = "stabilityai/stable-diffusion-xl-base-1.0"
euler_a = EulerAncestralDiscreteScheduler.from_pretrained(
    model_id, subfolder="scheduler"
)
vae = AutoencoderKL.from_pretrained(
    "madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.float16
)
pipe = StableDiffusionXLAdapterPipeline.from_pretrained(
    model_id,
    vae=vae,
    adapter=adapter,
    scheduler=euler_a,
    torch_dtype=torch.float16,
    variant="fp16",
).to("cuda")

# load lineart detector
line_detector = LineartDetector.from_pretrained("lllyasviel/Annotators").to("cuda")
```
We then load an image to compute the lineart conditionings:
```python
url = "https://huggingface.co/Adapter/t2iadapter/resolve/main/figs_SDXLV1.0/org_lin.jpg"
image = load_image(url)
image = line_detector(image, detect_resolution=384, image_resolution=1024)
```
Then we generate:
```python
prompt = "Ice dragon roar, 4k photo"
negative_prompt = "anime, cartoon, graphic, text, painting, crayon, graphite, abstract, glitch, deformed, mutated, ugly, disfigured"
gen_images = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    image=image,
    num_inference_steps=30,
    adapter_conditioning_scale=0.8,
    guidance_scale=7.5,
).images[0]
```
![](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/blog/t2i-adapters-sdxl/lineart_generated_dragon.png)
Refer to the [official documentation](https://huggingface.co/docs/diffusers/main/en/api/pipelines/stable_diffusion/adapter) to learn more about `StableDiffusionXLAdapterPipeline`.
[This blog post](https://huggingface.co/blog/t2i-sdxl-adapters) summarizes our experiences and provides all the resources (including the pre-trained T2I Adapter checkpoints) to get started using T2I Adapters for SDXL.
We’re also releasing a training script for training your custom T2I Adapters on SDXL. Check out the [documentation](https://huggingface.co/docs/diffusers/main/en/training/t2i_adapters) to learn more.
Thanks to MC-E (one of the authors of T2I Adapters) for contributing the `StableDiffusionXLAdapterPipeline` in 4696.
Faster imports
We introduced “lazy imports” (4829) to significantly reduce the time it takes to import our modules (such as `pipelines`, `models`, and so on). Below is a comparison of the timings for `import diffusers` with and without lazy imports.
**With lazy imports**:
```bash
real    0m0.417s
user    0m0.714s
sys     0m0.499s
```
**Without lazy imports**:
```bash
real    0m5.391s
user    0m5.299s
sys     0m1.273s
```
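For readers unfamiliar with the idea, here is a minimal sketch of the general lazy-import pattern: a module object that only imports a submodule the first time one of its attributes is requested. This is an illustration under stated assumptions, not the implementation from 4829. The `LazyModule` class and the `import_structure` mapping are hypothetical names, and the standard-library modules `json` and `math` stand in for heavy submodules.

```python
import importlib
import types


class LazyModule(types.ModuleType):
    """Module whose attributes are imported on first access (illustrative only)."""

    def __init__(self, name, import_structure):
        super().__init__(name)
        # Map each exported attribute to the module that defines it.
        self._attr_to_module = {
            attr: module_name
            for module_name, attrs in import_structure.items()
            for attr in attrs
        }

    def __getattr__(self, attr):
        # Only called when normal attribute lookup fails, i.e. before first use.
        attr_to_module = self.__dict__.get("_attr_to_module", {})
        if attr not in attr_to_module:
            raise AttributeError(f"module {self.__name__!r} has no attribute {attr!r}")
        module = importlib.import_module(attr_to_module[attr])
        value = getattr(module, attr)
        setattr(self, attr, value)  # cache so later lookups skip __getattr__
        return value


# Stand-in "package": nothing is imported until an attribute is touched.
lazy = LazyModule("demo", {"json": ["dumps"], "math": ["sqrt"]})
print("dumps" in vars(lazy))  # not resolved yet
print(lazy.dumps([1]))        # triggers the import of json
print("dumps" in vars(lazy))  # now cached on the module object
```

The design trade-off is that the cost of importing a submodule moves from `import` time to first use, so a program that touches only a few names pays only for those.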
Faster LoRA loading
Previously, loading LoRA parameters with `load_lora_weights()` was time-consuming, as reported in 4975. To address this, we introduced a `low_cpu_mem_usage` argument to the `load_lora_weights()` method in 4994, which should speed up loading significantly. Just pass `low_cpu_mem_usage=True` to benefit from it.
LoRA fusing
LoRA weights can now be fused into the model weights, so models with loaded LoRA weights run as fast as models without them. Fusing also makes it possible to merge multiple LoRAs into the same model.
For more information, have a look at [the documentation](https://huggingface.co/docs/diffusers/main/en/training/lora#fusing-lora-parameters) and the original PR: https://github.com/huggingface/diffusers/pull/4473.
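The arithmetic behind fusing can be sketched in a few lines of plain Python. The example assumes the common LoRA formulation, in which a layer computes `W x + scale * B (A x)`, and shows that folding the low-rank update into the weight (`W + scale * B A`) yields the same output with a single matrix product. The matrices, the `scale` value, and the helper functions are made up for illustration and are not the diffusers implementation.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def add_scaled(a, b, scale=1.0):
    """Return a + scale * b, element-wise."""
    return [[x + scale * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


W = [[1.0, 2.0], [3.0, 4.0]]  # base weight, 2 x 2
A = [[0.5, -1.0]]             # LoRA down-projection, rank 1 x 2
B = [[2.0], [1.0]]            # LoRA up-projection, 2 x rank 1
scale = 0.5                   # LoRA scale (assumed value)
x = [[1.0], [2.0]]            # input column vector

# Unfused: the base layer and the LoRA branch are evaluated separately.
unfused = add_scaled(matmul(W, x), matmul(B, matmul(A, x)), scale)

# Fused: the low-rank update is folded into the weight once, ahead of time.
W_fused = add_scaled(W, matmul(B, A), scale)
fused = matmul(W_fused, x)

print(unfused, fused)
```

Because the update is added into `W` itself, fusing a second LoRA is just another addition to the same matrix, which is why several LoRAs can be merged into one model.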
More support for LoRAs
Almost all LoRA formats out there for SDXL are now supported. For more details, please check [the documentation](https://huggingface.co/docs/diffusers/main/en/training/lora#supporting-different-lora-checkpoints-from-diffusers).
All commits
* fix: lora sdxl tests by sayakpaul in 4652
* Support tiled encode/decode for `AutoencoderTiny` by Isotr0py in 4627
* Add SDXL long weighted prompt pipeline (replace pr:4629) by xhinker in 4661
* add config_file to from_single_file by zuojianghua in 4614
* Add AudioLDM 2 by sanchit-gandhi in 4549
* [docs] Add note in UniDiffusers Doc about PyTorch 1.X numerical stability issue by dg845 in 4703
* [Core] enable lora for sdxl controlnets too and add slow tests. by sayakpaul in 4666
* [LoRA] ensure different LoRA ranks for text encoders can be properly handled by sayakpaul in 4669
* [LoRA] default to None when fc alphas are not available. by sayakpaul in 4706
* Replaces `DIFFUSERS_TEST_DEVICE` backend list with trying device by vvvm23 in 4673
* add convert diffuser pipeline of XL to original stable diffusion by realliujiaxu in 4596
* Add reference_attn & reference_adain support for sdxl by zideliu in 4502
* [Docs] Fix docs controlnet missing /Tip by patrickvonplaten in 4717
* rename test file to run, so that examples tests do not fail by patrickvonplaten in 4715
* Revert "Move controlnet load local tests to nightly by patrickvonplaten in 4543)"
* Fix all docs by patrickvonplaten in 4721
* fix bad error message when transformers is missing by patrickvonplaten in 4714
* Fix AutoencoderTiny encoder scaling convention by madebyollin in 4682
* [Examples] fix checkpointing and casting bugs in `train_text_to_image_lora_sdxl.py` by sayakpaul in 4632
* [AudioLDM Docs] Fix docs for output by sanchit-gandhi in 4737
* [docs] add variant="fp16" flag by realliujiaxu in 4678
* [AudioLDM Docs] Update docstring by sanchit-gandhi in 4744
* fix dummy import for AudioLDM2 by patil-suraj in 4741
* change validation scheduler for train_dreambooth.py when training IF by wyz894272237 in 4333
* add a step_index counter by yiyixuxu in 4347
* [AudioLDM2] Doc fixes by sanchit-gandhi in 4739
* Bugfix for SDXL model loading in low ram system. by Symbiomatrix in 4628
* Clean up flaky behaviour on Slow CUDA Pytorch Push Tests by DN6 in 4759
* [Tests] Fix paint by example by patrickvonplaten in 4761
* [fix] multi t2i adapter set total_downscale_factor by williamberman in 4621
* [Examples] Add madebyollin VAE to SDXL LoRA example, along with an explanation by mnslarcher in 4762
* [LoRA] relax lora loading logic by sayakpaul in 4610
* [Examples] fix sdxl dreambooth lora checkpointing. by sayakpaul in 4749
* fix sdxl_lwp empty neg_prompt error issue by xhinker in 4743
* improve setup.py by sayakpaul in 4748
* Torch device by patrickvonplaten in 4755
* [AudioLDM 2] Pipeline fixes by sanchit-gandhi in 4738
* Convert MusicLDM by sanchit-gandhi in 4579
* [WIP ] Proposal to address precision issues in CI by DN6 in 4775
* fix a bug in `from_pretrained` when load optional components by yiyixuxu in 4745
* fix bug of progress bar in clip guided images mixing by scnuhealthy in 4729
* Fixed broken link of CLIP doc in evaluation doc by mayank2 in 4760
* instance_prompt->class_prompt by williamberman in 4784
* refactor prepare_mask_and_masked_image with VaeImageProcessor by yiyixuxu in 4444
* Allow passing a checkpoint state_dict to convert_from_ckpt (instead of just a string path) by cmdr2 in 4653
* [SDXL] Add docs about forcing passed embeddings to be 0 by patrickvonplaten in 4783
* [Core] Support negative conditions in SDXL by sayakpaul in 4774
* Unet fix by canberk17 in 4769
* [Tests] Tighten up LoRA loading relaxation by sayakpaul in 4787
* [docs] Fix syntax for compel by stevhliu in 4794
* [Torch compile] Fix torch compile for controlnet by patrickvonplaten in 4795
* [SDXL Lora] Fix last ben sdxl lora by patrickvonplaten in 4797
* [LoRA Attn Processors] Refactor LoRA Attn Processors by patrickvonplaten in 4765
* Update loaders.py by chillpixelfun in 4805
* [WIP] Add Fabric by shauray8 in 4201
* Fix save_path bug in textual inversion training script by Yead in 4710
* [Examples] Save SDXL LoRA weights with chosen precision by mnslarcher in 4791
* Fix Disentangle ONNX and non-ONNX pipeline by DN6 in 4656
* fix bug in StableDiffusionXLControlNetPipeline when use guess_mode by yiyixuxu in 4799
* fix auto_pipeline: pass kwargs to load_config by yiyixuxu in 4793
* add StableDiffusionXLControlNetImg2ImgPipeline by yiyixuxu in 4592
* add models for T2I-Adapter-XL by MC-E in 4696
* Fuse loras by patrickvonplaten in 4473
* Fix convert_original_stable_diffusion_to_diffusers script by wingrime in 4817
* Support saving multiple t2i adapter models under one checkpoint by VitjanZ in 4798
* fix typo by zideliu in 4822
* VaeImageProcessor: Allow image resizing also for torch and numpy inputs by gajendr-nikhil in 4832
* [Core] refactor encode_prompt by sayakpaul in 4617
* Add loading ckpt from file for SDXL controlNet by antigp in 4683
* Fix Unfuse Lora by patrickvonplaten in 4833
* sketch inpaint from a1111 for non-inpaint models by noskill in 4824
* [docs] SDXL by stevhliu in 4428
* [Docs] improve the LoRA doc. by sayakpaul in 4838
* Fix potential type mismatch errors in SDXL pipelines by hyk1996 in 4796
* Fix image processor inputs width by echarlaix in 4853
* Remove warn with deprecate by patrickvonplaten in 4850
* [docs] ControlNet guide by stevhliu in 4640
* [SDXL Inpaint] Correct strength default by patrickvonplaten in 4858
* fix sdxl-inpaint fast test by yiyixuxu in 4859
* [docs] Add inpainting example for forcing the unmasked area to remain unchanged to the docs by dg845 in 4536
* Add GLIGEN Text Image implementation by tuanh123789 in 4777
* Test Cleanup Precision issues by DN6 in 4812
* Fix link from API to using-diffusers by pcuenca in 4856
* [Docs] Korean translation update by Snailpong in 4684
* fix a bug in sdxl-controlnet-img2img when using MultiControlNetModel by yiyixuxu in 4862
* support AutoPipeline.from_pipe between a pipeline and its ControlNet pipeline counterpart by yiyixuxu in 4861
* [WIP] masked_latent_inputs for inpainting pipeline by yiyixuxu in 4819
* [docs] DiffEdit guide by stevhliu in 4722
* [docs] Shap-E guide by stevhliu in 4700
* [ControlNet SDXL Inpainting] Support inpainting of ControlNet SDXL by harutatsuakiyama in 4694
* [Tests] Add combined pipeline tests by patrickvonplaten in 4869
* Retrieval Augmented Diffusion Models by isamu-isozaki in 3297
* check for unet_lora_layers in sdxl pipeline's save_lora_weights method by ErwannMillon in 4821
* Fix get_dummy_inputs for Stable Diffusion Inpaint Tests by dg845 in 4845
* allow passing components to connected pipelines when use the combined pipeline by yiyixuxu in 4883
* [Core] LoRA improvements pt. 3 by sayakpaul in 4842
* Add dropout parameter to UNet2DModel/UNet2DConditionModel by dg845 in 4882
* [Core] better support offloading when side loading is enabled. by sayakpaul in 4855
* Add --vae_precision option to the SDXL pix2pix script so that we have… by bghira in 4881
* [Test] Reduce CPU memory by patrickvonplaten in 4897
* fix a bug in StableDiffusionUpscalePipeline.run_safety_checker by yiyixuxu in 4886
* remove latent input for kandinsky prior_emb2emb pipeline by yiyixuxu in 4887
* [docs] Add stronger warning for SDXL height/width by stevhliu in 4867
* [Docs] add doc entry to explain lora fusion and use of different scales. by sayakpaul in 4893
* [Textual inversion] Relax loading textual inversion by patrickvonplaten in 4903
* [docs] Fix typo in Inpainting force unmasked area unchanged example by dg845 in 4910
* Würstchen model by kashif in 3849
* [InstructPix2Pix] Fix pipeline implementation and add docs by sayakpaul in 4844
* [StableDiffusionXLAdapterPipeline] add adapter_conditioning_factor by patil-suraj in 4937
* [StableDiffusionXLAdapterPipeline] allow negative micro conds by patil-suraj in 4941
* [examples] T2IAdapter training script by patil-suraj in 4934
* [Tests] add: tests for t2i adapter training. by sayakpaul in 4947
* guard save model hooks to only execute on main process by williamberman in 4929
* [Docs] add t2i adapter entry to overview of training scripts. by sayakpaul in 4946
* Temp Revert "[Core] better support offloading when side loading is enabled… by williamberman in 4927
* Revert revert and install accelerate main by williamberman in 4963
* [Docs] fix: minor formatting in the Würstchen docs by sayakpaul in 4965
* Lazy Import for Diffusers by DN6 in 4829
* [Core] Remove TF import checks by patrickvonplaten in 4968
* Make sure Flax pipelines can be loaded into PyTorch by patrickvonplaten in 4971
* Update README.md by patrickvonplaten in 4973
* Wuerstchen fixes by kashif in 4942
* Refactor model offload by patrickvonplaten in 4514
* [Bug Fix] Should pass the dtype instead of torch_dtype by zhiqiang-canva in 4917
* [Utils] Correct custom init sort by patrickvonplaten in 4967
* remove extra gligen in import by DN6 in 4987
* fix E721 Do not compare types, use `isinstance()` by kashif in 4992
* [Wuerstchen] fix combined pipeline's num_images_per_prompt by kashif in 4989
* fix image variation slow test by DN6 in 4995
* fix custom diffusion tests by DN6 in 4996
* [Lora] Speed up lora loading by patrickvonplaten in 4994
* [docs] Fix DiffusionPipeline.enable_sequential_cpu_offload docstring by dg845 in 4952
* Fix safety checker seq offload by patrickvonplaten in 4998
* Fix PR template by stevhliu in 4984
* examples fix t2i training by patrickvonplaten in 5001
Significant community contributions
The following contributors have made significant changes to the library over the last release:
* xhinker
    * Add SDXL long weighted prompt pipeline (replace pr:4629) (4661)
    * fix sdxl_lwp empty neg_prompt error issue (4743)
* zideliu
    * Add reference_attn & reference_adain support for sdxl (4502)
    * fix typo (4822)
* shauray8
    * [WIP] Add Fabric (4201)
* MC-E
    * add models for T2I-Adapter-XL (4696)
* tuanh123789
    * Add GLIGEN Text Image implementation (4777)
* Snailpong
    * [Docs] Korean translation update (4684)
* harutatsuakiyama
    * [ControlNet SDXL Inpainting] Support inpainting of ControlNet SDXL (4694)
* isamu-isozaki
    * Retrieval Augmented Diffusion Models (3297)