Instruct-Pix2Pix
Instruct-Pix2Pix is a Stable Diffusion model fine-tuned to edit images from human instructions. Given an input image and a written instruction, the model edits the image accordingly.
![image](https://huggingface.co/datasets/diffusers/diffusers-images-docs/resolve/main/pix2pix.jpeg)
The model was released with the paper [InstructPix2Pix: Learning to Follow Image Editing Instructions](https://arxiv.org/abs/2211.09800). More information about the model can be found in the paper.
```
pip install diffusers transformers safetensors accelerate
```
```python
import requests
import torch
from PIL import Image, ImageOps

from diffusers import StableDiffusionInstructPix2PixPipeline

model_id = "timbrooks/instruct-pix2pix"
pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained(model_id, torch_dtype=torch.float16).to("cuda")

url = "https://huggingface.co/datasets/diffusers/diffusers-images-docs/resolve/main/mountain.png"


def download_image(url):
    # Fetch the image, apply its EXIF orientation, and convert it to RGB.
    image = Image.open(requests.get(url, stream=True).raw)
    image = ImageOps.exif_transpose(image)
    image = image.convert("RGB")
    return image


image = download_image(url)

prompt = "make the mountains snowy"
edit = pipe(prompt, image=image, num_inference_steps=20, image_guidance_scale=1.5, guidance_scale=7).images[0]
edit.save("snowy_mountains.png")
```
* Add InstructPix2Pix pipeline by patil-suraj in #2040
DiT
Diffusion Transformer (DiT) is a class-conditional latent diffusion model that replaces the commonly used U-Net backbone with a transformer operating on latent patches. The pretrained model was trained on the ImageNet-1K dataset and can generate class-conditional images of 256x256 or 512x512 pixels.
![dit](https://user-images.githubusercontent.com/8100/214593099-3b478e53-64ca-4265-925c-50eb0ea5da3e.png)
The model was released with the paper [Scalable Diffusion Models with Transformers](https://www.wpeebles.com/DiT).
```python
import torch
from diffusers import DiTPipeline

model_id = "facebook/DiT-XL-2-256"
pipe = DiTPipeline.from_pretrained(model_id, torch_dtype=torch.float16).to("cuda")

# pick words that exist in ImageNet
words = ["white shark", "umbrella"]
class_ids = pipe.get_label_ids(words)

output = pipe(class_labels=class_ids)
image = output.images[0]  # label 'white shark'
```
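To make "a transformer operating on latent patches" more concrete, the sketch below estimates how many patch tokens the transformer would process for each supported image size. It is only back-of-the-envelope arithmetic, not code from the library: the helper name `dit_token_count` is hypothetical, and it assumes an 8x VAE downsampling factor and a patch size of 2 (the "2" in `DiT-XL-2`).

```python
def dit_token_count(image_size, vae_downsample=8, patch_size=2):
    """Estimate the number of latent patch tokens for a square image.

    Assumptions (not taken from the release notes): the VAE shrinks each
    side by `vae_downsample`, and the latent is cut into non-overlapping
    `patch_size` x `patch_size` patches, one token per patch.
    """
    if image_size % vae_downsample != 0:
        raise ValueError("image_size must be divisible by vae_downsample")
    latent_size = image_size // vae_downsample
    if latent_size % patch_size != 0:
        raise ValueError("latent size must be divisible by patch_size")
    patches_per_side = latent_size // patch_size
    return patches_per_side * patches_per_side


for size in (256, 512):
    print(size, dit_token_count(size))
# 256 256
# 512 1024
```

Under these assumptions, doubling the image side quadruples the sequence length the transformer has to attend over.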
LoRA
LoRA is a technique for performing parameter-efficient fine-tuning for large models. LoRA works by adding so-called "update matrices" to specific blocks of a pre-trained model. During fine-tuning, only these update matrices are updated while the pre-trained model parameters are kept frozen. This allows us to achieve greater memory efficiency as well as easier portability during fine-tuning.
LoRA was proposed in [LoRA: Low-Rank Adaptation of Large Language Models](https://arxiv.org/abs/2106.09685). In the original paper, the authors investigated LoRA for fine-tuning large language models like GPT-3. [cloneofsimo](https://github.com/cloneofsimo) was the first to try out LoRA training for Stable Diffusion in the popular [lora](https://github.com/cloneofsimo/lora) GitHub repository.
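The "update matrices" idea can be illustrated with a small, dependency-free sketch. This is not the Diffusers implementation: the class `LoRALinear` and the helper `matvec` are hypothetical names, and the zero initialization of one matrix and the `alpha / rank` scaling follow the convention described in the LoRA paper. The frozen weight is left untouched, and only the two small matrices would be trained.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class LoRALinear:
    """Frozen linear layer plus a low-rank update: y = W x + (alpha / r) * B (A x)."""

    def __init__(self, weight, rank, alpha=1.0, seed=0):
        rng = random.Random(seed)
        out_features, in_features = len(weight), len(weight[0])
        self.weight = weight  # pre-trained weights, kept frozen
        # Update matrices: A is small and random, B starts at zero, so the
        # layer initially behaves exactly like the frozen layer.
        self.lora_a = [[rng.gauss(0.0, 0.01) for _ in range(in_features)] for _ in range(rank)]
        self.lora_b = [[0.0] * rank for _ in range(out_features)]
        self.scale = alpha / rank

    def forward(self, x):
        base = matvec(self.weight, x)
        update = matvec(self.lora_b, matvec(self.lora_a, x))
        return [b + self.scale * u for b, u in zip(base, update)]

    def frozen_parameters(self):
        return len(self.weight) * len(self.weight[0])

    def trainable_parameters(self):
        return len(self.lora_a) * len(self.lora_a[0]) + len(self.lora_b) * len(self.lora_b[0])


weight = [[0.1 * (i + j) for j in range(64)] for i in range(64)]
layer = LoRALinear(weight, rank=4)
print(layer.frozen_parameters(), layer.trainable_parameters())  # 4096 512
```

In this toy 64x64 layer, a rank-4 update trains 512 values instead of 4096, which is the source of the memory savings and the small checkpoint sizes.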
Diffusers now supports [LoRA](https://arxiv.org/abs/2212.06727)! This means you can now fine-tune a model like Stable Diffusion using consumer GPUs like Tesla T4 or RTX 2080 Ti. LoRA support was added to [`UNet2DConditionModel`](https://huggingface.co/docs/diffusers/main/en/api/models#diffusers.UNet2DConditionModel) and the DreamBooth training script by patrickvonplaten in #1884.
With LoRA, the fine-tuned checkpoints will be **just 3 MB in size**. After fine-tuning, you can use the LoRA checkpoints like so:
```py
from diffusers import StableDiffusionPipeline
import torch

model_path = "sayakpaul/sd-model-finetuned-lora-t4"
pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16)
pipe.unet.load_attn_procs(model_path)
pipe.to("cuda")

prompt = "A pokemon with blue eyes."
image = pipe(prompt, num_inference_steps=30, guidance_scale=7.5).images[0]
image.save("pokemon.png")
```
![pokemon-image](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/pokemon-collage.png)
See these resources to learn more about using LoRA in Diffusers:
* [text2image fine-tuning script](https://github.com/huggingface/diffusers/tree/main/examples/text_to_image#training-with-lora) (by sayakpaul in #2031).
* [Official documentation discussing how LoRA is supported](https://huggingface.co/docs/diffusers/main/en/training/lora) (by sayakpaul in #2086).
Customizable Cross Attention
LoRA leverages a new method to customize the cross attention layers deep in the UNet. This can be useful for other creative approaches such as [Prompt-to-Prompt](https://arxiv.org/abs/2208.01626), and it makes it easier to apply optimizations like [xFormers](https://github.com/facebookresearch/xformers). This new "attention processor" abstraction was created by patrickvonplaten in #1639 after discussing the design with the community, and we have used it to rewrite our xFormers and attention slicing implementations!
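The general pattern behind such an abstraction can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions, not the Diffusers API: `ToyAttention`, `DefaultProcessor`, `RecordingProcessor`, and `set_processor` are hypothetical names. The attention layer delegates its computation to a swappable processor object, so a variant (here, one that records attention maps, as a Prompt-to-Prompt style method might need) can be plugged in without touching the layer itself.

```python
import math


def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class DefaultProcessor:
    """Plain scaled dot-product attention over lists of vectors."""

    def weights(self, attn, query, keys):
        scores = [attn.scale * sum(q * k for q, k in zip(query, key)) for key in keys]
        return softmax(scores)

    def __call__(self, attn, queries, keys, values):
        outputs = []
        for query in queries:
            w = self.weights(attn, query, keys)
            outputs.append([sum(wi * v[d] for wi, v in zip(w, values)) for d in range(len(values[0]))])
        return outputs


class RecordingProcessor(DefaultProcessor):
    """Same computation, but keeps every attention map for later inspection."""

    def __init__(self):
        self.maps = []

    def weights(self, attn, query, keys):
        w = super().weights(attn, query, keys)
        self.maps.append(w)
        return w


class ToyAttention:
    """Attention layer that delegates the actual math to its processor."""

    def __init__(self, head_dim, processor=None):
        self.scale = 1.0 / math.sqrt(head_dim)
        self.processor = processor or DefaultProcessor()

    def set_processor(self, processor):
        self.processor = processor

    def forward(self, queries, keys, values):
        return self.processor(self, queries, keys, values)


attn = ToyAttention(head_dim=2)
recorder = RecordingProcessor()
attn.set_processor(recorder)  # swap behavior without modifying the layer
attn.forward([[1.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]], [[1.0, 2.0], [3.0, 4.0]])
print(len(recorder.maps))  # 1
```

The same swap point is where a memory-efficient kernel or a sliced implementation could be substituted, which is the design benefit the paragraph above describes.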
Flax => PyTorch
Converting Flax weights to PyTorch has been a long-requested feature. Prolific community member camenduru took up the gauntlet in #1900 and created a way to convert Flax model weights to PyTorch. This means that you can train or fine-tune models super fast using Google TPUs, and then convert the weights to PyTorch for everybody to use. Thanks camenduru!
Flax Img2Img
Another community member, dhruvrnaik, ported the image-to-image pipeline to Flax in #1355! Using a TPU v2-8 (available in Colab's free tier), you can generate 8 images at once in a few seconds!
DEIS Scheduler
DEIS (Diffusion Exponential Integrator Sampler) is a new fast multistep scheduler that can generate high-quality samples in fewer steps.
The scheduler was introduced in the paper [Fast Sampling of Diffusion Models with Exponential Integrator](https://arxiv.org/abs/2204.13902). More information about the scheduler can be found in the paper.
```python
from diffusers import StableDiffusionPipeline, DEISMultistepScheduler
import torch

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16)
pipe.scheduler = DEISMultistepScheduler.from_config(pipe.scheduler.config)
pipe = pipe.to("cuda")

prompt = "a photo of an astronaut riding a horse on mars"
generator = torch.Generator(device="cuda").manual_seed(0)
image = pipe(prompt, generator=generator, num_inference_steps=25).images[0]
```
* feat : add log-rho deis multistep scheduler by qsh-zh in #1432
Reproducibility
One can now pass CPU generators to all pipelines even if the pipeline is on GPU. This ensures
much better reproducibility across GPU hardware:
```python
import torch
from diffusers import DDIMPipeline
import numpy as np

model_id = "google/ddpm-cifar10-32"

# load model and scheduler
ddim = DDIMPipeline.from_pretrained(model_id)
ddim.to("cuda")

# create a generator for reproducibility
generator = torch.manual_seed(0)

# run pipeline for just two steps and return numpy tensor
image = ddim(num_inference_steps=2, output_type="np", generator=generator).images
print(np.abs(image).sum())
```
See: #1902 and https://huggingface.co/docs/diffusers/using-diffusers/reproducibility
Important New Guides
- Stable Diffusion 101: https://huggingface.co/docs/diffusers/stable_diffusion
- Reproducibility: https://huggingface.co/docs/diffusers/using-diffusers/reproducibility
- LoRA: https://huggingface.co/docs/diffusers/training/lora
Important Bug Fixes
- Don't download safetensors if library is not installed: 2057
- Make sure that `save_pretrained(...)` doesn't accidentally delete files: 2038
- Fix CPU offload docs for maximum memory gain: 1968
- Fix conversion for exotically sorted weight names: 1959
- Fix intermediate checkpointing for textual inversion, thanks lstein: 2072
All commits
* update composable diffusion for an updated diffuser library by nanlliu in 1697
* [Tests] Fix UnCLIP cpu offload tests by anton-l in 1769
* Bump to 0.12.0.dev0 by anton-l in 1771
* [Dreambooth] flax fixes by pcuenca in 1765
* update train_unconditional_ort.py by prathikr in 1775
* Only test for xformers when enabling them #1773 by kig in 1776
* expose polynomial:power and cosine_with_restarts:num_cycles params by zetyquickly in 1737
* [Flax] Stateless schedulers, fixes and refactors by skirsten in 1661
* Correct hf hub download by patrickvonplaten in 1767
* Dreambooth docs: minor fixes by pcuenca in 1758
* Fix num images per prompt unclip by patil-suraj in 1787
* Add Flax stable diffusion img2img pipeline by dhruvrnaik in 1355
* Refactor cross attention and allow mechanism to tweak cross attention function by patrickvonplaten in 1639
* Fix OOM when using PyTorch with JAX installed. by pcuenca in 1795
* reorder model wrap + bug fix by prathikr in 1799
* Remove hardcoded names from PT scripts by patrickvonplaten in 1778
* [textual_inversion] unwrap_model text encoder before accessing weights by patil-suraj in 1816
* fix small mistake in annotation: 32 -> 64 by Line290 in 1780
* Make safety_checker optional in more pipelines by pcuenca in 1796
* Device to use (e.g. cpu, cuda:0, cuda:1, etc.) by camenduru in 1844
* Avoid duplicating PyTorch + safetensors downloads. by pcuenca in 1836
* Width was typod as weight by Helw150 in 1800
* fix: resize transform now preserves aspect ratio by parlance-zz in 1804
* Make xformers optional even if it is available by kn in 1753
* Allow selecting precision to make Dreambooth class images by kabachuha in 1832
* unCLIP image variation by williamberman in 1781
* [Community Pipeline] MagicMix by daspartho in 1839
* [Versatile Diffusion] Fix cross_attention_kwargs by patrickvonplaten in 1849
* [Dtype] Align dtype casting behavior with Transformers and Accelerate by patrickvonplaten in 1725
* [StableDiffusionInpaint] Correct test by patrickvonplaten in 1859
* [textual inversion] add gradient checkpointing and small fixes. by patil-suraj in 1848
* Flax: Fix img2img and align with other pipeline by skirsten in 1824
* Make repo structure consistent by patrickvonplaten in 1862
* [Unclip] Make sure text_embeddings & image_embeddings can directly be passed to enable interpolation tasks. by patrickvonplaten in 1858
* Fix ema decay by pcuenca in 1868
* [Docs] Improve docs by patrickvonplaten in 1870
* [examples] update loss computation by patil-suraj in 1861
* [train_text_to_image] allow using non-ema weights for training by patil-suraj in 1834
* [Attention] Finish refactor attention file by patrickvonplaten in 1879
* Fix typo in train_dreambooth_inpaint by pcuenca in 1885
* Update ONNX Pipelines to use np.float64 instead of np.float by agizmo in 1789
* [examples] misc fixes by patil-suraj in 1886
* Fixes to the help for `report_to` in training scripts by pcuenca in 1888
* updated doc for stable diffusion pipelines by yiyixuxu in 1770
* Add UnCLIPImageVariationPipeline to dummy imports by anton-l in 1897
* Add accelerate and xformers versions to `diffusers-cli env` by anton-l in 1898
* [addresses issue 1642] add add_noise to scheduling-sde-ve by aengusng8 in 1827
* Add condtional generation to AudioDiffusionPipeline by teticio in 1826
* Fixes in comments in SD2 D2I by neverix in 1903
* [Deterministic torch randn] Allow tensors to be generated on CPU by patrickvonplaten in 1902
* [Docs] Remove duplicated API doc string by patrickvonplaten in 1901
* fix: DDPMScheduler.set_timesteps() by Joqsan in 1912
* Fix --resume_from_checkpoint step in train_text_to_image.py by merfnad in 1914
* Support training SD V2 with Flax by yasyf in 1783
* Fix lr-scaling store_true & default=True cli argument for textual_inversion training. by aredden in 1090
* Various Fixes for Flax Dreambooth by yasyf in 1782
* Test ResnetBlock2D by hchings in 1850
* Init for korean docs by seriousran in 1910
* New Pipeline: Tiled-upscaling with depth perception to avoid blurry spots by peterwilli in 1615
* Improve reproduceability 2/3 by patrickvonplaten in 1906
* feat : add log-rho deis multistep scheduler by qsh-zh in 1432
* Feature/colossalai by Fazziekey in 1793
* [Docs] Add TRANSLATING.md file by seriousran in 1920
* [StableDiffusionimg2img] validating input type by Shubhamai in 1913
* [dreambooth] low precision guard by williamberman in 1916
* [Stable Diffusion Guide] 101 Stable Diffusion Guide directly into the docs by patrickvonplaten in 1927
* [Conversion] Make sure ema weights are extracted correctly by patrickvonplaten in 1937
* fix path to logo by vvssttkk in 1939
* Add automatic doc sorting by patrickvonplaten in 1940
* update to latest colossalai by Fazziekey in 1951
* fix typo in imagic_stable_diffusion.py by andreemic in 1956
* [Conversion SD] Make sure weirdly sorted keys work as well by patrickvonplaten in 1959
* allow loading ddpm models into ddim by patrickvonplaten in 1932
* [Community] Correct checkpoint merger by patrickvonplaten in 1965
* Update CLIPGuidedStableDiffusion.feature_extractor.size to fix TypeError by oxidase in 1938
* [CPU offload] correct cpu offload by patrickvonplaten in 1968
* [Docs] Update README.md by haofanwang in 1960
* Research project multi subject dreambooth by klopsahlong in 1948
* Example tests by patrickvonplaten in 1982
* Fix slow tests by patrickvonplaten in 1983
* Fix unused upcast_attn flag in convert_original_stable_diffusion_to_diffusers script by kn in 1942
* Allow converting Flax to PyTorch by adding a "from_flax" keyword by camenduru in 1900
* Update docstring by Warvito in 1971
* [SD Img2Img] resize source images to multiple of 8 instead of 32 by vvsotnikov in 1571
* Update README.md to include our blog post by sayakpaul in 1998
* Fix a couple typos in Dreambooth readme by pcuenca in 2004
* Add tests for 2D UNet blocks by hchings in 1945
* [Conversion] Support convert diffusers to safetensors by hua1995116 in 1996
* [Community] Fix merger by patrickvonplaten in 2006
* [Conversion] Improve safetensors by patrickvonplaten in 1989
* [Black] Update black library by patrickvonplaten in 2007
* Fix typos in ColossalAI example by haofanwang in 2001
* Use pipeline tests mixin for UnCLIP pipeline tests + unCLIP MPS fixes by williamberman in 1908
* Change PNDMPipeline to use PNDMScheduler by willdalh in 2003
* [train_unconditional] fix LR scheduler init by patil-suraj in 2010
* [Docs] No more autocast by patrickvonplaten in 2021
* [Flax] Add Flax inpainting impl by xvjiarui in 1966
* Check k-diffusion version is at least 0.0.12 by pcuenca in 2022
* DiT Pipeline by kashif in 1806
* fix dit doc header by patil-suraj in 2027
* [LoRA] Add LoRA training script by patrickvonplaten in 1884
* [Dit] Fix dit tests by patrickvonplaten in 2034
* Fix typos and minor redundancies by Joqsan in 2029
* [Lora] Model card by patrickvonplaten in 2032
* [Save Pretrained] Remove dead code lines that can accidentally remove pytorch files by patrickvonplaten in 2038
* Fix EMA for multi-gpu training in the unconditional example by anton-l in 1930
* Minor fix in the documentation of LoRA by hysts in 2045
* Add InstructPix2Pix pipeline by patil-suraj in 2040
* Create repo before cloning in examples by Wauplin in 2047
* Remove modelcards dependency by Wauplin in 2050
* Module-ise "original stable diffusion to diffusers" conversion script by damian0815 in 2019
* [StableDiffusionInstructPix2Pix] use cpu generator in slow tests by patil-suraj in 2051
* [From pretrained] Don't download .safetensors files if safetensors is… by patrickvonplaten in 2057
* Correct Pix2Pix example by patrickvonplaten in 2056
* add community pipeline: StableUnCLIPPipeline by budui in 2037
* [LoRA] Adds example on text2image fine-tuning with LoRA by sayakpaul in 2031
* Safetensors loading in "convert_diffusers_to_original_stable_diffusion" by cafeai in 2054
* [examples] add dataloader_num_workers argument by patil-suraj in 2070
* Dreambooth: reduce VRAM usage by gleb-akhmerov in 2039
* [Paint by example] Fix cpu offload for paint by example by patrickvonplaten in 2062
* [textual_inversion] Fix resuming state when using gradient checkpointing by pcuenca in 2072
* [lora] Log images when using tensorboard by pcuenca in 2078
* Fix resume epoch for all training scripts except textual_inversion by pcuenca in 2079
* [dreambooth] fix multi on gpu. by patil-suraj in 2088
* Run inference on a specific condition and fix call of manual_seed() by shirayu in 2074
* [Feat] checkpoint_merger works on local models as well as ones that use safetensors by lstein in 2060
* xFormers attention op arg by takuma104 in 2049
* [docs] [dreambooth] note random crop by williamberman in 2085
* Remove wandb from text_to_image requirements.txt by pcuenca in 2092
* [doc] update example for pix2pix by patil-suraj in 2101
* Add `lora` tag to the model tags by apolinario in 2103
* [docs] Adds a doc on LoRA support for diffusers by sayakpaul in 2086
* Allow directly passing text embeddings to Stable Diffusion Pipeline for prompt weighting by patrickvonplaten in 2071
* Improve transformers versions handling by patrickvonplaten in 2104
* Reproducibility 3/3 by patrickvonplaten in 1924
Significant community contributions
The following contributors have made significant changes to the library over the last release:
* nanlliu
    * update composable diffusion for an updated diffuser library (1697)
* skirsten
    * [Flax] Stateless schedulers, fixes and refactors (1661)
    * Flax: Fix img2img and align with other pipeline (1824)
* hchings
    * Test ResnetBlock2D (1850)
    * Add tests for 2D UNet blocks (1945)
* seriousran
    * Init for korean docs (1910)
    * [Docs] Add TRANSLATING.md file (1920)
* qsh-zh
    * feat : add log-rho deis multistep scheduler (1432)
* Fazziekey
    * Feature/colossalai (1793)
    * update to latest colossalai (1951)
* klopsahlong
    * Research project multi subject dreambooth (1948)
* xvjiarui
    * [Flax] Add Flax inpainting impl (1966)
* damian0815
    * Module-ise "original stable diffusion to diffusers" conversion script (2019)
* camenduru
    * Allow converting Flax to PyTorch by adding a "from_flax" keyword (1900)