🚀 Faster
We have [thoroughly profiled](https://twitter.com/nouamanetazi/status/1576959648912973826?s=21&t=DaKkhPZ5zn1IVJooMJ5J3A) our codebase and applied a number of incremental improvements that, when combined, provide a speed improvement of almost **3x**.
On top of that, we now default to using the `float16` format. It's much faster than `float32` and, according to our tests, produces images with no discernible difference in quality. This beats the use of `autocast`, so the resulting code is cleaner!
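For reference, loading and running the pipeline in half precision looks roughly like this (a minimal sketch; the `fp16` weights branch for this model is assumed to exist):

```python
import torch
from diffusers import StableDiffusionPipeline

# Load the weights directly in float16; no autocast context is needed anymore.
pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    revision="fp16",            # assumed half-precision weights branch
    torch_dtype=torch.float16,
).to("cuda")

image = pipe("a photo of an astronaut riding a horse").images[0]
```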
🔑 `use_auth_token` no more
The recently released version of `huggingface-hub` automatically uses your access token if you are logged in, so you don't need to put it everywhere in your code. All you need to do is authenticate once using `huggingface-cli login` in your terminal and you're all set.
```diff
- pipe = DiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4", use_auth_token=True)
+ pipe = DiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4")
```
We bumped the `huggingface-hub` dependency to version `0.10.0` to enable this.
🎨 More flexible APIs
- Schedulers now share a simpler, unified API. This has allowed us to remove many conditionals and special cases in the rest of the code, including the pipelines. This is very important for us and for the users of 🧨 diffusers: we all gain clarity and a solid abstraction for schedulers. See the description in https://github.com/huggingface/diffusers/pull/719 for more details.
Please update any custom Stable Diffusion pipelines accordingly:
```diff
- if isinstance(self.scheduler, LMSDiscreteScheduler):
-     latents = latents * self.scheduler.sigmas[0]
+ latents = latents * self.scheduler.init_noise_sigma
```
```diff
- if isinstance(self.scheduler, LMSDiscreteScheduler):
-     sigma = self.scheduler.sigmas[i]
-     latent_model_input = latent_model_input / ((sigma**2 + 1) ** 0.5)
+ latent_model_input = self.scheduler.scale_model_input(latent_model_input, t)
```
```diff
- if isinstance(self.scheduler, LMSDiscreteScheduler):
-     latents = self.scheduler.step(noise_pred, i, latents, **extra_step_kwargs).prev_sample
- else:
-     latents = self.scheduler.step(noise_pred, t, latents, **extra_step_kwargs).prev_sample
+ latents = self.scheduler.step(noise_pred, t, latents, **extra_step_kwargs).prev_sample
```
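Put together, a scheduler-agnostic denoising loop now looks roughly like this (a simplified sketch of the pipeline internals; `unet`, `scheduler`, `text_embeddings` and the initial `latents` are assumed to be set up as in the standard Stable Diffusion pipeline, and classifier-free guidance is omitted):

```python
num_inference_steps = 50
scheduler.set_timesteps(num_inference_steps)

# Scale the initial noise by the scheduler-provided standard deviation.
latents = latents * scheduler.init_noise_sigma

for t in scheduler.timesteps:
    # Per-scheduler input scaling (a no-op for schedulers that don't need it).
    latent_model_input = scheduler.scale_model_input(latents, t)
    noise_pred = unet(latent_model_input, t, encoder_hidden_states=text_embeddings).sample
    latents = scheduler.step(noise_pred, t, latents).prev_sample
```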
- Pipeline callbacks. As a community project (h/t jamestiotio!), `diffusers` pipelines can now invoke a callback function during generation, providing the latents at each step of the process. This makes it easier to perform tasks such as visualization, inspection, explainability and whatever else the community may invent; see the sketch below.
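A minimal sketch of such a callback (assuming `pipe` is a Stable Diffusion pipeline loaded as above; the function name is illustrative):

```python
import torch

def log_latents(step: int, timestep: int, latents: torch.FloatTensor):
    # Inspect or visualize the intermediate latents at each invocation.
    print(f"step {step} (timestep {timestep}): latents norm = {latents.norm().item():.3f}")

image = pipe(
    "a photo of an astronaut riding a horse",
    callback=log_latents,
    callback_steps=5,  # invoke the callback every 5 denoising steps
).images[0]
```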
🛠️ More tasks
Building on top of the previous foundations, this release incorporates several new tasks that have been adapted from research papers or community projects. These include:
- **Textual inversion**. Makes it possible to quickly train a new concept or style and incorporate it into the vocabulary of Stable Diffusion. Hundreds of people have already created their own, and they can be shared and combined. See the [training Colab](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/sd_textual_inversion_training.ipynb) to get started.
- **Dreambooth**. Similar goal to textual inversion, but instead of adding a new item to the vocabulary, it fine-tunes the model to make it learn a new concept. [Training Colab](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/sd_dreambooth_training.ipynb).
- **Negative prompts**. Another community effort, led by shirayu. The Stable Diffusion pipeline can now receive both a positive prompt (what you want to create) and a negative prompt (what you want to steer the model away from). This opens up a lot of creative possibilities!
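For instance, a negative prompt can be passed alongside the regular one (a minimal sketch, assuming `pipe` is a Stable Diffusion pipeline as loaded above):

```python
image = pipe(
    prompt="a portrait photo of an astronaut",
    negative_prompt="blurry, low quality, deformed hands",
).images[0]
```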
🏃‍♀️ Under the hood changes to support better fine-tuning
Gradient checkpointing and 8-bit optimizers have been successfully applied to achieve Dreambooth fine-tuning in a Colab notebook! These updates will make it easier for `diffusers` to support general-purpose fine-tuning (coming soon!).
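As a rough sketch of how the two techniques combine (assuming the `bitsandbytes` package is installed; the learning rate is illustrative):

```python
import bitsandbytes as bnb
from diffusers import UNet2DConditionModel

unet = UNet2DConditionModel.from_pretrained(
    "CompVis/stable-diffusion-v1-4", subfolder="unet"
)
# Trade extra compute for a much smaller activation memory footprint.
unet.enable_gradient_checkpointing()

# 8-bit Adam keeps optimizer state quantized, cutting memory use further.
optimizer = bnb.optim.AdamW8bit(unet.parameters(), lr=5e-6)
```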
⚠️ Experimental: community pipelines
This is big, but it's still an experimental feature that may change in the future.
We are constantly amazed at the amount of imagination and creativity in the `diffusers` community, so we've made it easy to create custom pipelines and share them with others. You can write your own pipeline code, store it on the 🤗 Hub, GitHub, or your local filesystem, and `DiffusionPipeline.from_pretrained` will be able to load and run it. Read more in [the documentation](https://huggingface.co/docs/diffusers/main/en/using-diffusers/custom_pipelines).
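Loading a shared pipeline might look roughly like this (a minimal sketch; the `custom_pipeline` repo id below is hypothetical, see the linked docs for the exact interface):

```python
from diffusers import DiffusionPipeline

# `custom_pipeline` can point to a repository containing the pipeline code
# (the repo id below is hypothetical) or to a local directory with that code.
pipe = DiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    custom_pipeline="my-username/my-custom-pipeline",  # hypothetical
)
```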
We can't wait to see what new tasks the community creates!
💪 Quality of life fixes
Bug fixes, improved documentation, and better tests are all important to ensure `diffusers` is a high-quality codebase, and we always spend a lot of effort working on them. Several first-time contributors have helped here, and we are very grateful for their efforts!
🙌 Significant community contributions
The following people have made significant contributions to the library since the last release:
* Victarry – Add training example for DreamBooth (554)
* jamestiotio – Add callback parameters for Stable Diffusion pipelines (521)
* jachiam – Allow resolutions that are not multiples of 64 (505)
* johnowhitaker – Adding pred_original_sample to SchedulerOutput for some samplers (614)
* keturn – Interesting discussions and insights on many topics
✏️ Change list
* [Docs] Correct links by patrickvonplaten in 432
* [Black] Update black by patrickvonplaten in 433
* use torch.matmul instead of einsum in attention. by patil-suraj in 445
* Renamed variables from single letter to better naming by daspartho in 449
* Docs: fix installation typo by daspartho in 453
* fix table formatting for stable diffusion pipeline doc (add blank line) by natolambert in 471
* update expected results of slow tests by kashif in 268
* [Flax] Make room for more frameworks by patrickvonplaten in 494
* Fix `disable_attention_slicing` in pipelines by pcuenca in 498
* Rename test_scheduler_outputs_equivalence in model tests. by pcuenca in 451
* Scheduler docs update by natolambert in 464
* Fix scheduler inference steps error with power of 3 by natolambert in 466
* initial flax pndm scheduler by kashif in 492
* Fix vae tests for cpu and gpu by kashif in 480
* [Docs] Add subfolder docs by patrickvonplaten in 500
* docs: broken doc links for relative links by jjmachan in 504
* Removing `.float()` (`autocast` in fp16 will discard this (I think)). by Narsil in 495
* Fix MPS scheduler indexing when using `mps` by pcuenca in 450
* [CrossAttention] add different method for sliced attention by patil-suraj in 446
* Implement `FlaxModelMixin` by mishig25 in 493
* Karras VE, DDIM and DDPM flax schedulers by kashif in 508
* [UNet2DConditionModel, UNet2DModel] pass norm_num_groups to all the blocks by patil-suraj in 442
* Add `init_weights` method to `FlaxMixin` by mishig25 in 513
* UNet Flax with FlaxModelMixin by pcuenca in 502
* Stable diffusion text2img conversion script. by patil-suraj in 154
* [CI] Add stalebot by anton-l in 481
* Fix is_onnx_available by SkyTNT in 440
* [Tests] Test attention.py by sidthekidder in 368
* Finally fix the image-based SD tests by anton-l in 509
* Remove the usage of numpy in up/down sample_2d by ydshieh in 503
* Fix typos and add Typo check GitHub Action by shirayu in 483
* Quick fix for the img2img tests by anton-l in 530
* [Tests] Fix spatial transformer tests on GPU by anton-l in 531
* [StableDiffusionInpaintPipeline] accept tensors for init and mask image by patil-suraj in 439
* adding more typehints to DDIM scheduler by vishnu-anirudh in 456
* Revert "adding more typehints to DDIM scheduler" by patrickvonplaten in 533
* Add LMSDiscreteSchedulerTest by sidthekidder in 467
* [Download] Smart downloading by patrickvonplaten in 512
* [Hub] Update hub version by patrickvonplaten in 538
* Unify offset configuration in DDIM and PNDM schedulers by jonatanklosko in 479
* [Configuration] Better logging by patrickvonplaten in 545
* `make fixup` support by younesbelkada in 546
* FlaxUNet2DConditionOutput flax.struct.dataclass by mishig25 in 550
* [Flax] fix Flax scheduler by kashif in 564
* JAX/Flax safety checker by pcuenca in 558
* Flax: ignore dtype for configuration by pcuenca in 565
* Remove check_tf_utils to avoid an unnecessary TF import for now by anton-l in 566
* Fix `_upsample_2d` by ydshieh in 535
* [Flax] Add Vae for Stable Diffusion by patrickvonplaten in 555
* [Flax] Solve problem with VAE by patrickvonplaten in 574
* [Tests] Upload custom test artifacts by anton-l in 572
* [Tests] Mark the ncsnpp model tests as slow by anton-l in 575
* [examples/community] add CLIPGuidedStableDiffusion by patil-suraj in 561
* Fix `CrossAttention._sliced_attention` by ydshieh in 563
* Fix typos by shirayu in 568
* Add `from_pt` argument in `.from_pretrained` by younesbelkada in 527
* [FlaxAutoencoderKL] rename weights to align with PT by patil-suraj in 584
* Fix BaseOutput initialization from dict by anton-l in 570
* Add the K-LMS scheduler to the inpainting pipeline + tests by anton-l in 587
* [flax safety checker] Use `FlaxPreTrainedModel` for saving/loading by patil-suraj in 591
* FlaxDiffusionPipeline & FlaxStableDiffusionPipeline by mishig25 in 559
* [Flax] Fix unet and ddim scheduler by patrickvonplaten in 594
* Fix params replication when using the dummy checker by pcuenca in 602
* Allow dtype to be specified in Flax pipeline by pcuenca in 600
* Fix flax from_pretrained pytorch weight check by mishig25 in 603
* Mv weights name consts to diffusers.utils by mishig25 in 605
* Replace `dropout_prob` by `dropout` in `vae` by younesbelkada in 595
* Add smoke tests for the training examples by anton-l in 585
* Add torchvision to training deps by anton-l in 607
* Return Flax scheduler state by pcuenca in 601
* [ONNX] Collate the external weights, speed up loading from the hub by anton-l in 610
* docs: fix `Berkeley` ref by ryanrussell in 611
* Handle the PIL.Image.Resampling deprecation by anton-l in 588
* Make flax from_pretrained work with local subfolder by mishig25 in 608
* [flax] 'dtype' should not be part of self._internal_dict by mishig25 in 609
* [UNet2DConditionModel] add gradient checkpointing by patil-suraj in 461
* docs: fix `stochastic_karras_ve` ref by ryanrussell in 618
* Adding pred_original_sample to SchedulerOutput for some samplers by johnowhitaker in 614
* docs: `.md` readability fixups by ryanrussell in 619
* Flax documentation by younesbelkada in 589
* fix docs: change sample to images by AbdullahAlfaraj in 613
* refactor: pipelines readability improvements by ryanrussell in 622
* Allow passing session_options for ORT backend by cloudhan in 620
* Fix breaking error: "ort is not defined" by pcuenca in 626
* docs: `src/diffusers` readability improvements by ryanrussell in 629
* Fix formula for noise levels in Karras scheduler and tests by sgrigory in 627
* [CI] Fix onnxruntime installation order by anton-l in 633
* Warning for too long prompts in DiffusionPipelines (Resolve 447) by shirayu in 472
* Fix docs link to train_unconditional.py by AbdullahAlfaraj in 642
* Remove deprecated `torch_device` kwarg by pcuenca in 623
* refactor: `custom_init_isort` readability fixups by ryanrussell in 631
* Remove inappropriate docstrings in LMS docstrings. by pcuenca in 634
* Flax pipeline pndm by pcuenca in 583
* Fix `SpatialTransformer` by ydshieh in 578
* Add training example for DreamBooth. by Victarry in 554
* [Pytorch] Pytorch only schedulers by kashif in 534
* [examples/dreambooth] don't pass tensor_format to scheduler. by patil-suraj in 649
* [dreambooth] update install section by patil-suraj in 650
* [DDIM, DDPM] fix add_noise by patil-suraj in 648
* [Pytorch] add dep. warning for pytorch schedulers by kashif in 651
* [CLIPGuidedStableDiffusion] remove set_format from pipeline by patil-suraj in 653
* Fix onnx tensor format by anton-l in 654
* Fix `main`: stable diffusion pipelines cannot be loaded by pcuenca in 655
* Fix the LMS pytorch regression by anton-l in 664
* Added script to save during textual inversion training. Issue 524 by isamu-isozaki in 645
* [CLIPGuidedStableDiffusion] take the correct text embeddings by patil-suraj in 667
* Update index.mdx by tmabraham in 670
* [examples] update transformers version by patil-suraj in 665
* [gradient checkpointing] lower tolerance for test by patil-suraj in 652
* Flax `from_pretrained`: clean up `mismatched_keys`. by pcuenca in 630
* `trained_betas` ignored in some schedulers by vishnu-anirudh in 635
* Renamed x -> hidden_states in resnet.py by daspartho in 676
* Optimize Stable Diffusion by NouamaneTazi in 371
* Allow resolutions that are not multiples of 64 by jachiam in 505
* refactor: update ldm-bert `config.json` url closes 675 by ryanrussell in 680
* [docs] fix table in fp16.mdx by NouamaneTazi in 683
* Fix slow tests by NouamaneTazi in 689
* Fix BibText citation by osanseviero in 693
* Add callback parameters for Stable Diffusion pipelines by jamestiotio in 521
* [dreambooth] fix applying clip_grad_norm_ by patil-suraj in 686
* Flax: add shape argument to `set_timesteps` by pcuenca in 690
* Fix type annotations on StableDiffusionPipeline.__call__ by tasercake in 682
* Fix import with Flax but without PyTorch by pcuenca in 688
* [Support PyTorch 1.8] Remove inference mode by patrickvonplaten in 707
* [CI] Speed up slow tests by anton-l in 708
* [Utils] Add deprecate function and move testing_utils under utils by patrickvonplaten in 659
* Checkpoint conversion script from Diffusers => Stable Diffusion (CompVis) by jachiam in 701
* [Docs] fix docstring for issue 709 by kashif in 710
* Update schedulers README.md by tmabraham in 694
* add accelerate to load models with smaller memory footprint by piEsposito in 361
* Fix typos by shirayu in 718
* Add an argument "negative_prompt" by shirayu in 549
* Fix import if PyTorch is not installed by pcuenca in 715
* Remove comments no longer appropriate by pcuenca in 716
* [train_unconditional] fix applying clip_grad_norm_ by patil-suraj in 721
* renamed x to meaningful variable in resnet.py by i-am-epic in 677
* [Tests] Add accelerate to testing by patrickvonplaten in 729
* [dreambooth] Using already created `Path` in dataset by DrInfiniteExplorer in 681
* Include CLIPTextModel parameters in conversion by kanewallmann in 695
* Avoid negative strides for tensors by shirayu in 717
* [Pytorch] pytorch only timesteps by kashif in 724
* [Scheduler design] The pragmatic approach by anton-l in 719
* Removing `autocast` for `35-25% speedup`. (`autocast` considered harmful). by Narsil in 511
* No more use_auth_token=True by patrickvonplaten in 733
* remove use_auth_token from remaining places by patil-suraj in 737
* Replace messages that have empty backquotes by pcuenca in 738
* [Docs] Advertise fp16 instead of autocast by patrickvonplaten in 740
* remove use_auth_token from for TI test by patil-suraj in 747
* allow multiple generations per prompt by patil-suraj in 741
* Add back-compatibility to LMS timesteps by anton-l in 750
* update the clip guided PR according to the new API by patil-suraj in 751
* Raise an error when moving an fp16 pipeline to CPU by anton-l in 749
* Better steps deprecation for LMS by anton-l in 753