Diffusers

Latest version: v0.31.0

0.1.2

These are the release notes for the 🧨 Diffusers library.

Introducing Hugging Face's new library for diffusion models.

Diffusion models have proved very effective at artificial synthesis, even beating GANs for images. As a result, they have gained traction in the machine learning community and play an important role in systems such as DALL-E 2 and Imagen, which generate photorealistic images from text prompts.

While the most prominent successes of diffusion models have been in the **computer vision** community, these models have also achieved remarkable results in other domains, such as:

- [video generation](https://video-diffusion.github.io/),
- [audio synthesis](https://diffwave-demo.github.io/),
- [reinforcement learning](https://diffusion-planning.github.io/),

and more.

Goals

The goals of diffusers are:

- to centralize diffusion model research from independent repositories into a single clear, maintained project,
- to reproduce high-impact machine learning systems such as DALL-E and Imagen in a manner that is accessible to the public, and
- to create an easy-to-use API that lets users train their own models or re-use checkpoints from other repositories for inference.

Release overview

***Quickstart***:
- For a light walk-through of the library, please have a look at the [Official 🧨 Diffusers Notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/diffusers_intro.ipynb).
- To jump directly into training a diffusion model yourself, please have a look at the [Training Diffusers Notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/training_example.ipynb).

Diffusers aims to be a modular toolbox for diffusion techniques, with a focus on the following categories:

🚄 Inference pipelines

Inference pipelines are a collection of end-to-end diffusion systems that can be used out of the box. The goal is for them to stay as close as possible to their original implementations. They can include components from other libraries (such as text encoders).

The original release contains the following pipelines:

- [DDPM](https://arxiv.org/abs/2006.11239) for unconditional image generation with discrete scheduling in [pipeline_ddpm](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/pipeline_ddpm.py).
- [DDIM](https://arxiv.org/abs/2010.02502) for unconditional image generation with discrete scheduling in [pipeline_ddim](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/pipeline_ddim.py).
- [PNDM](https://arxiv.org/abs/2202.09778) for unconditional image generation with discrete scheduling in [pipeline_pndm](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/pipeline_pndm.py).
- [Stochastic Differential Equations](https://openreview.net/forum?id=PxTIG12RRHS) for unconditional image generation with continuous scheduling in [score_sde_ve](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/score_sde_ve/pipeline_score_sde_ve.py).
- [Latent diffusion](https://arxiv.org/abs/2112.10752) for text-to-image generation / conditional image generation in [pipeline_latent_diffusion](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/pipeline_latent_diffusion.py) as well as for unconditional image generation in [latent_diffusion_uncond](https://github.com/huggingface/diffusers/tree/main/src/diffusers/pipelines/latent_diffusion_uncond).

We are currently working on enabling other pipelines for different modalities. The following pipelines are expected to land in a subsequent release:

- BDDMPipeline for spectrogram-to-sound vocoding
- GLIDEPipeline to support OpenAI's [GLIDE model](https://github.com/openai/glide-text2im)
- Grad-TTS for text-to-audio generation / conditional audio generation
- A reinforcement learning pipeline (happening in https://github.com/huggingface/diffusers/pull/105)

⏰ Schedulers

- Schedulers are the algorithms that drive diffusion models during both inference and training. They include the noise schedules and define algorithm-specific diffusion steps.
- Schedulers can be used interchangeably between diffusion models at inference time to find the preferred trade-off between speed and generation quality.
- Schedulers are implemented in NumPy, but can easily be converted to PyTorch.

The goal is for each scheduler to provide one or more `step()` functions that are called iteratively to unroll the diffusion loop during the forward pass. Schedulers are framework-agnostic, but offer conversion methods that should make it easy to use them with PyTorch utilities.
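To make the role of `step()` concrete, here is a minimal, dependency-free sketch of a scheduler that is called once per timestep to unroll the sampling loop. It is an illustration only: `ToyDDIMScheduler`, `sample_loop`, and all parameter names are hypothetical and are not the Diffusers API. The update rule is assumed to be the deterministic DDIM step (eta = 0) with a linear beta schedule, and it works on plain Python floats rather than arrays.

```python
import math


class ToyDDIMScheduler:
    """Hypothetical toy scheduler; not the Diffusers implementation."""

    def __init__(self, num_train_timesteps=1000, beta_start=1e-4, beta_end=0.02):
        # Linear beta (noise) schedule and its cumulative product alpha_bar_t.
        delta = (beta_end - beta_start) / (num_train_timesteps - 1)
        self.alphas_cumprod = []
        running = 1.0
        for i in range(num_train_timesteps):
            running *= 1.0 - (beta_start + i * delta)
            self.alphas_cumprod.append(running)
        self.num_train_timesteps = num_train_timesteps

    def timesteps(self, num_inference_steps):
        # Evenly spaced timesteps, ordered from noisiest to cleanest.
        stride = self.num_train_timesteps // num_inference_steps
        return [i * stride for i in range(num_inference_steps)][::-1]

    def step(self, noise_pred, t, prev_t, sample):
        # One deterministic DDIM update (eta = 0): x_t -> x_{prev_t}.
        a_t = self.alphas_cumprod[t]
        a_prev = self.alphas_cumprod[prev_t] if prev_t >= 0 else 1.0
        x0_pred = (sample - math.sqrt(1.0 - a_t) * noise_pred) / math.sqrt(a_t)
        return math.sqrt(a_prev) * x0_pred + math.sqrt(1.0 - a_prev) * noise_pred


def sample_loop(scheduler, noise_model, sample, num_inference_steps=50):
    # Unroll the diffusion loop by calling step() once per timestep.
    ts = scheduler.timesteps(num_inference_steps)
    for i, t in enumerate(ts):
        prev_t = ts[i + 1] if i + 1 < len(ts) else -1
        sample = scheduler.step(noise_model(sample, t), t, prev_t, sample)
    return sample


# Usage: an "oracle" noise model that knows the clean value x0.
# With a perfect noise prediction, the loop recovers x0 from any start.
scheduler = ToyDDIMScheduler()
x0 = 0.7


def oracle(sample, t):
    a_t = scheduler.alphas_cumprod[t]
    return (sample - math.sqrt(a_t) * x0) / math.sqrt(1.0 - a_t)


result = sample_loop(scheduler, oracle, sample=1.3, num_inference_steps=50)
```

Changing `num_inference_steps` in this sketch is the knob that trades speed against quality: fewer steps mean fewer model calls, at the cost of a coarser trajectory when the noise prediction is imperfect.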

The initial release contains the following schedulers:

- [DDIM](https://github.com/huggingface/diffusers/blob/main/src/diffusers/schedulers/scheduling_ddim.py), from the [Denoising Diffusion Implicit Models](https://arxiv.org/abs/2010.02502) paper.
- [DDPM](https://github.com/huggingface/diffusers/blob/main/src/diffusers/schedulers/scheduling_ddpm.py), from the [Denoising Diffusion Probabilistic Models](https://arxiv.org/abs/2006.11239) paper.
- [PNDM](https://github.com/huggingface/diffusers/blob/main/src/diffusers/schedulers/scheduling_pndm.py), from the [Pseudo Numerical Methods for Diffusion Models on Manifolds](https://arxiv.org/abs/2202.09778) paper.
- [SDE_VE](https://github.com/huggingface/diffusers/blob/main/src/diffusers/schedulers/scheduling_sde_ve.py), from the [Score-Based Generative Modeling through Stochastic Differential Equations](https://openreview.net/forum?id=PxTIG12RRHS) paper.

🏭 Models

Models are hosted in the `src/diffusers/models` [folder](https://github.com/huggingface/diffusers/tree/main/src/diffusers/models).

For the initial release, you'll get to see a few building blocks, as well as some resulting models:

- `UNet2DModel` follows the UNet architectures used in recent diffusion papers. It can be seen as the *unconditional* version of the UNet model, in contrast to the *conditional* version that follows below.
- `UNet2DConditionModel` is similar to `UNet2DModel`, but is *conditional*: it uses cross-attention layers in its downsampling and upsampling blocks, alongside the usual skip connections. These cross-attention layers can be fed by other models. An example of a pipeline using a conditional UNet model is the latent diffusion pipeline.
- `AutoencoderKL` and `VQModel` are still experimental models that are prone to breaking changes in the near future. However, they can already be used as part of the Latent Diffusion pipelines.
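The conditioning path of a conditional UNet can be pictured as cross-attention between image features and embeddings produced by another model, such as a text encoder. The sketch below is a simplified, pure-Python illustration of scaled dot-product cross-attention. It is not the `UNet2DConditionModel` code: the function name and the toy inputs are hypothetical, and the learned query, key, and value projections that a real layer applies are omitted.

```python
import math


def cross_attention(queries, keys, values):
    """Scaled dot-product attention where queries come from one source
    (image features) and keys/values from another (e.g. text embeddings)."""
    dim = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(dim).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim) for k in keys]
        # Numerically stable softmax over the keys.
        peak = max(scores)
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        # Weighted average of the values.
        outputs.append(
            [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
        )
    return outputs


# Usage: two image feature vectors attend over three conditioning embeddings.
image_features = [[1.0, 0.0], [0.0, 1.0]]
text_embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
conditioned = cross_attention(image_features, text_embeddings, text_embeddings)
```

Each output row has one entry per value dimension and there is one row per query, so the image features keep their layout while absorbing information from the conditioning embeddings.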

📃 Training example

The first release contains a dataset-agnostic unconditional example and a training notebook:

- The [`train_unconditional.py`](https://github.com/huggingface/diffusers/blob/main/examples/train_unconditional.py) example, which trains a DDPM UNet model on a dataset of your choice.
- More examples can be found under the [Hugging Face Diffusers Notebooks](https://github.com/huggingface/notebooks/tree/main/diffusers#diffusers-notebooks).
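As a rough picture of what training a DDPM model involves, the sketch below implements the noise-prediction objective from the DDPM paper in plain Python: a clean sample is noised to a random timestep, the model predicts that noise, and the loss is the mean squared error between the two. This is an assumption about the general technique, not a transcription of `train_unconditional.py`. All function names here are hypothetical, and the "models" are stand-in callables rather than a UNet.

```python
import math
import random


def make_alphas_cumprod(num_train_timesteps=1000, beta_start=1e-4, beta_end=0.02):
    # Linear beta schedule and its cumulative product alpha_bar_t.
    delta = (beta_end - beta_start) / (num_train_timesteps - 1)
    out, running = [], 1.0
    for i in range(num_train_timesteps):
        running *= 1.0 - (beta_start + i * delta)
        out.append(running)
    return out


def add_noise(x0, noise, t, alphas_cumprod):
    # Forward process: x_t = sqrt(a_bar_t) * x0 + sqrt(1 - a_bar_t) * noise
    a_t = alphas_cumprod[t]
    return [math.sqrt(a_t) * x + math.sqrt(1.0 - a_t) * n for x, n in zip(x0, noise)]


def training_loss(model, x0, alphas_cumprod, rng):
    # Sample a timestep and Gaussian noise, then score the model's noise prediction.
    t = rng.randrange(len(alphas_cumprod))
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    noisy = add_noise(x0, noise, t, alphas_cumprod)
    pred = model(noisy, t)
    return sum((p - n) ** 2 for p, n in zip(pred, noise)) / len(noise)


# Usage: compare a model that always predicts zero noise with an oracle
# that inverts the forward process exactly for a known clean sample.
alphas_cumprod = make_alphas_cumprod()
clean = [0.5, -0.25, 1.0, 0.0]


def zero_model(noisy, t):
    return [0.0 for _ in noisy]


def oracle_model(noisy, t):
    a_t = alphas_cumprod[t]
    return [(y - math.sqrt(a_t) * x) / math.sqrt(1.0 - a_t) for y, x in zip(noisy, clean)]


zero_loss = training_loss(zero_model, clean, alphas_cumprod, random.Random(0))
oracle_loss = training_loss(oracle_model, clean, alphas_cumprod, random.Random(0))
```

In a real training run the model would be a neural network and this loss would be minimized by gradient descent over many samples and timesteps.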

Credits

This library builds on previous work by many different authors and would not have been possible without their great research and implementations. In particular, we'd like to thank the following implementations, which helped us in our development and without which the API would not be as polished as it is today:

- CompVis' latent diffusion models library, available [here](https://github.com/CompVis/latent-diffusion).
- hojonathanho's original DDPM implementation, available [here](https://github.com/hojonathanho/diffusion), as well as the extremely useful translation into PyTorch by pesser, available [here](https://github.com/pesser/pytorch_diffusion).
- ermongroup's DDIM implementation, available [here](https://github.com/ermongroup/ddim).
- yang-song's Score-VE and Score-VP implementations, available [here](https://github.com/yang-song/score_sde_pytorch).

We also want to thank heejkoo for the very helpful overview of papers, code and resources on diffusion models, available [here](https://github.com/heejkoo/Awesome-Diffusion-Models).
