Diffusers


0.2.3

:art: Stable Diffusion public release

The Stable Diffusion checkpoints are now public and can be loaded by anyone! :partying_face:

**Make sure to accept the license terms on the model page first (requires login): https://huggingface.co/CompVis/stable-diffusion-v1-4**
Install the required packages: `pip install diffusers==0.2.3 transformers scipy`
And log in on your machine using the `huggingface-cli login` command.

```python
from torch import autocast
from diffusers import StableDiffusionPipeline, LMSDiscreteScheduler

# this will substitute the default PNDM scheduler for K-LMS
lms = LMSDiscreteScheduler(
    beta_start=0.00085,
    beta_end=0.012,
    beta_schedule="scaled_linear"
)

pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    scheduler=lms,
    use_auth_token=True
).to("cuda")

prompt = "a photo of an astronaut riding a horse on mars"
with autocast("cuda"):
    image = pipe(prompt)["sample"][0]

image.save("astronaut_rides_horse.png")
```


The safety checker

Following the model authors' guidelines and code, Stable Diffusion inference results are now filtered to exclude unsafe content. Any image classified as unsafe is returned as a blank image. To check programmatically whether the safety module was triggered, inspect the `nsfw_content_detected` flag like so:

```python
outputs = pipe(prompt)
image = outputs["sample"][0]
if any(outputs["nsfw_content_detected"]):
    print("Potential unsafe content was detected in one or more images. Try again with a different prompt and/or seed.")
```


Improvements and bugfixes

* add add_noise method in LMSDiscreteScheduler, PNDMScheduler by patil-suraj in 227
* hotfix for pdnm test by natolambert in 220
* Restore `is_modelcards_available` in `.utils` by pcuenca in 224
* Update README for 0.2.3 release by pcuenca in 225
* Pipeline to device by pcuenca in 210
* fix safety check by patil-suraj in 217
* Add safety module by patil-suraj in 213
* Support one-string prompts and custom image size in LDM by anton-l in 212
* Add `is_torch_available`, `is_flax_available` by anton-l in 204
* Revive `make quality` by anton-l in 203
* [StableDiffusionPipeline] use default params in __call__ by patil-suraj in 196
* fix test_from_pretrained_hub_pass_model by patil-suraj in 194
* Match params with official Stable Diffusion lib by apolinario in 192

**Full Changelog**: https://github.com/huggingface/diffusers/compare/v0.2.2...v0.2.3

0.2.2

This patch release fixes an import error in the [StableDiffusionPipeline](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion.py).

- [K-LMS Scheduler] fix import by patrickvonplaten in [191](https://github.com/huggingface/diffusers/pull/191)

0.2.1

This patch release fixes a small bug in the [StableDiffusionPipeline](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion.py).

- [Stable diffusion] Hot fix by patrickvonplaten in [50a9ae](https://github.com/huggingface/diffusers/commit/b50a9ae383794f5fa56377d703e0feb80a33bf77)

0.2.0

Stable Diffusion

Stable Diffusion is a text-to-image latent diffusion model created by the researchers and engineers from [CompVis](https://github.com/CompVis), [Stability AI](https://stability.ai/) and [LAION](https://laion.ai/). It's trained on 512x512 images from a subset of the [LAION-5B](https://laion.ai/blog/laion-5b/) database. This model uses a frozen CLIP ViT-L/14 text encoder to condition the model on text prompts. With its 860M UNet and 123M text encoder, the model is relatively lightweight and runs on a GPU with at least 10GB VRAM.
See the [model card](https://huggingface.co/CompVis/stable-diffusion) for more information.

**The Stable Diffusion weights are currently only available to universities, academics, research institutions and independent researchers. Please request access by filling out <a href="https://stability.ai/academia-access-form" target="_blank">this form</a>.**

```python
from torch import autocast
from diffusers import StableDiffusionPipeline

# make sure you're logged in with `huggingface-cli login`
pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-3-diffusers", use_auth_token=True)

prompt = "a photograph of an astronaut riding a horse"
with autocast("cuda"):
    image = pipe(prompt, guidance_scale=7)["sample"][0]  # image here is in PIL format

image.save(f"astronaut_rides_horse.png")
```


K-LMS sampling

The new `LMSDiscreteScheduler` is a port of k-lms from [k-diffusion](https://github.com/crowsonkb/k-diffusion/blob/master/k_diffusion/sampling.py) by Katherine Crowson.
The scheduler can be easily swapped into existing pipelines like so:

```python
from diffusers import StableDiffusionPipeline, LMSDiscreteScheduler

model_id = "CompVis/stable-diffusion-v1-3-diffusers"
# Use the K-LMS scheduler here instead
scheduler = LMSDiscreteScheduler(beta_start=0.00085, beta_end=0.012, beta_schedule="scaled_linear", num_train_timesteps=1000)
pipe = StableDiffusionPipeline.from_pretrained(model_id, scheduler=scheduler, use_auth_token=True)
```


Integration test with text-to-image script of [Stable-Diffusion](https://github.com/CompVis/stable-diffusion)

PRs 182 and 186 make sure that the DDIM and PNDM/PLMS schedulers yield 1-to-1 the same results as Stable Diffusion.
Try it out yourself:

In [Stable-Diffusion](https://github.com/CompVis/stable-diffusion):


```
python scripts/txt2img.py --prompt "a photograph of an astronaut riding a horse" --n_samples 4 --n_iter 1 --fixed_code --plms
```

or

```
python scripts/txt2img.py --prompt "a photograph of an astronaut riding a horse" --n_samples 4 --n_iter 1 --fixed_code
```


In `diffusers`:

```py
from diffusers import StableDiffusionPipeline, DDIMScheduler
from time import time
from PIL import Image
from einops import rearrange
import numpy as np
import torch
from torch import autocast
from torchvision.utils import make_grid

torch.manual_seed(42)

# try different prompts (only the last assignment is used)
prompt = "a photograph of an astronaut riding a horse"
prompt = "a photograph of the eiffel tower on the moon"
prompt = "an oil painting of a futuristic forest gives"

# uncomment to use DDIM
# scheduler = DDIMScheduler(beta_start=0.00085, beta_end=0.012, beta_schedule="scaled_linear", clip_sample=False, set_alpha_to_one=False)
# pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-3-diffusers", use_auth_token=True, scheduler=scheduler)

# make sure you're logged in with `huggingface-cli login`
pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-3-diffusers", use_auth_token=True)

all_images = []
num_rows = 1
num_columns = 4
for _ in range(num_rows):
    with autocast("cuda"):
        # with output_type="np", images are returned as a numpy array (omit it to get PIL images: https://pillow.readthedocs.io/en/stable/)
        images = pipe(num_columns * [prompt], guidance_scale=7.5, output_type="np")["sample"]
    all_images.append(torch.from_numpy(images))

# additionally, save as grid
grid = torch.stack(all_images, 0)
grid = rearrange(grid, 'n b h w c -> (n b) h w c')
grid = rearrange(grid, 'n h w c -> n c h w')
grid = make_grid(grid, nrow=num_rows)

# to image
grid = 255. * rearrange(grid, 'c h w -> h w c').cpu().numpy()
image = Image.fromarray(grid.astype(np.uint8))

image.save(f"./images/diffusers/{'_'.join(prompt.split())}_{round(time())}.png")
```



Improvements and bugfixes

* Allow passing non-default modules to pipeline by pcuenca in 188
* Add K-LMS scheduler from k-diffusion by anton-l in 185
* [Naming] correct config naming of DDIM pipeline by patrickvonplaten in 187
* [PNDM] Stable diffusion by patrickvonplaten in 186
* [Half precision] Make sure half-precision is correct by patrickvonplaten in 182
* allow custom height, width in StableDiffusionPipeline by patil-suraj in 179
* add tests for stable diffusion pipeline by patil-suraj in 178
* Stable diffusion pipeline by patil-suraj in 168
* [LDM pipeline] fix eta condition. by patil-suraj in 171
* [PNDM in LDM pipeline] use inspect in pipeline instead of unused kwargs by patil-suraj in 167
* allow pndm scheduler to be used with ldm pipeline by patil-suraj in 165
* add scaled_linear schedule in PNDM and DDPM by patil-suraj in 164
* add attention up/down blocks for VAE by patil-suraj in 161
* Add an alternative Karras et al. stochastic scheduler for VE models by anton-l in 160
* [LDMTextToImagePipeline] make text model generic by patil-suraj in 162
* Minor typos by pcuenca in 159
* Fix arg key for `dataset_name` in `create_model_card` by pcuenca in 158
* [VAE] fix the downsample block in Encoder. by patil-suraj in 156
* [UNet2DConditionModel] add cross_attention_dim as an argument by patil-suraj in 155
* Added `diffusers` to conda-forge and updated README for installation instruction by sugatoray in 129
* Add issue templates for feature requests and bug reports by osanseviero in 153
* Support training with a local image folder by anton-l in 152
* Allow DDPM scheduler to use model's predicated variance by eyalmazuz in 132

**Full Changelog**: https://github.com/huggingface/diffusers/compare/0.1.3...v0.2.0

0.1.3

This patch release refactors the model architecture of `VQModel` and `AutoencoderKL`, including the weight naming. Therefore, the official weights of the `CompVis` organization have been re-uploaded, see:
- https://huggingface.co/CompVis/ldm-celebahq-256/commit/63b33cf3bbdd833de32080a8ba55ba4d0b111859
- https://huggingface.co/CompVis/ldm-celebahq-256/commit/03978f22272a3c2502da709c3940e227c9714bdd
- https://huggingface.co/CompVis/ldm-text2im-large-256/commit/31ff4edafd3ee09656d2068d05a4d5338129d592
- https://huggingface.co/CompVis/ldm-text2im-large-256/commit/9bd2b48d2d45e6deb6fb5a03eb2a601e4b95bd91

Corresponding PR: https://github.com/huggingface/diffusers/pull/137

Please make sure to upgrade `diffusers` to have those models running correctly: `pip install --upgrade diffusers`
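
For reference, loading one of the re-uploaded checkpoints after upgrading looks roughly as follows. This is a minimal sketch following the text-to-image example of this era; the prompt and sampling parameters (`num_inference_steps`, `eta`, `guidance_scale`) are illustrative choices, not prescriptions:

```python
from diffusers import DiffusionPipeline

# load the re-uploaded LDM text-to-image checkpoint (requires the upgraded diffusers)
ldm = DiffusionPipeline.from_pretrained("CompVis/ldm-text2im-large-256")

# run the pipeline in inference (sample random noise and denoise)
prompt = "A painting of a squirrel eating a burger"
images = ldm([prompt], num_inference_steps=50, eta=0.3, guidance_scale=6)["sample"]

# save the generated images
for idx, image in enumerate(images):
    image.save(f"squirrel-{idx}.png")
```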

Bug fixes
- Fix `FileNotFoundError: 'model_card_template.md'` https://github.com/huggingface/diffusers/pull/136

0.1.2

These are the release notes of the 🧨 Diffusers library

Introducing Hugging Face's new library for diffusion models.

Diffusion models have proven very effective at artificial synthesis, even beating GANs for images. Because of that, they have gained traction in the machine learning community and play an important role in systems like DALL-E 2 or Imagen, which generate photorealistic images from text prompts.

While the most prolific successes of diffusion models have been in the **computer vision** community, these models have also achieved remarkable results in other domains, such as:

- [video generation](https://video-diffusion.github.io/),
- [audio synthesis](https://diffwave-demo.github.io/),
- [reinforcement learning](https://diffusion-planning.github.io/),

and more.

Goals

The goals of diffusers are:

- to centralize the research of diffusion models from independent repositories into a clear and maintained project,
- to reproduce high-impact machine learning systems such as DALL-E and Imagen in a manner that is accessible to the public, and
- to create an easy-to-use API that enables users to train their own models or re-use checkpoints from other repositories for inference.

Release overview

***Quickstart***:
- For a light walk-through of the library, please have a look at the [Official 🧨 Diffusers Notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/diffusers_intro.ipynb).
- To directly jump into training a diffusion model yourself, please have a look at the [Training Diffusers Notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/training_example.ipynb)

Diffusers aims to be a modular toolbox for diffusion techniques, with a focus on the following categories:

:bullettrain_side: Inference pipelines

Inference pipelines are a collection of end-to-end diffusion systems that can be used out-of-the-box. The goal is for them to stick as closely as possible to their original implementations, and they can include components from other libraries (such as text encoders).

The original release contains the following pipelines:

- [DDPM](https://arxiv.org/abs/2006.11239) for unconditional image generation with discrete scheduling in [pipeline_ddpm](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/pipeline_ddpm.py).
- [DDIM](https://arxiv.org/abs/2010.02502) for unconditional image generation with discrete scheduling in [pipeline_ddim](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/pipeline_ddim.py).
- [PNDM](https://arxiv.org/abs/2202.09778) for unconditional image generation with discrete scheduling in [pipeline_pndm](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/pipeline_pndm.py).
- [Stochastic Differential Equations](https://openreview.net/forum?id=PxTIG12RRHS) for unconditional image generation with continuous scheduling in [score_sde_ve](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/score_sde_ve/pipeline_score_sde_ve.py)
- [Latent diffusion](https://arxiv.org/abs/2112.10752) for text to image generation / conditional image generation in [pipeline_latent_diffusion](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/pipeline_latent_diffusion.py) as well as for unconditional image generation in [latent_diffusion_uncond](https://github.com/huggingface/diffusers/tree/main/src/diffusers/pipelines/latent_diffusion_uncond)
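
As a quick illustration of what "out-of-the-box" means here, the sketch below runs unconditional generation with the DDPM pipeline. It follows the examples of this era, in which the pipeline returns a tensor with values in [-1, 1] that still needs to be converted to an image; the checkpoint id is just an example:

```python
import numpy as np
import PIL.Image
from diffusers import DDPMPipeline

# load a pretrained unconditional DDPM pipeline (model + scheduler)
ddpm = DDPMPipeline.from_pretrained("google/ddpm-celebahq-256")

# run the full diffusion loop; the returned tensor has values in [-1, 1]
image = ddpm()["sample"]

# convert the tensor to a PIL image and save it
image_processed = image.cpu().permute(0, 2, 3, 1)
image_processed = (image_processed + 1.0) * 127.5
image_pil = PIL.Image.fromarray(image_processed.numpy().astype(np.uint8)[0])
image_pil.save("ddpm_generated_image.png")
```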

We are currently working on enabling other pipelines for different modalities. The following pipelines are expected to land in a subsequent release:

- BDDMPipeline for spectrogram-to-sound vocoding
- GLIDEPipeline to support OpenAI's [GLIDE model](https://github.com/openai/glide-text2im)
- Grad-TTS for text to audio generation / conditional audio generation
- A reinforcement learning pipeline (happening in https://github.com/huggingface/diffusers/pull/105)

:alarm_clock: Schedulers

- Schedulers are the algorithms used to run diffusion models in inference as well as in training. They include the noise schedules and define algorithm-specific diffusion steps.
- Schedulers can be used interchangeably between diffusion models in inference to find the preferred trade-off between speed and generation quality.
- Schedulers are implemented in NumPy, but can easily be transformed into PyTorch.

The goal is for each scheduler to provide one or more `step()` functions that are called iteratively to unroll the diffusion loop during the forward pass. They are framework-agnostic, but offer conversion methods that allow easy use with PyTorch utilities.
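
To make the `step()` contract concrete, here is a minimal sketch of unrolling a denoising loop by hand: the model predicts the noise residual and the scheduler computes the previous, less noisy sample. The snippet uses the signatures of later diffusers releases (`DDPMScheduler.from_pretrained`, `.prev_sample`) and an example checkpoint (`google/ddpm-cat-256`); the exact `step()` signature has varied across versions, so treat this as an illustration of the pattern rather than the API of this particular release:

```python
import torch
from diffusers import DDPMScheduler, UNet2DModel

# example checkpoint, chosen only for illustration
model = UNet2DModel.from_pretrained("google/ddpm-cat-256")
scheduler = DDPMScheduler.from_pretrained("google/ddpm-cat-256")

scheduler.set_timesteps(50)

# start from pure Gaussian noise
sample_size = model.config.sample_size
sample = torch.randn((1, 3, sample_size, sample_size))

for t in scheduler.timesteps:
    with torch.no_grad():
        noise_pred = model(sample, t).sample                    # model predicts the noise residual
    sample = scheduler.step(noise_pred, t, sample).prev_sample  # scheduler computes x_{t-1} from x_t
```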

The initial release contains the following schedulers:

- [DDIM](https://github.com/huggingface/diffusers/blob/main/src/diffusers/schedulers/scheduling_ddim.py), from the [Denoising Diffusion Implicit Models](https://arxiv.org/abs/2010.02502) paper.
- [DDPM](https://github.com/huggingface/diffusers/blob/main/src/diffusers/schedulers/scheduling_ddpm.py), from the [Denoising Diffusion Probabilistic Models](https://arxiv.org/abs/2006.11239) paper.
- [PNDM](https://github.com/huggingface/diffusers/blob/main/src/diffusers/schedulers/scheduling_pndm.py), from the [Pseudo Numerical Methods for Diffusion Models on Manifolds](https://arxiv.org/abs/2202.09778) paper
- [SDE_VE](https://github.com/huggingface/diffusers/blob/main/src/diffusers/schedulers/scheduling_sde_ve.py), from the [Score-Based Generative Modeling through Stochastic Differential Equations](https://openreview.net/forum?id=PxTIG12RRHS) paper.

:factory: Models

Models are hosted in the `src/diffusers/models` [folder](https://github.com/huggingface/diffusers/tree/main/src/diffusers/models).

For the initial release, you'll get to see a few building blocks, as well as some resulting models:

- `UNet2DModel` is an implementation of the UNet architecture used in recent diffusion papers. It is the *unconditional* version of the UNet model, in opposition to the *conditional* version that follows below (a short sketch of instantiating it follows this list).
- `UNet2DConditionModel` is similar to the `UNet2DModel`, but is *conditional*: it uses a cross-attention mechanism in its downsample and upsample blocks, and these cross-attention layers can be fed with embeddings from other models (for example, a text encoder). An example of a pipeline using a conditional UNet model is the latent diffusion pipeline.
- `AutoencoderKL` and `VQModel` are still experimental models that are prone to breaking changes in the near future. However, they can already be used as part of the Latent Diffusion pipelines.
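
As a quick, hedged sketch of the unconditional building block in isolation (the configuration values below are made up for illustration, not a recommended setup):

```python
import torch
from diffusers import UNet2DModel

# a small, randomly initialized unconditional UNet; all sizes are illustrative
model = UNet2DModel(
    sample_size=64,            # spatial resolution of inputs/outputs
    in_channels=3,
    out_channels=3,
    layers_per_block=2,
    block_out_channels=(64, 128, 256),
    down_block_types=("DownBlock2D", "DownBlock2D", "AttnDownBlock2D"),
    up_block_types=("AttnUpBlock2D", "UpBlock2D", "UpBlock2D"),
)

noisy_images = torch.randn(1, 3, 64, 64)
timestep = torch.tensor([10])

# the model predicts the noise residual at the given timestep; output shape matches the input
noise_pred = model(noisy_images, timestep)["sample"]
```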

:page_with_curl: Training example

The first release contains a dataset-agnostic unconditional example and a training notebook:

- The [`train_unconditional.py`](https://github.com/huggingface/diffusers/blob/main/examples/train_unconditional.py) example, which trains a DDPM UNet model on a dataset of your choice.
- More examples can be found under the [Hugging Face Diffusers Notebooks](https://github.com/huggingface/notebooks/tree/main/diffusers#diffusers-notebooks)

Credits

This library concretizes previous work by many different authors and would not have been possible without their great research and implementations. We'd like to thank, in particular, the following implementations which have helped us in our development and without which the API would not be as polished today:

- CompVis' latent diffusion models library, available [here](https://github.com/CompVis/latent-diffusion)
- hojonathanho's original DDPM implementation, available [here](https://github.com/hojonathanho/diffusion), as well as the extremely useful translation into PyTorch by pesser, available [here](https://github.com/pesser/pytorch_diffusion)
- ermongroup's DDIM implementation, available [here](https://github.com/ermongroup/ddim).
- yang-song's Score-VE and Score-VP implementations, available [here](https://github.com/yang-song/score_sde_pytorch)

We also want to thank heejkoo for the very helpful overview of papers, code and resources on diffusion models, available [here](https://github.com/heejkoo/Awesome-Diffusion-Models).
