## SDXL Export and Inference
Optimum CLI now supports compiling components in the SDXL pipeline for inference on Neuron devices (inf2/trn1).
Below is an example of compiling SDXL models. You can compile them either on an inf2 instance (`inf2.8xlarge` or larger recommended) or on a CPU-only instance (in that case, disable validation with `--disable-validation`):
```bash
optimum-cli export neuron \
  --model stabilityai/stable-diffusion-xl-base-1.0 \
  --task stable-diffusion-xl \
  --batch_size 1 --height 1024 --width 1024 \
  --auto_cast matmul --auto_cast_type bf16 \
  sdxl_neuron/
```
Then run inference with the `NeuronStableDiffusionXLPipeline` class:
```python
from optimum.neuron import NeuronStableDiffusionXLPipeline

# Load the pre-compiled pipeline, placing one copy on each of two Neuron cores
stable_diffusion_xl = NeuronStableDiffusionXLPipeline.from_pretrained(
    model_id="sdxl_neuron/", device_ids=[0, 1]
)

prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
image = stable_diffusion_xl(prompt).images[0]
```
* Add sdxl exporter support by JingyaHuang in https://github.com/huggingface/optimum-neuron/pull/203
* Add Stable Diffusion XL inference support by JingyaHuang in https://github.com/huggingface/optimum-neuron/pull/212
## Llama v1, v2 Inference
* Add support for Llama inference through NeuronModelForCausalLM by dacorvo in https://github.com/huggingface/optimum-neuron/pull/223
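As a sketch of the new `NeuronModelForCausalLM` API (a Neuron device is required to actually run this; the checkpoint name and the compilation arguments below are illustrative choices, not the only valid ones):

```python
from transformers import AutoTokenizer
from optimum.neuron import NeuronModelForCausalLM

# export=True compiles the checkpoint for Neuron on the fly.
# batch_size, num_cores and auto_cast_type are illustrative values.
model = NeuronModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    export=True,
    batch_size=1,
    num_cores=2,
    auto_cast_type="f16",
)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

inputs = tokenizer("What is deep learning?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True)[0])
```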
## Llama v2 Training
* Llama V2 training support by michaelbenayoun in https://github.com/huggingface/optimum-neuron/pull/211
* Llama V1 training fix by michaelbenayoun in https://github.com/huggingface/optimum-neuron/pull/211
## TGI
* AWS Inferentia2 TGI server by dacorvo in https://github.com/huggingface/optimum-neuron/pull/214
## Major bugfixes
* `neuron_parallel_compile`, `ParallelLoader` and Zero-1 fixes for torchneuron 8+ by michaelbenayoun in https://github.com/huggingface/optimum-neuron/pull/200
* flan-t5 fix: `T5Parallelizer`, `NeuronCacheCallback` and `NeuronHash` refactors by michaelbenayoun in https://github.com/huggingface/optimum-neuron/pull/207
* Fix optimum-cli broken by the optimum 1.13.0 release by JingyaHuang in https://github.com/huggingface/optimum-neuron/pull/217
## Other changes
* Bump Inference APIs to Neuron 2.13 by JingyaHuang in https://github.com/huggingface/optimum-neuron/pull/206
* Add log for SD when applying optim attn & pipelines lazy loading by JingyaHuang in https://github.com/huggingface/optimum-neuron/pull/208
* Cancel concurrency CIs for inference by JingyaHuang in https://github.com/huggingface/optimum-neuron/pull/218
* fix(tgi): typer does not support Union types by dacorvo in https://github.com/huggingface/optimum-neuron/pull/219
* Bump neuron-cc version to 1.18.* by JingyaHuang in https://github.com/huggingface/optimum-neuron/pull/224
**Full Changelog**: https://github.com/huggingface/optimum-neuron/compare/v0.0.10...v0.0.11