Optimum-neuron

Latest version: v0.0.28


0.0.6

Introduces a fix for 109 (113)

0.0.5

NeuronModel classes

NeuronModel classes allow you to run inference on `Inf1` and `Inf2` instances while preserving the Python interface you are used to from [Transformers' auto model classes](https://huggingface.co/docs/transformers/v4.30.0/en/model_doc/auto#generic-model-classes).

Example:

```python
from transformers import AutoTokenizer
from optimum.neuron import NeuronModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained(
    "optimum/distilbert-base-uncased-finetuned-sst-2-english-neuronx"
)
model = NeuronModelForSequenceClassification.from_pretrained(
    "optimum/distilbert-base-uncased-finetuned-sst-2-english-neuronx"
)

inputs = tokenizer("Hamilton is considered to be the best musical of human history.", return_tensors="pt")

outputs = model(**inputs)
```


Supported tasks are:

- Feature extraction
- Masked language modeling
- Text classification
- Token classification
- Question answering
- Multiple choice
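
Question answering, for instance, follows the same pattern as the sequence classification example above. A minimal sketch, assuming a question-answering checkpoint already compiled for Neuron (the repository name below is hypothetical; substitute your own exported model):

```python
import torch
from transformers import AutoTokenizer
from optimum.neuron import NeuronModelForQuestionAnswering

# Hypothetical Neuron-compiled checkpoint; replace with your own export.
model_id = "optimum/roberta-base-squad2-neuronx"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = NeuronModelForQuestionAnswering.from_pretrained(model_id)

inputs = tokenizer("Where do I live?", "I live in Paris.", return_tensors="pt")
outputs = model(**inputs)

# Decode the most likely answer span from the start/end logits.
start = torch.argmax(outputs.start_logits, dim=-1).item()
end = torch.argmax(outputs.end_logits, dim=-1).item()
print(tokenizer.decode(inputs["input_ids"][0, start : end + 1]))
```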

Relevant PR: 45

Generation methods

Two generation methods are now supported:

- Greedy decoding (70)
- Beam search (93)

This allows you to run evaluation with generation while training decoder and seq2seq models.
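
Both strategies go through the standard `generate` API from Transformers. A minimal sketch with a plain T5 checkpoint (greedy decoding is the default; beam search is enabled via `num_beams`; on Trainium, the same interface is used for evaluation with generation):

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

inputs = tokenizer("translate English to French: Hello, how are you?", return_tensors="pt")

# Greedy decoding: pick the highest-probability token at each step.
greedy_ids = model.generate(**inputs, max_new_tokens=32)

# Beam search: keep the 4 most likely partial sequences at each step.
beam_ids = model.generate(**inputs, max_new_tokens=32, num_beams=4)

print(tokenizer.decode(greedy_ids[0], skip_special_tokens=True))
print(tokenizer.decode(beam_ids[0], skip_special_tokens=True))
```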

Misc

The Optimum CLI now provides two new commands to help manage the cache:

- `optimum-cli neuron cache list`: To list a remote cache repo on the Hugging Face Hub (85)
- `optimum-cli neuron cache add`: To add compilation files related to a model to a remote cache repo on the Hugging Face Hub (51)

0.0.4

`optimum-cli neuron cache` command line

The `optimum-cli` now provides two commands to work with the Trainium cache:

- Cache creation:

```bash
optimum-cli neuron cache create
```

- Cache setting:

```bash
optimum-cli neuron cache set
```


Documentation

- New Trainium model cache documentation [page](https://huggingface.co/docs/optimum-neuron/guides/cache_system)

0.0.3

Pins the version of the `huggingface_hub` library to be greater than or equal to `0.14.0`.
This should fix errors related to 41.

0.0.2

Compilation caching system

Since compiling models before being able to train them can be a real bottleneck (for example, on small datasets, compilation takes longer than training itself), we introduce a caching system directly connected to the Hugging Face Hub.

Before starting compilation, the `TrainiumTrainer` checks whether the needed compilation files are already on the Hub, and fetches them if they are, saving the user from having to do it manually.
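
From the user's point of view, nothing changes in the training code. A minimal sketch, assuming a standard Transformers fine-tuning setup (the tiny in-memory dataset is only there to keep the example self-contained):

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer, TrainingArguments
from optimum.neuron import TrainiumTrainer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")

# Tiny in-memory dataset, just to make the sketch runnable end to end.
texts = ["great movie", "terrible movie"]
encodings = tokenizer(texts, truncation=True, padding=True)
train_dataset = [
    {**{k: v[i] for k, v in encodings.items()}, "labels": i % 2}
    for i in range(len(texts))
]

# Before compiling, the trainer looks up matching compilation files in the
# Hub cache repo and downloads them instead of recompiling when they exist.
trainer = TrainiumTrainer(
    model=model,
    args=TrainingArguments(output_dir="out", num_train_epochs=1),
    train_dataset=train_dataset,
)
trainer.train()
```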

Custom cache repo

Since each user might want their own cache repo, to be able to push compilation files and/or keep them private, we offer the possibility to do so via the `CUSTOM_CACHE_REPO` environment variable:

```bash
CUSTOM_CACHE_REPO=michaelbenayoun/cache_test python train.py
```


Neuron export

Adds support for exporting PyTorch models to serialized TorchScript modules compiled by the Neuron compiler ([`neuron-cc`](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/compiler/neuron-cc/command-line-reference.html#neuron-compiler-cli-reference) or [`neuronx-cc`](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/compiler/neuronx-cc/api-reference-guide/neuron-compiler-cli-reference-guide.html#neuron-compiler-cli-reference-guide)), which can be used on AWS [Inf2](https://aws.amazon.com/ec2/instance-types/inf2/) or [Inf1](https://aws.amazon.com/ec2/instance-types/inf1/) instances.

Example: Export the BERT model with static shapes:


```bash
optimum-cli export neuron --help
optimum-cli export neuron --model bert-base-uncased --sequence_length 128 --batch_size 16 bert_neuron/
```


By default, on Inf2, `matmul` operations are cast from `fp32` to `bf16`, and on Inf1, all operations are cast to `bf16`. Use `--auto_cast` to configure which operations should be auto-cast and `--auto_cast_type` to define the data type used for auto-casting.

Example: Auto-cast __all__ operations (*this option can potentially lower precision/accuracy*) to the `fp16` data type:

```bash
optimum-cli export neuron --model bert-base-uncased --auto_cast all --auto_cast_type fp16 bert_neuron/
```
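
In later releases, the exported artifacts in `bert_neuron/` can be loaded back through the NeuronModel classes described in the 0.0.5 notes above. A minimal sketch, assuming the model was exported for the default fill-mask task with `--batch_size 1` (the compiled graph has static input shapes, so inputs are padded to the sequence length used at export):

```python
from transformers import AutoTokenizer
from optimum.neuron import NeuronModelForMaskedLM

# Load the compiled module from the local export directory.
tokenizer = AutoTokenizer.from_pretrained("bert_neuron/")
model = NeuronModelForMaskedLM.from_pretrained("bert_neuron/")

# Pad to the static sequence length the model was compiled with.
inputs = tokenizer(
    "The capital of France is [MASK].",
    return_tensors="pt",
    padding="max_length",
    max_length=128,
)
outputs = model(**inputs)
```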

0.0.1

The following architectures can be trained on [AWS Trainium instances](https://aws.amazon.com/fr/ec2/instance-types/trn1/) (trn1.2xlarge and trn1.32xlarge):

- ALBERT
- BERT
- DistilBERT
- RoBERTa
- XLM-RoBERTa
- CamemBERT
- Electra
- GPT-2
- GPT-Neo
- MarianMT
- T5
- BART
- ViT

Training examples for many tasks are provided [here](https://github.com/huggingface/optimum-neuron/tree/v0.0.1-release/examples).
