## Faster Text Generation

0.6.1 provides speed-ups of up to 9x for Whisper and BART text generation! The entire text generation loop now runs on the IPU, and KV caching is enabled for the self-attention layers.
* Use buffers to cache the encoder hidden states in decoder wrapper by jimypbr in https://github.com/huggingface/optimum-graphcore/pull/285
* Move whisper decoder projection to IPU 0 since there is weight tying by katalinic-gc in https://github.com/huggingface/optimum-graphcore/pull/309
* move the IndexedInputLinear out of the decoder wrapper by katalinic-gc in https://github.com/huggingface/optimum-graphcore/pull/319
* Add generic KV caching support, use it with Whisper by katalinic-gc in https://github.com/huggingface/optimum-graphcore/pull/307
* On device text generation POC for greedy search by katalinic-gc in https://github.com/huggingface/optimum-graphcore/pull/357
* Add on device beam search by katalinic-gc in https://github.com/huggingface/optimum-graphcore/pull/370
* Add attention serialization to the attention mixin and enable it with Whisper by katalinic-gc in https://github.com/huggingface/optimum-graphcore/pull/372
* BART KV-caching + on-device by jimypbr in https://github.com/huggingface/optimum-graphcore/pull/363
* Fix cached_beam_idx check for non on device generation by katalinic-gc in https://github.com/huggingface/optimum-graphcore/pull/378
* Attn mixin improvements by katalinic-gc in https://github.com/huggingface/optimum-graphcore/pull/381
* Add a faster torch based version of the whisper feature extractor by katalinic-gc in https://github.com/huggingface/optimum-graphcore/pull/376
* Fix BART Positional embeddings for generation without caching by jimypbr in https://github.com/huggingface/optimum-graphcore/pull/386
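The KV caching mentioned above can be illustrated with a minimal, library-independent sketch. Because tensor shapes on the IPU must be static, the cache is preallocated to the maximum sequence length and written in place at the current decoding step; the class and method names below are illustrative, not the library's actual API.

```python
# Hypothetical sketch of a fixed-size KV cache for self-attention during
# generation. Names are illustrative only; the real implementation lives in
# optimum-graphcore's attention mixin and operates on tensors, not lists.

class KVCache:
    def __init__(self, max_length: int):
        self.max_length = max_length
        # Preallocated buffers; None marks slots not yet written.
        self.keys = [None] * max_length
        self.values = [None] * max_length
        self.step = 0

    def update(self, k, v):
        """Write this step's key/value into the static buffer and return
        the filled prefix, which is all that attention needs to read."""
        self.keys[self.step] = k
        self.values[self.step] = v
        self.step += 1
        return self.keys[: self.step], self.values[: self.step]

cache = KVCache(max_length=4)
ks, vs = cache.update(k=0.1, v=1.0)
ks, vs = cache.update(k=0.2, v=2.0)
```

The key point is that each decoding step recomputes keys and values only for the newest token, instead of re-running self-attention projections over the whole generated prefix.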
## New Models

### Fine-tuning of text generation models

Text generation with `IPUSeq2SeqTrainer` is now supported.
* Fix IPUSeq2SeqTrainer for models that have persistent buffers by kundaMwiza in https://github.com/huggingface/optimum-graphcore/pull/337
* Enable generation in notebooks that use IPUSeq2SeqTrainer by kundaMwiza in https://github.com/huggingface/optimum-graphcore/pull/341
* Fix: reparallelize for training after generation by kundaMwiza in https://github.com/huggingface/optimum-graphcore/pull/387
### Wav2vec2 Large
* Adding Wav2vec2 Large pretraining and fine-tuning by atsyplikhin in https://github.com/huggingface/optimum-graphcore/pull/323
### Flan-T5

Added support for Flan-T5 inference. This comes with numerical fixes to T5 so that it runs correctly in `float16`.
* Enable Flan-T5 inference in `float16` by HMellor in https://github.com/huggingface/optimum-graphcore/pull/296
* Add Flan-T5 notebook by HMellor in https://github.com/huggingface/optimum-graphcore/pull/318
* T5 revert fp16 clamping removal by kundaMwiza in https://github.com/huggingface/optimum-graphcore/pull/332
* Skip equal check for denormals in known T5 layer by HMellor in https://github.com/huggingface/optimum-graphcore/pull/383
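The core difficulty with running T5 in `float16` is that intermediate activations can exceed the format's finite range and overflow to infinity. A common remedy, sketched below with plain Python floats, is to clamp activations into the representable `float16` range; this illustrates the general idea only and is not the exact fix applied in the PRs above.

```python
FP16_MAX = 65504.0  # largest finite float16 value

def clamp_fp16(x: float) -> float:
    """Clamp an activation into float16's representable range so it
    cannot overflow to inf when cast down from float32."""
    return max(-FP16_MAX, min(FP16_MAX, x))
```

For example, `clamp_fp16(1e5)` returns `65504.0` rather than a value that would become `inf` in `float16`.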
### MT5

Added the MT5 model, `MT5ForConditionalGeneration`. To support it, two new options were added to `IPUConfig`:
* `serialized_projection_splits_per_ipu` (`List[int]`, *optional*, defaults to `None`):
  Specifies the number of splits of the embedding layer to place on each IPU for pipelined execution.
  The format is the same as for `layers_per_ipu`, but wildcards are not supported.
  For instance, `[3, 1, 0, 0]` places an embedding layer serialized into 4 sub-embedding layers
  across a 4-IPU pipeline: the first IPU holds 3 splits and the second holds 1 split.
* `projection_serialization_factor` (`int`, *optional*, defaults to 1 if `serialized_projection_splits_per_ipu` is `None`):
  The factor by which to either serialize the matmuls performed in the linear projection layer, or
  serialize the projection layer into a set of individual linear layers that can optionally be placed on different IPUs.
  No serialization occurs when `projection_serialization_factor = 1`.
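The placement semantics of `serialized_projection_splits_per_ipu` can be sketched as a simple mapping from split index to IPU index. The helper below is purely illustrative (it is not part of the library's API), assuming the list is read left to right as "number of splits on each IPU":

```python
def place_splits(splits_per_ipu):
    """Map serialized embedding/projection splits to IPU indices.
    E.g. [3, 1, 0, 0] places splits 0-2 on the first IPU and
    split 3 on the second IPU of a 4-IPU pipeline."""
    placement = []
    for ipu, n in enumerate(splits_per_ipu):
        placement.extend([ipu] * n)
    return placement

place_splits([3, 1, 0, 0])  # -> [0, 0, 0, 1]
```

Note that, as with `layers_per_ipu`, the list must have one entry per IPU in the pipeline, and the entries must sum to the total number of splits.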
PRs:
* Support sharding serialized layers across ipus by kundaMwiza in https://github.com/huggingface/optimum-graphcore/pull/355
* Add MT5 model and fine-tuning notebook by kundaMwiza in https://github.com/huggingface/optimum-graphcore/pull/392
### HubertForCTC
* Add support for HubertForCTC by jimypbr in https://github.com/huggingface/optimum-graphcore/pull/347
* Change hyper-parameters to fix Hubert for CTC CI by jimypbr in https://github.com/huggingface/optimum-graphcore/pull/390
## User Experience

The `pod_type` argument of `IPUTrainingArguments` has been deprecated and replaced by `n_ipu`. Consequently, `IPUConfig` attributes can no longer be specified as dictionaries keyed by pod type.
* Pod type sets replication factor by rahult-graphcore in https://github.com/huggingface/optimum-graphcore/pull/271
`IPUConfig` now supports `inference_`-prefixed versions of the following parameters:
* `layers_per_ipu`
* `ipus_per_replica`
* `matmul_proportion`
* `serialized_embedding_splits_per_ipu`
* `projection_serialization_factor`
* `serialized_projection_splits_per_ipu`
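The intended behaviour of the `inference_` prefix can be sketched as a simple lookup rule: when running inference, an `inference_`-prefixed value takes precedence if present, otherwise the shared value is used. The helper below is illustrative only; the real `IPUConfig` handles this internally via managed attributes.

```python
def resolve(config: dict, name: str, for_inference: bool):
    """Pick the inference_-prefixed value when present and running
    inference; otherwise fall back to the shared (training) value.
    Illustrative sketch, not the library's actual mechanism."""
    if for_inference and f"inference_{name}" in config:
        return config[f"inference_{name}"]
    return config[name]

cfg = {"layers_per_ipu": [2, 2], "inference_layers_per_ipu": [4]}
resolve(cfg, "layers_per_ipu", for_inference=True)   # -> [4]
resolve(cfg, "layers_per_ipu", for_inference=False)  # -> [2, 2]
```

This lets a single `IPUConfig` describe both a training pipeline (e.g. spread over more IPUs with recomputation) and a smaller inference pipeline without maintaining two config files.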
PRs:
* Enable training and inference specific configurations using a single `IPUConfig` by HMellor in https://github.com/huggingface/optimum-graphcore/pull/308
* Matmul proportion support float or len(List[float]) == ipus_per_replica by kundaMwiza in https://github.com/huggingface/optimum-graphcore/pull/375
* Refactor: prefix IPUConfig `ManagedAttribute`s instead of overloading user provided attributes by kundaMwiza in https://github.com/huggingface/optimum-graphcore/pull/366
* Add attribute validation by kundaMwiza in https://github.com/huggingface/optimum-graphcore/pull/371
* Refactor SerializedEmbedding to use to/from_model by kundaMwiza in https://github.com/huggingface/optimum-graphcore/pull/382
## Notebooks
* Add narrative to the whisper notebook by payoto in https://github.com/huggingface/optimum-graphcore/pull/312
* Add Flan-T5 notebook by HMellor in https://github.com/huggingface/optimum-graphcore/pull/318
* Deberta notebook to accompany blog post by lukem-gc in https://github.com/huggingface/optimum-graphcore/pull/369
* Add MT5 model and fine-tuning notebook by kundaMwiza in https://github.com/huggingface/optimum-graphcore/pull/392
## New Contributors
* atsyplikhin made their first contribution in https://github.com/huggingface/optimum-graphcore/pull/323
* lukem-gc made their first contribution in https://github.com/huggingface/optimum-graphcore/pull/369
**Full Changelog**: https://github.com/huggingface/optimum-graphcore/compare/v0.6.0...v0.6.1