Transformers

Latest version: v4.46.3

4.46.3

One small fix for the FSDP + gradient accumulation loss issue!
- FSDP grad accum fix (34645) by winglian

4.46.2

Mostly fixes to finish the gradient accumulation work (a usage sketch follows the list below)!
Thanks to techkang and Ryukijano 🤗

- VLMs: fix number of image tokens (34332) by zucchini-nlp
- fix pixtral processor (34486) by molbap
- enable average tokens across devices (34373) by techkang and muellerzr
- Update trainer for easier handling of accumulate, compile fixes, and … by muellerzr and Ryukijano
- MPS: isin_mps_friendly can support 0D tensors (34538) by gante
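Since several of these fixes change how the `Trainer` scales the loss under gradient accumulation, here is a minimal usage sketch. It assumes `average_tokens_across_devices` is the `TrainingArguments` flag introduced by PR 34373; the `gpt2` checkpoint and the omitted dataset are placeholders.

```python
# Minimal sketch of the gradient-accumulation-related Trainer options in 4.46.x.
# Assumptions: `average_tokens_across_devices` is the TrainingArguments flag added by
# PR 34373; "gpt2" is a placeholder checkpoint.
from transformers import AutoModelForCausalLM, AutoTokenizer, Trainer, TrainingArguments

model_name = "gpt2"  # placeholder checkpoint for illustration
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

args = TrainingArguments(
    output_dir="out",
    per_device_train_batch_size=4,
    gradient_accumulation_steps=8,        # loss is normalized consistently across accumulation steps
    average_tokens_across_devices=True,   # average token counts across devices when scaling the loss
)

# A real run would also pass train_dataset=...; it is omitted to keep the sketch short.
trainer = Trainer(model=model, args=args, tokenizer=tokenizer)
```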

4.46.1

4.46.0

New model additions

Moshi

The Moshi model was proposed in Moshi: a speech-text foundation model for real-time dialogue by Alexandre Défossez,
Laurent Mazaré, Manu Orsini, Amélie Royer, Patrick Pérez, Hervé Jégou, Edouard Grave and Neil Zeghidour.

Moshi is a speech-text foundation model that casts spoken dialogue as speech-to-speech generation. Starting from a
text language model backbone, Moshi generates speech as tokens from the residual quantizer of a neural audio codec,
while modeling separately its own speech and that of the user into parallel streams. This allows for the removal of
explicit speaker turns, and the modeling of arbitrary conversational dynamics. Moshi also predicts time-aligned text
tokens as a prefix to audio tokens. This “Inner Monologue” method significantly improves the linguistic quality of
generated speech and provides streaming speech recognition and text-to-speech. As a result, Moshi is the first
real-time full-duplex spoken large language model, with a theoretical latency of 160ms, 200ms in practice.

![image](https://github.com/user-attachments/assets/00ed5bcc-47b2-4b73-a8f1-2aa0a2e12b32)


* Moshi integration by ylacombe in 33624
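As a quick reference, a minimal loading sketch is shown below. The checkpoint id is an assumption (it may differ from the converted Hub checkpoint), and a full speech-to-speech loop would additionally feed the user's audio stream, encoded into codec tokens, alongside the text tokens.

```python
# Minimal loading sketch for Moshi in transformers >= 4.46.0.
# Assumption: "kmhf/hf-moshiko" is used as a stand-in checkpoint id.
import torch
from transformers import AutoFeatureExtractor, AutoTokenizer, MoshiForConditionalGeneration

checkpoint = "kmhf/hf-moshiko"  # assumed checkpoint id
model = MoshiForConditionalGeneration.from_pretrained(checkpoint, torch_dtype=torch.bfloat16)
feature_extractor = AutoFeatureExtractor.from_pretrained(checkpoint)  # encodes the user's audio stream
tokenizer = AutoTokenizer.from_pretrained(checkpoint)                 # handles the "Inner Monologue" text tokens

# Generation consumes both parallel audio streams (Moshi's own and the user's) plus
# time-aligned text tokens, so model.generate() is driven by audio codes and text ids together.
```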

Zamba

Zamba-7B-v1 is a hybrid between state-space models (specifically Mamba) and transformers, and was trained using
next-token prediction. Zamba uses a shared transformer layer after every 6 Mamba blocks. It uses the Mistral v0.1 tokenizer.
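
A minimal text-generation sketch follows; `Zyphra/Zamba-7B-v1` is assumed to be the Hub checkpoint id for this model.

```python
# Minimal generation sketch for Zamba in transformers >= 4.46.0.
# Assumption: "Zyphra/Zamba-7B-v1" is the Hub checkpoint id.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "Zyphra/Zamba-7B-v1"  # assumed checkpoint id
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint, torch_dtype=torch.bfloat16, device_map="auto")

inputs = tokenizer("State-space models are", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```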

4.45.2

Mostly fixes for warnings that were not properly removed ⚠️:
* Ignore keys on validate_rope 33753 by zucchini-nlp
* remove warning v2 33761 by itazap
* Config: lower save_pretrained exception to warning 33906 by gante

🔴 There was a small regression with the dynamic Cache 🔴
* Cache: revert DynamicCache init for BC 33861 by gante

A small fix for Idefics 🐩:
* Fixes for issue 33763 in idefics2 model 33766 by aroun-coumar

And a fix for `Siglip` 🤧!
* hot fix `self.position_embeddings` -> `self.position_embedding` 33958 and properly fix and RUN_SLOW 33965, thanks to mranzinger

4.45.1

* [MllamaProcessor] Update errors and API with multiple image (33715) by ArthurZucker
* Generate: can_generate() recursive check (33718) by gante
* clean_up_tokenization_spaces=False if unset (31938) by itazap
