UForm

Latest version: v3.1.1


1.1.0

[1.1.0](https://github.com/unum-cloud/uform/compare/v1.0.3...v1.1.0) (2024-02-15)


Add

* gen2 model (#66) ([37c26bc](https://github.com/unum-cloud/uform/commit/37c26bc7abf9d9dd83d8897a05ea8daf46cd2002)), closes [#66](https://github.com/unum-cloud/uform/issues/66)

1.0.3

[1.0.3](https://github.com/unum-cloud/uform/compare/v1.0.2...v1.0.3) (2023-12-29)


Improve

* basic benchmark ([042ae87](https://github.com/unum-cloud/uform/commit/042ae87b4b04671c253604d7cc3a5ba73da210d5))

1.0.2

[1.0.2](https://github.com/unum-cloud/uform/compare/v1.0.1...v1.0.2) (2023-12-28)


Make

* Deprecate Anaconda ([1ec8097](https://github.com/unum-cloud/uform/commit/1ec8097b8559b669a1e0417f5b40952d010ff53d))

1.0.0

UForm v1: Multimodal Chat in 1.5 Billion Parameters

The UForm family of tiny multimodal transformer models just got bigger! In addition to the existing CLIP-like embedding models, we now have a generative model useful for image captioning, visual question answering, and multimodal chats. All that in just 1.5 billion parameters, small enough to fit even on mobile devices 🎉

Repository: https://github.com/unum-cloud/uform
Generative model: https://huggingface.co/unum-cloud/uform-gen
Chat model: https://huggingface.co/unum-cloud/uform-gen-chat
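A minimal captioning sketch of how the generative model can be called, assuming the `VLMForCausalLM` / `VLMProcessor` interface from `uform.gen_model` published alongside `unum-cloud/uform-gen`; the image path and the `[cap]` prompt are illustrative:

```python
# Captioning sketch for `unum-cloud/uform-gen` (assumes the v1 `uform.gen_model` API).
DEFAULT_PROMPT = "[cap] Summarize the visual content of the image."


def caption(image_path: str, prompt: str = DEFAULT_PROMPT) -> str:
    # Heavy imports are deferred so the sketch can be imported and read
    # without torch, PIL, or uform installed.
    import torch
    from PIL import Image
    from uform.gen_model import VLMForCausalLM, VLMProcessor

    model = VLMForCausalLM.from_pretrained("unum-cloud/uform-gen")
    processor = VLMProcessor.from_pretrained("unum-cloud/uform-gen")

    inputs = processor(texts=[prompt], images=[Image.open(image_path)], return_tensors="pt")
    with torch.inference_mode():
        output = model.generate(
            **inputs,
            do_sample=False,  # greedy decoding, matching the throughput setup below
            use_cache=True,
            max_new_tokens=128,
            pad_token_id=processor.tokenizer.pad_token_id,
        )
    # Strip the prompt tokens and decode only the generated continuation.
    prompt_len = inputs["input_ids"].shape[1]
    return processor.batch_decode(output[:, prompt_len:], skip_special_tokens=True)[0]


if __name__ == "__main__":
    print(caption("zebra.jpg"))
```

Swapping the prompt prefix turns the same call into visual question answering instead of captioning.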

Evaluation Metrics

![](https://github.com/ashvardanian/usearch-images/blob/main/assets/uform-gen-preview.jpg?raw=true)

Being the smallest model of its kind, `unum-cloud/uform-gen` is hard to compare to others. Next in size are the 5x larger LLaVAs and InstructBLIP, with 7 billion parameters. LLaVA performs noticeably better on VQAv2: 78.5 vs 66.5. On captioning, CLIPScore and RefCLIPScore are relatively close across all models.

| Model                               | Size | Caption Length | CLIPScore | RefCLIPScore |
| :---------------------------------- | ---: | -------------: | --------: | -----------: |
| `Salesforce/instructblip-vicuna-7b` | 7B   | Long           | 0.902     | 0.534        |
| `Salesforce/instructblip-vicuna-7b` | 7B   | Short          | 0.848     | 0.523        |
| `unum-cloud/uform-gen`              | 1.5B | Long           | 0.847     | 0.523        |
| `unum-cloud/uform-gen`              | 1.5B | Short          | 0.842     | 0.522        |
| `unum-cloud/uform-gen-chat`         | 1.5B | Long           | 0.860     | 0.525        |
| `unum-cloud/uform-gen-chat`         | 1.5B | Short          | 0.858     | 0.525        |

Throughput

On an RTX 3090, using vanilla PyTorch for inference with `bfloat16` arithmetic and greedy decoding, one should expect roughly the following throughput.

| Model | Size | Speed | Speedup |
| :---------------------------------- | ---: | ------------------: | --------: |
| `llava-hf/llava-1.5-7b-hf` | 7B | ~ 40 tokens/second | |
| `Salesforce/instructblip-vicuna-7b` | 7B | ~ 40 tokens/second | |
| `unum-cloud/uform-gen` | 1.5B | ~ 140 tokens/second | __x 3.5__ |
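The speedup column is simply the ratio of the tokens-per-second figures; a quick sanity check on the numbers from the table above:

```python
def speedup(model_tps: float, baseline_tps: float) -> float:
    """Throughput of a model relative to a baseline, both in tokens/second."""
    return model_tps / baseline_tps


# ~140 tokens/s for uform-gen vs ~40 tokens/s for the 7B models
print(speedup(140, 40))  # 3.5
```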


