## What's new?
### Support for computing CLIP image and text embeddings separately (https://github.com/xenova/transformers.js/pull/227)
You can now compute CLIP text and vision embeddings separately, allowing for faster inference when you only need to query one of the modalities. We've also released a [demo application for semantic image search](https://huggingface.co/spaces/Xenova/semantic-image-search) to showcase this functionality.
![image](https://github.com/xenova/transformers.js/assets/26504141/80c03318-6daf-4949-a114-5160f6fe0e29)
**Example:** Compute text embeddings with `CLIPTextModelWithProjection`.
```javascript
import { AutoTokenizer, CLIPTextModelWithProjection } from '@xenova/transformers';

// Load tokenizer and text model
const tokenizer = await AutoTokenizer.from_pretrained('Xenova/clip-vit-base-patch16');
const text_model = await CLIPTextModelWithProjection.from_pretrained('Xenova/clip-vit-base-patch16');

// Run tokenization
let texts = ['a photo of a car', 'a photo of a football match'];
let text_inputs = tokenizer(texts, { padding: true, truncation: true });

// Compute embeddings
const { text_embeds } = await text_model(text_inputs);
// Tensor {
//   dims: [ 2, 512 ],
//   type: 'float32',
//   data: Float32Array(1024) [ ... ],
//   size: 1024
// }
```
**Example:** Compute vision embeddings with `CLIPVisionModelWithProjection`.
```javascript
import { AutoProcessor, CLIPVisionModelWithProjection, RawImage } from '@xenova/transformers';

// Load processor and vision model
const processor = await AutoProcessor.from_pretrained('Xenova/clip-vit-base-patch16');
const vision_model = await CLIPVisionModelWithProjection.from_pretrained('Xenova/clip-vit-base-patch16');

// Read image and run processor
let image = await RawImage.read('https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/football-match.jpg');
let image_inputs = await processor(image);

// Compute embeddings
const { image_embeds } = await vision_model(image_inputs);
// Tensor {
//   dims: [ 1, 512 ],
//   type: 'float32',
//   data: Float32Array(512) [ ... ],
//   size: 512
// }
```
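Once you have both sets of embeddings, you can rank the captions against the image with cosine similarity - the core operation behind the semantic image search demo. Below is a minimal sketch using the `cos_sim` helper exported by the library, reusing `texts`, `text_embeds`, and `image_embeds` from the two examples above.

```javascript
import { cos_sim } from '@xenova/transformers';

// Reuses `texts`, `text_embeds` (2x512), and `image_embeds` (1x512) from above.
const [numTexts, dim] = text_embeds.dims;
const imageEmbedding = image_embeds.data; // Float32Array of length 512

for (let i = 0; i < numTexts; ++i) {
    // Slice the i-th text embedding out of the flattened data buffer
    const textEmbedding = text_embeds.data.slice(i * dim, (i + 1) * dim);
    console.log(texts[i], cos_sim(textEmbedding, imageEmbedding));
}
// 'a photo of a football match' should score highest for this image.
```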
### Improved browser extension example/template (https://github.com/xenova/transformers.js/pull/196)
We've updated the [source code](https://github.com/xenova/transformers.js/tree/main/examples/extension) for our example browser extension, making the following improvements:
1. Custom model caching - meaning you no longer need to ship the model weights with the extension. In addition to a smaller bundle size, users won't need to redownload the weights when the extension updates! (See the caching sketch after this list.)
2. Use of ES6 module syntax (instead of CommonJS) - much cleaner code!
3. Persistent service worker - fixed an issue where the service worker would go to sleep after a period of inactivity.
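To give a rough idea of how the custom caching in item 1 can work, here is a minimal sketch of a cache backed by the browser's Cache API. The `env.useCustomCache`/`env.customCache` hooks and the `match`/`put` shape of the cache object are assumptions for illustration; see the linked extension source for the actual implementation.

```javascript
import { env } from '@xenova/transformers';

// Illustrative sketch (assumed API): a cache object exposing `match` and `put`,
// backed by the browser's Cache API instead of weights bundled into the extension.
class ExtensionCache {
    constructor(name) {
        this.name = name;
    }
    async match(request) {
        const cache = await caches.open(this.name);
        return await cache.match(request);
    }
    async put(request, response) {
        const cache = await caches.open(this.name);
        await cache.put(request, response);
    }
}

// Assumed hooks: disable the default browser cache and route model
// downloads through the custom cache instead.
env.useBrowserCache = false;
env.useCustomCache = true;
env.customCache = new ExtensionCache('transformers-cache');
```

Because the weights live in a URL-keyed cache rather than in the extension bundle, updating the extension replaces the code but leaves the cached model files untouched.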
**Summary of updates since last minor release (2.4.0):**
* (2.4.1) Improved documentation
* (2.4.2) Support for private/gated models (https://github.com/xenova/transformers.js/pull/202)
* (2.4.3) Example Next.js applications (https://github.com/xenova/transformers.js/pull/211) + MPNet model support (https://github.com/xenova/transformers.js/pull/221)