## What's new?
### 💬 StableLM text-generation models
This version adds support for the StableLM family of text-generation models (up to 1.6B params), developed by [Stability AI](https://huggingface.co/stabilityai). Huge thanks to D4ve-R for this contribution in https://github.com/xenova/transformers.js/pull/616! See [here](https://huggingface.co/models?library=transformers.js&other=stablelm) for the full list of supported models.
**Example:** Text generation with `Xenova/stablelm-2-zephyr-1_6b`.
```js
import { pipeline } from '@xenova/transformers';
// Create text generation pipeline
const generator = await pipeline('text-generation', 'Xenova/stablelm-2-zephyr-1_6b');
// Define the prompt and list of messages
const prompt = "Tell me a funny joke."
const messages = [
{ "role": "system", "content": "You are a helpful assistant." },
{ "role": "user", "content": prompt },
]
// Apply chat template
const inputs = generator.tokenizer.apply_chat_template(messages, {
tokenize: false,
add_generation_prompt: true,
});
// Generate text
const output = await generator(inputs, { max_new_tokens: 20 });
console.log(output[0].generated_text);
// "<|system|>\nYou are a helpful assistant.\n<|user|>\nTell me a funny joke.\n<|assistant|>\nHere's a joke for you:\n\nWhy don't scientists trust atoms?\n\nBecause they make up everything!"
```

_Note: these models may be too large to run in your browser at the moment, so for now, we recommend using them in Node.js. Stay tuned for updates on this!_
### 🔉 Speaker verification and diarization models
**Example:** Speaker verification w/ `Xenova/wavlm-base-plus-sv`.
```js
import { AutoProcessor, AutoModel, read_audio, cos_sim } from '@xenova/transformers';
// Load processor and model
const processor = await AutoProcessor.from_pretrained('Xenova/wavlm-base-plus-sv');
const model = await AutoModel.from_pretrained('Xenova/wavlm-base-plus-sv');
// Helper function to compute speaker embedding from audio URL
async function compute_embedding(url) {
const audio = await read_audio(url, 16000);
const inputs = await processor(audio);
const { embeddings } = await model(inputs);
return embeddings.data;
}
// Generate speaker embeddings
const BASE_URL = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/sv_speaker';
const speaker_1_1 = await compute_embedding(`${BASE_URL}-1_1.wav`);
const speaker_1_2 = await compute_embedding(`${BASE_URL}-1_2.wav`);
const speaker_2_1 = await compute_embedding(`${BASE_URL}-2_1.wav`);
const speaker_2_2 = await compute_embedding(`${BASE_URL}-2_2.wav`);
// Compute similarity scores
console.log(cos_sim(speaker_1_1, speaker_1_2)); // 0.959439158881247 (Both are speaker 1)
console.log(cos_sim(speaker_1_2, speaker_2_1)); // 0.618130172602329 (Different speakers)
console.log(cos_sim(speaker_2_1, speaker_2_2)); // 0.962999814169370 (Both are speaker 2)
```
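To turn these similarity scores into an accept/reject decision, compare them against a threshold. Here's a minimal sketch reusing `cos_sim` and the embeddings from above; the 0.9 threshold is a hypothetical value, so tune it on held-out same/different-speaker pairs for your use case:

```js
// Decide whether two embeddings come from the same speaker.
// NOTE: 0.9 is an illustrative threshold, not a calibrated one.
const THRESHOLD = 0.9;
const same_speaker = (a, b) => cos_sim(a, b) > THRESHOLD;

console.log(same_speaker(speaker_1_1, speaker_1_2)); // true
console.log(same_speaker(speaker_1_2, speaker_2_1)); // false
```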
**Example:** Perform speaker diarization with `Xenova/wavlm-base-plus-sd`.
```js
import { AutoProcessor, AutoModelForAudioFrameClassification, read_audio } from '@xenova/transformers';
// Read and preprocess audio
const processor = await AutoProcessor.from_pretrained('Xenova/wavlm-base-plus-sd');
const url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/jfk.wav';
const audio = await read_audio(url, 16000);
const inputs = await processor(audio);
// Run model with inputs
const model = await AutoModelForAudioFrameClassification.from_pretrained('Xenova/wavlm-base-plus-sd');
const { logits } = await model(inputs);
// {
// logits: Tensor {
// dims: [ 1, 549, 2 ], // [batch_size, num_frames, num_speakers]
// type: 'float32',
// data: Float32Array(1098) [-3.5301010608673096, ...],
// size: 1098
// }
// }
const labels = logits[0].sigmoid().tolist().map(
frames => frames.map(speaker => speaker > 0.5 ? 1 : 0)
);
console.log(labels); // labels is a one-hot array of shape (num_frames, num_speakers)
// [
// [0, 0], [0, 0], [0, 0], [0, 0], [0, 0], [0, 0],
// [0, 0], [0, 0], [0, 0], [0, 0], [0, 0], [0, 0],
// [0, 0], [0, 1], [0, 1], [0, 1], [0, 1], [0, 1],
// ...
// ]
```
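For most applications, you'll want time-stamped segments rather than per-frame labels. Below is a minimal sketch that merges consecutive active frames into segments. It assumes a fixed stride of ~20 ms per output frame (WavLM's feature extractor downsamples 16 kHz audio by a factor of 320); treat that constant as an assumption to verify for your model.

```js
// Convert per-frame labels of shape (num_frames, num_speakers)
// into { speaker, start, end } segments (times in seconds).
// ASSUMPTION: ~0.02 s per output frame (16 kHz input, 320x downsampling).
const FRAME_S = 0.02;
function frames_to_segments(labels, num_speakers = 2) {
  const segments = [];
  for (let s = 0; s < num_speakers; ++s) {
    let start = null;
    labels.forEach((frame, i) => {
      if (frame[s] === 1 && start === null) {
        start = i; // segment opens
      } else if (frame[s] === 0 && start !== null) {
        segments.push({ speaker: s, start: start * FRAME_S, end: i * FRAME_S });
        start = null; // segment closes
      }
    });
    if (start !== null) { // segment still open at end of audio
      segments.push({ speaker: s, start: start * FRAME_S, end: labels.length * FRAME_S });
    }
  }
  return segments;
}

console.log(frames_to_segments(labels));
// e.g. [ { speaker: 1, start: 0.26, end: ... }, ... ]
```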
These additions were made possible thanks to the following PRs:
* Add support for `WavLMForXVector` by D4ve-R in https://github.com/xenova/transformers.js/pull/603
* Add support for `WavLMForAudioFrameClassification` and `Wav2Vec2ForAudioFrameClassification` by D4ve-R in https://github.com/xenova/transformers.js/pull/611
* Add support for `UniSpeech` and `UniSpeechSat` models in https://github.com/xenova/transformers.js/pull/624
### 📝 Improved chat templating operation coverage
With this release, we're pleased to announce that Transformers.js can now parse every single valid chat template currently on the Hugging Face Hub! 🤯 As of 2024/03/05, that's around [~12k](https://huggingface.co/models?pipeline_tag=text-generation&other=conversational) conversational models, which between them use ~250 unique chat templates. Of course, future models may introduce more complex chat templates, and we'll continue to add support for them!
For example, Transformers.js can now generate the prompt for highly complex function-calling models (e.g., [fireworks-ai/firefunction-v1](https://huggingface.co/fireworks-ai/firefunction-v1)):
<details>
<summary>See code</summary>
```js
import { AutoTokenizer } from '@xenova/transformers';
const tokenizer = await AutoTokenizer.from_pretrained('fireworks-ai/firefunction-v1')
const function_spec = [
{
name: 'get_stock_price',
description: 'Get the current stock price',
parameters: {
type: 'object',
properties: {
symbol: {
type: 'string',
description: 'The stock symbol, e.g. AAPL, GOOG'
}
},
required: ['symbol']
}
},
{
name: 'check_word_anagram',
description: 'Check if two words are anagrams of each other',
parameters: {
type: 'object',
properties: {
word1: {
type: 'string',
description: 'The first word'
},
word2: {
type: 'string',
description: 'The second word'
}
},
required: ['word1', 'word2']
}
}
]
const messages = [
{ role: 'functions', content: JSON.stringify(function_spec, null, 4) },
{ role: 'system', content: 'You are a helpful assistant with access to functions. Use them if required.' },
{ role: 'user', content: 'Hi, can you tell me the current stock price of AAPL?' }
]
const inputs = tokenizer.apply_chat_template(messages, { tokenize: false });
console.log(inputs);
// <s>SYSTEM: You are a helpful assistant ...
```
</details>
### 🎨 New example applications and demos
* Create video object detection demo in https://github.com/xenova/transformers.js/pull/607 ([try it out](https://huggingface.co/spaces/Xenova/video-object-detection)).
![video-object-detection](https://github.com/xenova/transformers.js/assets/26504141/28735d45-bf46-4d51-b757-e45f3596813d)
* Create cross-encoder demo in https://github.com/xenova/transformers.js/pull/617 ([try it out](https://huggingface.co/spaces/Xenova/cross-encoder-web)).
![reranking-demo](https://github.com/xenova/transformers.js/assets/26504141/4c8d372b-584d-4d5e-b43c-9a03930ab712)
* Add Claude 3 and Mistral to the tokenizer playground in https://github.com/xenova/transformers.js/pull/625 ([try it out](https://huggingface.co/spaces/Xenova/the-tokenizer-playground)).
![claude3-tokenizer](https://github.com/xenova/transformers.js/assets/26504141/975ce1e9-da36-49cc-846c-cea0848b9f98)
### 🛠️ Misc. improvements
* Add support for the starcoder2 architecture in https://github.com/xenova/transformers.js/pull/622. _Note: we haven't yet added transformers.js-compatible versions of the 3B and 7B models._
* Check for existence of `onnx_env.wasm` before updating `wasmPaths` in https://github.com/xenova/transformers.js/pull/621 (see the sketch after this list for how `wasmPaths` is typically overridden).
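For reference, `wasmPaths` is the setting that tells the ONNX Runtime web backend where to fetch its `.wasm` binaries. A minimal sketch of overriding it (the URL below is illustrative; point it wherever you host the files):

```js
import { env } from '@xenova/transformers';

// Serve the ONNX Runtime WASM binaries from your own location.
// The URL below is illustrative only.
env.backends.onnx.wasm.wasmPaths = 'https://example.com/onnxruntime-web/dist/';
```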
### 🤗 New contributors
* D4ve-R made their first contribution in https://github.com/xenova/transformers.js/pull/603
**Full Changelog**: https://github.com/xenova/transformers.js/compare/2.15.1...2.16.0