Transformers-js-py

Latest version: v0.19.4

2.16.0

What's new?

💬 StableLM text-generation models
This version adds support for the StableLM family of text-generation models (up to 1.6B params), developed by [Stability AI](https://huggingface.co/stabilityai). Huge thanks to D4ve-R for this contribution in https://github.com/xenova/transformers.js/pull/616! See [here](https://huggingface.co/models?library=transformers.js&other=stablelm) for the full list of supported models.

**Example:** Text generation with `Xenova/stablelm-2-zephyr-1_6b`.

```js
import { pipeline } from '@xenova/transformers';

// Create text generation pipeline
const generator = await pipeline('text-generation', 'Xenova/stablelm-2-zephyr-1_6b');

// Define the prompt and list of messages
const prompt = "Tell me a funny joke."
const messages = [
    { "role": "system", "content": "You are a helpful assistant." },
    { "role": "user", "content": prompt },
]

// Apply chat template
const inputs = generator.tokenizer.apply_chat_template(messages, {
    tokenize: false,
    add_generation_prompt: true,
});

// Generate text
const output = await generator(inputs, { max_new_tokens: 20 });
console.log(output[0].generated_text);
// "<|system|>\nYou are a helpful assistant.\n<|user|>\nTell me a funny joke.\n<|assistant|>\nHere's a joke for you:\n\nWhy don't scientists trust atoms?\n\nBecause they make up everything!"
```

_Note: these models may be too large to run in your browser at the moment, so for now, we recommend using them in Node.js. Stay tuned for updates on this!_

🔉 Speaker verification and diarization models

**Example:** Speaker verification w/ `Xenova/wavlm-base-plus-sv`.

```js
import { AutoProcessor, AutoModel, read_audio, cos_sim } from '@xenova/transformers';

// Load processor and model
const processor = await AutoProcessor.from_pretrained('Xenova/wavlm-base-plus-sv');
const model = await AutoModel.from_pretrained('Xenova/wavlm-base-plus-sv');

// Helper function to compute speaker embedding from audio URL
async function compute_embedding(url) {
    const audio = await read_audio(url, 16000);
    const inputs = await processor(audio);
    const { embeddings } = await model(inputs);
    return embeddings.data;
}

// Generate speaker embeddings
const BASE_URL = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/sv_speaker';
const speaker_1_1 = await compute_embedding(`${BASE_URL}-1_1.wav`);
const speaker_1_2 = await compute_embedding(`${BASE_URL}-1_2.wav`);
const speaker_2_1 = await compute_embedding(`${BASE_URL}-2_1.wav`);
const speaker_2_2 = await compute_embedding(`${BASE_URL}-2_2.wav`);

// Compute similarity scores
console.log(cos_sim(speaker_1_1, speaker_1_2)); // 0.959439158881247 (Both are speaker 1)
console.log(cos_sim(speaker_1_2, speaker_2_1)); // 0.618130172602329 (Different speakers)
console.log(cos_sim(speaker_2_1, speaker_2_2)); // 0.962999814169370 (Both are speaker 2)
```

**Example:** Perform speaker diarization with `Xenova/wavlm-base-plus-sd`.

```js
import { AutoProcessor, AutoModelForAudioFrameClassification, read_audio } from '@xenova/transformers';

// Read and preprocess audio
const processor = await AutoProcessor.from_pretrained('Xenova/wavlm-base-plus-sd');
const url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/jfk.wav';
const audio = await read_audio(url, 16000);
const inputs = await processor(audio);

// Run model with inputs
const model = await AutoModelForAudioFrameClassification.from_pretrained('Xenova/wavlm-base-plus-sd');
const { logits } = await model(inputs);
// {
//   logits: Tensor {
//     dims: [ 1, 549, 2 ],  // [batch_size, num_frames, num_speakers]
//     type: 'float32',
//     data: Float32Array(1098) [-3.5301010608673096, ...],
//     size: 1098
//   }
// }

const labels = logits[0].sigmoid().tolist().map(
    frames => frames.map(speaker => speaker > 0.5 ? 1 : 0)
);
console.log(labels); // labels is a one-hot array of shape (num_frames, num_speakers)
// [
//     [0, 0], [0, 0], [0, 0], [0, 0], [0, 0], [0, 0],
//     [0, 0], [0, 0], [0, 0], [0, 0], [0, 0], [0, 0],
//     [0, 0], [0, 1], [0, 1], [0, 1], [0, 1], [0, 1],
//     ...
// ]
```
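
To turn these frame-level labels into human-readable speaker segments, you can group consecutive active frames and convert frame indices to timestamps. The sketch below is not part of the release; it continues the example above and assumes the frame duration can be approximated as the clip length divided by the number of frames.

```js
// Continue the example above: group consecutive active frames into segments (sketch).
// The frame duration is approximated from the 16 kHz clip length; this is an assumption,
// not a value exposed by the model.
const num_frames = labels.length;
const num_speakers = labels[0].length;
const frame_duration = (audio.length / 16000) / num_frames; // seconds per frame

const segments = [];
for (let s = 0; s < num_speakers; ++s) {
    let start = null;
    for (let f = 0; f <= num_frames; ++f) {
        const active = f < num_frames && labels[f][s] === 1;
        if (active && start === null) start = f;          // segment begins
        if (!active && start !== null) {                  // segment ends
            segments.push({ speaker: s, start: start * frame_duration, end: f * frame_duration });
            start = null;
        }
    }
}
console.log(segments); // e.g. [ { speaker: 1, start: 0.6, end: 2.4 }, ... ]
```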

These additions were made possible thanks to the following PRs:
* Add support for `WavLMForXVector` by D4ve-R in https://github.com/xenova/transformers.js/pull/603
* Add support for `WavLMForAudioFrameClassification` and `Wav2Vec2ForAudioFrameClassification` by D4ve-R in https://github.com/xenova/transformers.js/pull/611
* Add support for `UniSpeech` and `UniSpeechSat` models in https://github.com/xenova/transformers.js/pull/624

📝 Improved chat templating operation coverage

With this release, we're pleased to announce that Transformers.js is now able to parse every single valid chat template currently on the Hugging Face Hub! 🤯 As of 2024/03/05, that amounts to [~12k](https://huggingface.co/models?pipeline_tag=text-generation&other=conversational) conversational models (spanning ~250 unique templates). Of course, future models may introduce more complex chat templates, and we'll continue to add support for them!

For example, Transformers.js can now generate the prompt for highly complex function-calling models (e.g., [fireworks-ai/firefunction-v1](https://huggingface.co/fireworks-ai/firefunction-v1)):

<details>

<summary>See code</summary>

```js
import { AutoTokenizer } from '@xenova/transformers';

const tokenizer = await AutoTokenizer.from_pretrained('fireworks-ai/firefunction-v1')

const function_spec = [
    {
        name: 'get_stock_price',
        description: 'Get the current stock price',
        parameters: {
            type: 'object',
            properties: {
                symbol: {
                    type: 'string',
                    description: 'The stock symbol, e.g. AAPL, GOOG'
                }
            },
            required: ['symbol']
        }
    },
    {
        name: 'check_word_anagram',
        description: 'Check if two words are anagrams of each other',
        parameters: {
            type: 'object',
            properties: {
                word1: {
                    type: 'string',
                    description: 'The first word'
                },
                word2: {
                    type: 'string',
                    description: 'The second word'
                }
            },
            required: ['word1', 'word2']
        }
    }
]

const messages = [
    { role: 'functions', content: JSON.stringify(function_spec, null, 4) },
    { role: 'system', content: 'You are a helpful assistant with access to functions. Use them if required.' },
    { role: 'user', content: 'Hi, can you tell me the current stock price of AAPL?' }
]

const inputs = tokenizer.apply_chat_template(messages, { tokenize: false });
console.log(inputs);
// <s>SYSTEM: You are a helpful assistant ...
```

</details>

🎨 New example applications and demos
* Create video object detection demo in https://github.com/xenova/transformers.js/pull/607 ([try it out](https://huggingface.co/spaces/Xenova/video-object-detection)).

![video-object-detection](https://github.com/xenova/transformers.js/assets/26504141/28735d45-bf46-4d51-b757-e45f3596813d)

* Create cross-encoder demo in https://github.com/xenova/transformers.js/pull/617 ([try it out](https://huggingface.co/spaces/Xenova/cross-encoder-web)).

![reranking-demo](https://github.com/xenova/transformers.js/assets/26504141/4c8d372b-584d-4d5e-b43c-9a03930ab712)

* Add Claude 3 and Mistral to the tokenizer playground in https://github.com/xenova/transformers.js/pull/625 ([try it out](https://huggingface.co/spaces/Xenova/the-tokenizer-playground)).

![claude3-tokenizer](https://github.com/xenova/transformers.js/assets/26504141/975ce1e9-da36-49cc-846c-cea0848b9f98)


🛠️ Misc. improvements
* Add support for the starcoder2 architecture in https://github.com/xenova/transformers.js/pull/622. _Note: we haven't yet added transformers.js-compatible versions of the 3B and 7B models._
* Check for existence of `onnx_env.wasm` before updating `wasmPaths` in https://github.com/xenova/transformers.js/pull/621
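
For context, `wasmPaths` tells the ONNX Runtime WebAssembly backend where to fetch its `.wasm` files from. A minimal configuration sketch is shown below; it is not from the release notes, and the self-hosted path is an assumption.

```js
import { env } from '@xenova/transformers';

// Point the ONNX Runtime WebAssembly backend at self-hosted .wasm files.
// The guard mirrors the fix above: only touch wasmPaths if the wasm backend object exists.
if (env.backends?.onnx?.wasm) {
    env.backends.onnx.wasm.wasmPaths = '/static/onnxruntime-web/'; // hypothetical path
}
```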

🤗 New contributors
* D4ve-R made their first contribution in https://github.com/xenova/transformers.js/pull/603

**Full Changelog**: https://github.com/xenova/transformers.js/compare/2.15.1...2.16.0

2.15.1

What's new?

* Add Background Removal demo in https://github.com/xenova/transformers.js/pull/576 ([online demo](https://huggingface.co/spaces/Xenova/remove-background-web)).

![background-removal](https://github.com/xenova/transformers.js/assets/26504141/4f288248-5ec9-4d7d-83a2-d15d80f9ebdc)

* Add support for owlv2 models in https://github.com/xenova/transformers.js/pull/579

**Example:** Zero-shot object detection w/ `Xenova/owlv2-base-patch16-ensemble`.
```js
import { pipeline } from '@xenova/transformers';

const detector = await pipeline('zero-shot-object-detection', 'Xenova/owlv2-base-patch16-ensemble');

const url = 'http://images.cocodataset.org/val2017/000000039769.jpg';
const candidate_labels = ['a photo of a cat', 'a photo of a dog'];
const output = await detector(url, candidate_labels);
console.log(output);
// [
//   { score: 0.7400985360145569, label: 'a photo of a cat', box: { xmin: 0, ymin: 50, xmax: 323, ymax: 485 } },
//   { score: 0.6315087080001831, label: 'a photo of a cat', box: { xmin: 333, ymin: 23, xmax: 658, ymax: 378 } }
// ]
```

![image](https://github.com/xenova/transformers.js/assets/26504141/3259b7ff-f71f-4dad-be17-0de678382fc3)

* Add support for Adaptive Retrieval w/ Matryoshka Embeddings ([nomic-ai/nomic-embed-text-v1.5](https://huggingface.co/nomic-ai/nomic-embed-text-v1.5)) in https://github.com/xenova/transformers.js/pull/587 and https://github.com/xenova/transformers.js/pull/588 ([online demo](https://huggingface.co/spaces/Xenova/adaptive-retrieval-web)).

![adaptive-retrieval](https://github.com/xenova/transformers.js/assets/26504141/70793198-b9ac-4373-a484-5274a823c238)
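
The Matryoshka idea behind adaptive retrieval is that the leading dimensions of the embedding carry most of the information, so you can truncate the vector to trade a little accuracy for much smaller storage and faster search. The sketch below is illustrative only: the model ID and 256-dimension target are assumptions, and the model card may prescribe a slightly different recipe (e.g. an extra layer-norm step before truncation).

```js
import { pipeline } from '@xenova/transformers';

// Compute a full-size embedding, then keep only the first 256 dimensions (assumed target size).
const extractor = await pipeline('feature-extraction', 'nomic-ai/nomic-embed-text-v1.5');
const output = await extractor('search_query: What is TSNE?', { pooling: 'mean', normalize: true });

const full = output.tolist()[0];        // full-size embedding as a plain array
const truncated = full.slice(0, 256);   // Matryoshka truncation

// Re-normalize the truncated vector so cosine similarity still behaves as expected.
const norm = Math.hypot(...truncated);
const embedding = truncated.map(x => x / norm);
console.log(embedding.length); // 256
```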

* Add support for Gemma Tokenizer in https://github.com/xenova/transformers.js/pull/597 and https://github.com/xenova/transformers.js/pull/598


**Full Changelog**: https://github.com/xenova/transformers.js/compare/2.15.0...2.15.1

2.15.0

What's new?

2.14.2

What's new?
* Add support for new Jina AI jina-embeddings-v2 models ([jinaai/jina-embeddings-v2-base-zh](https://huggingface.co/jinaai/jina-embeddings-v2-base-zh) and [jinaai/jina-embeddings-v2-base-de](https://huggingface.co/jinaai/jina-embeddings-v2-base-de)) in https://github.com/xenova/transformers.js/pull/542.
* Add support for [wav2vec2-bert](https://huggingface.co/docs/transformers/model_doc/wav2vec2-bert) in https://github.com/xenova/transformers.js/pull/544. See [here](https://huggingface.co/models?library=transformers.js&other=wav2vec2-bert&sort=trending) for the full list of supported models.
* Add zero-shot classification demo in https://github.com/xenova/transformers.js/pull/519 (see [online demo](https://huggingface.co/spaces/Xenova/zero-shot-classification-demo)):
![296304160-60170be3-287f-4451-9167-5ec850fb48c5](https://github.com/xenova/transformers.js/assets/26504141/99431406-d0ba-439f-9ecb-ceaeda6c592b)
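
The demo above is built on the `zero-shot-classification` pipeline, which scores arbitrary candidate labels against the input text without task-specific fine-tuning. A minimal usage sketch follows; the model ID is an assumption, and any NLI model tagged for Transformers.js should work.

```js
import { pipeline } from '@xenova/transformers';

// Classify text against arbitrary labels with an NLI-based zero-shot pipeline.
const classifier = await pipeline('zero-shot-classification', 'Xenova/mobilebert-uncased-mnli');

const text = 'Last week I upgraded my iOS version and ever since then my phone has been overheating.';
const labels = ['mobile', 'billing', 'website', 'account access'];

const output = await classifier(text, labels);
console.log(output);
// { sequence: '...', labels: [ 'mobile', ... ], scores: [ ... ] } (result shape; values will vary)
```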

**Full Changelog**: https://github.com/xenova/transformers.js/compare/2.14.1...2.14.2

2.14.1

What's new?
* Add support for Depth Anything (https://github.com/xenova/transformers.js/pull/534). See [here](https://huggingface.co/models?library=transformers.js&other=depth_anything) for the list of available models.

**Example:** Depth estimation with `Xenova/depth-anything-small-hf`.

```js
import { pipeline } from '@xenova/transformers';

// Create depth-estimation pipeline
const depth_estimator = await pipeline('depth-estimation', 'Xenova/depth-anything-small-hf');

// Predict depth map for the given image
const url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/bread_small.png';
const output = await depth_estimator(url);
// {
//   predicted_depth: Tensor {
//     dims: [350, 518],
//     type: 'float32',
//     data: Float32Array(181300) [...],
//     size: 181300
//   },
//   depth: RawImage {
//     data: Uint8Array(271360) [...],
//     width: 640,
//     height: 424,
//     channels: 1
//   }
// }
```

You can visualize the output with:

```js
output.depth.save('depth.png');
```

| Input image | Visualized output |
|--------|--------|
| ![image](https://github.com/xenova/transformers.js/assets/26504141/9cb30c86-a5f8-4e4a-8876-c866edfa21f8) | ![image](https://github.com/xenova/transformers.js/assets/26504141/7c66b1cf-d8ed-49e5-bfef-f0a7cbf69ea8) |

Online demo: https://huggingface.co/spaces/Xenova/depth-anything-web

Example video:

https://github.com/xenova/transformers.js/assets/26504141/bbac3db6-8d8f-4386-a212-7e66ca616a0d


* Fix typo in tokenizers.js (https://github.com/xenova/transformers.js/pull/518)
* Return empty tokens array if text is empty after normalization (https://github.com/xenova/transformers.js/pull/535)

**Full Changelog**: https://github.com/xenova/transformers.js/compare/2.14.0...2.14.1

2.14.0

What's new?
🚀 Segment Anything Model (SAM)

The Segment Anything Model (SAM) can be used to generate segmentation masks for objects in a scene, given an input image and input points. See [here](https://huggingface.co/models?library=transformers.js&other=sam) for the full list of pre-converted models. Support for this model was added in https://github.com/xenova/transformers.js/pull/510.

![demo](https://github.com/xenova/transformers.js/assets/26504141/4a6475dc-d91f-4e69-b437-b30da66c0b65)

Demo + source code: https://huggingface.co/spaces/Xenova/segment-anything-web

**Example:** Perform mask generation w/ `Xenova/slimsam-77-uniform`.

```js
import { SamModel, AutoProcessor, RawImage } from '@xenova/transformers';

const model = await SamModel.from_pretrained('Xenova/slimsam-77-uniform');
const processor = await AutoProcessor.from_pretrained('Xenova/slimsam-77-uniform');

const img_url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/corgi.jpg';
const raw_image = await RawImage.read(img_url);
const input_points = [[[340, 250]]] // 2D localization of a window

const inputs = await processor(raw_image, input_points);
const outputs = await model(inputs);

const masks = await processor.post_process_masks(outputs.pred_masks, inputs.original_sizes, inputs.reshaped_input_sizes);
console.log(masks);
// [
//   Tensor {
//     dims: [ 1, 3, 410, 614 ],
//     type: 'bool',
//     data: Uint8Array(755220) [ ... ],
//     size: 755220
//   }
// ]
const scores = outputs.iou_scores;
console.log(scores);
// Tensor {
//   dims: [ 1, 1, 3 ],
//   type: 'float32',
//   data: Float32Array(3) [
```