Transformers-js-py

Latest version: v0.19.4


See [here](https://huggingface.co/models?library=transformers.js&other=dit&sort=trending) for the list of available models.

6. [SigLIP](https://huggingface.co/docs/transformers/main/en/model_doc/siglip) for zero-shot image classification. (https://github.com/xenova/transformers.js/pull/473)

```js
import { pipeline } from '@xenova/transformers';

// Create a zero-shot image classification pipeline
const classifier = await pipeline('zero-shot-image-classification', 'Xenova/siglip-base-patch16-224');

// Classify images according to provided labels
const url = 'http://images.cocodataset.org/val2017/000000039769.jpg';
const output = await classifier(url, ['2 cats', '2 dogs'], {
  hypothesis_template: 'a photo of {}',
});
// [
//   { score: 0.16770583391189575, label: '2 cats' },
//   { score: 0.000022096000975579955, label: '2 dogs' }
// ]
```


See [here](https://huggingface.co/models?library=transformers.js&other=siglip&sort=trending) for the list of available models.


7. [RoFormer](https://huggingface.co/docs/transformers/main/en/model_doc/roformer) for masked language modelling, sequence classification, token classification, and question answering. (https://github.com/xenova/transformers.js/pull/464)

```js
import { pipeline } from '@xenova/transformers';

// Create a masked language modelling pipeline
const pipe = await pipeline('fill-mask', 'Xenova/antiberta2');

// Predict missing token
const output = await pipe('Ḣ Q V Q ... C A [MASK] D ... T V S S');
```

<details>

<summary>See output</summary>

```js
[
  {
    score: 0.48774364590644836,
    token: 19,
    token_str: 'R',
    sequence: 'Ḣ Q V Q C A R D T V S S'
  },
  {
    score: 0.2768442928791046,
    token: 18,
    token_str: 'Q',
    sequence: 'Ḣ Q V Q C A Q D T V S S'
  },
  {
    score: 0.0890476182103157,
    token: 13,
    token_str: 'K',
    sequence: 'Ḣ Q V Q C A K D T V S S'
  },
  {
    score: 0.05106702819466591,
    token: 14,
    token_str: 'L',
    sequence: 'Ḣ Q V Q C A L D T V S S'
  },
  {
    score: 0.021606773138046265,
    token: 8,
    token_str: 'E',
    sequence: 'Ḣ Q V Q C A E D T V S S'
  }
]
```


</details>

See [here](https://huggingface.co/models?library=transformers.js&other=roformer&sort=trending) for the list of available models.

🛠️ Misc. improvements

* Fix Next.js Dockerfile HOSTNAME by Lian1230 in https://github.com/xenova/transformers.js/pull/461
* Add spaces template link to README in https://github.com/xenova/transformers.js/pull/467

🤗 New Contributors
* Lian1230 made their first contribution in https://github.com/xenova/transformers.js/pull/461

**Full Changelog**: https://github.com/xenova/transformers.js/compare/2.12.1...2.13.0



You can then visualize the 3 predicted masks with:
```js
const image = RawImage.fromTensor(masks[0][0].mul(255));
image.save('mask.png');
```


| Input image | Visualized output |
|--------|--------|
| ![corgi](https://github.com/xenova/transformers.js/assets/26504141/2b5f1f53-89e4-4398-8e29-2abf9e161c56) | ![mask](https://github.com/xenova/transformers.js/assets/26504141/2a75802a-9a34-4d3b-9a9e-9c72085e82e1) |



Next, select the channel with the highest IoU score, which in this case is the second (green) channel. Intersecting this with the original image gives us an isolated version of the subject:

| Selected Mask | Intersected |
|--------|--------|
| ![mask](https://github.com/xenova/transformers.js/assets/26504141/b66476dc-00c7-4f88-9687-3ab2f3fd3de3) | ![corgi-masked](https://github.com/xenova/transformers.js/assets/26504141/0275c051-ce8f-49f1-9b6a-197a901b3ac1) |
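
For reference, here is a minimal sketch of this selection-and-intersection step (assuming the `masks` tensor and the original input `image` (a `RawImage`) from the segmentation example above, and that channel 1 is the one with the highest IoU score; the pixel loop is illustrative, not a library API):

```js
import { RawImage } from '@xenova/transformers';

// Pick the mask channel with the highest IoU score (here, channel 1 = the green mask)
const best = 1;
const mask = RawImage.fromTensor(masks[0][best].mul(255));

// Intersect the mask with the original image: zero out pixels that fall outside the mask
const { data, width, height, channels } = image;
for (let i = 0; i < width * height; ++i) {
  if (mask.data[i] === 0) {
    for (let c = 0; c < channels; ++c) {
      data[i * channels + c] = 0;
    }
  }
}
image.save('corgi-masked.png');
```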


🛠️ Improvements
* Add support for processing non-square images w/ `ConvNextFeatureExtractor` in https://github.com/xenova/transformers.js/pull/503
* Encode revision in remote URL in https://github.com/xenova/transformers.js/pull/507

**Full Changelog**: https://github.com/xenova/transformers.js/compare/2.13.4...2.14.0


6. [ConvBERT](https://huggingface.co/docs/transformers/main/en/model_doc/convbert) for feature extraction (https://github.com/xenova/transformers.js/pull/445). See [here](https://huggingface.co/models?library=transformers.js&other=convbert&sort=trending) for the list of available models.

**Example:** Feature extraction w/ `Xenova/conv-bert-small`.

```javascript
import { pipeline } from '@xenova/transformers';

// Create feature extraction pipeline
const extractor = await pipeline('feature-extraction', 'Xenova/conv-bert-small');

// Perform feature extraction
const output = await extractor('This is a test sentence.');
console.log(output);
// Tensor {
//   dims: [ 1, 8, 256 ],
//   type: 'float32',
//   data: Float32Array(2048) [ -0.09434918314218521, 0.5715903043746948, ... ],
//   size: 2048
// }
```


7. [ELECTRA](https://huggingface.co/docs/transformers/main/en/model_doc/electra) for feature extraction (https://github.com/xenova/transformers.js/pull/446). See [here](https://huggingface.co/models?library=transformers.js&other=electra&sort=trending) for the list of available models.

**Example:** Feature extraction w/ `Xenova/electra-small-discriminator`.

```javascript
import { pipeline } from '@xenova/transformers';

// Create feature extraction pipeline
const extractor = await pipeline('feature-extraction', 'Xenova/electra-small-discriminator');

// Perform feature extraction
const output = await extractor('This is a test sentence.');
console.log(output);
// Tensor {
//   dims: [ 1, 8, 256 ],
//   type: 'float32',
//   data: Float32Array(2048) [ 0.5410046577453613, 0.18386700749397278, ... ],
//   size: 2048
// }
```


8. [Phi](https://huggingface.co/docs/transformers/main/en/model_doc/phi) for text generation (https://github.com/xenova/transformers.js/pull/443).

NOTE: This only adds support for the architecture. When the external data format is supported in ONNX Runtime, we will make an update that includes converted versions of the available Phi models.

🕹️ New example: Semantic Music Search application

In the last release, we added support for CLAP models (CLIP but for audio), so in this one, we're releasing a simple demo application which shows how you can use a CLAP model to perform real-time semantic music search! For simplicity, we implemented everything in vanilla JavaScript, but feel free to adapt it to your framework of choice! As always, the [source code is open source](https://github.com/xenova/transformers.js/tree/main/examples/semantic-audio-search)! 🥳 PR: https://github.com/xenova/transformers.js/pull/442
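
For context, here is a minimal sketch of the idea behind the demo (not its actual code; the model id and audio URL are illustrative, and in practice the track embeddings would be precomputed): embed the text query and each audio track with a CLAP model, then rank tracks by cosine similarity.

```js
import {
  AutoTokenizer, ClapTextModelWithProjection,
  AutoProcessor, ClapAudioModelWithProjection,
  read_audio, cos_sim,
} from '@xenova/transformers';

const model_id = 'Xenova/clap-htsat-unfused';
const tokenizer = await AutoTokenizer.from_pretrained(model_id);
const text_model = await ClapTextModelWithProjection.from_pretrained(model_id);
const processor = await AutoProcessor.from_pretrained(model_id);
const audio_model = await ClapAudioModelWithProjection.from_pretrained(model_id);

// Embed the search query
const text_inputs = tokenizer(['upbeat electronic dance music'], { padding: true, truncation: true });
const { text_embeds } = await text_model(text_inputs);

// Embed one audio track (CLAP models expect 48kHz audio)
const audio = await read_audio('https://example.com/track.wav', 48000);
const audio_inputs = await processor(audio);
const { audio_embeds } = await audio_model(audio_inputs);

// Higher cosine similarity = better match; sort tracks by this score
console.log(cos_sim(text_embeds.data, audio_embeds.data));
```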

Demo video:

https://github.com/xenova/transformers.js/assets/26504141/72e09f8c-d6e9-4430-a56c-7994737966db


🐛 Bug fixes
* Fix tensor inheritance in https://github.com/xenova/transformers.js/pull/451. Thanks to devfacet for reporting the issue and to kungfooman for helping review the PR.

🛠️ Other features
* Add support for CLS pooling (feature extraction pipeline) in https://github.com/xenova/transformers.js/pull/450
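
As a usage sketch of that option (the model id here is illustrative, not from the release notes), CLS pooling returns a single sentence-level embedding (the `[CLS]` token) instead of per-token features:

```js
import { pipeline } from '@xenova/transformers';

// Create feature extraction pipeline
const extractor = await pipeline('feature-extraction', 'Xenova/bert-base-uncased');

// 'cls' pooling keeps only the [CLS] token embedding for each input
const output = await extractor('This is a test sentence.', { pooling: 'cls' });
console.log(output.dims); // e.g. [ 1, 768 ]
```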

📄 Documentation
* Add example usage for `SpeechT5ForSpeechToText` in https://github.com/xenova/transformers.js/pull/438

**Full Changelog**: https://github.com/xenova/transformers.js/compare/2.10.1...2.11.0


<h3 id="llava_onevision">LLaVA-OneVision for Image-Text-to-Text</h3>

LLaVA-OneVision is a Vision-Language Model that can generate text conditioned on one or several images/videos. The model consists of a SigLIP vision encoder and a Qwen2 language backbone.

**Example:** Multi-round conversations w/ PKV caching
```js
import { AutoProcessor, AutoTokenizer, LlavaOnevisionForConditionalGeneration, RawImage } from '@huggingface/transformers';

// Load tokenizer, processor and model
const model_id = 'llava-hf/llava-onevision-qwen2-0.5b-ov-hf';

const tokenizer = await AutoTokenizer.from_pretrained(model_id);
const processor = await AutoProcessor.from_pretrained(model_id);
const model = await LlavaOnevisionForConditionalGeneration.from_pretrained(model_id, {
  dtype: {
    embed_tokens: 'fp16', // or 'fp32' or 'q8'
    vision_encoder: 'fp16', // or 'fp32' or 'q8'
    decoder_model_merged: 'q4', // or 'q8'
  },
  // device: 'webgpu',
});

// Prepare text inputs
const prompt = 'What does the text say?';
const messages = [
  { role: 'system', content: 'Answer the question.' },
  { role: 'user', content: `<image>\n${prompt}` },
];
const text = tokenizer.apply_chat_template(messages, { tokenize: false, add_generation_prompt: true });
const text_inputs = tokenizer(text);

// Prepare vision inputs
const url = 'https://huggingface.co/qnguyen3/nanoLLaVA/resolve/main/example_1.png';
const image = await RawImage.fromURL(url);
const vision_inputs = await processor(image);

// Generate response
const { past_key_values, sequences } = await model.generate({
  ...text_inputs,
  ...vision_inputs,
  do_sample: false,
  max_new_tokens: 64,
  return_dict_in_generate: true,
});

// Decode output
const answer = tokenizer.decode(
  sequences.slice(0, [text_inputs.input_ids.dims[1], null]),
  { skip_special_tokens: true },
);
console.log(answer);
// The text says "small but mighty" in a playful font.

const new_messages = [
  ...messages,
  { role: 'assistant', content: answer },
  { role: 'user', content: 'How does the text correlate to the context of the image?' },
];
const new_text = tokenizer.apply_chat_template(new_messages, { tokenize: false, add_generation_prompt: true });
const new_text_inputs = tokenizer(new_text);

// Generate another response
const output = await model.generate({
  ...new_text_inputs,
  past_key_values,
  do_sample: false,
  max_new_tokens: 256,
});
const new_answer = tokenizer.decode(
  output.slice(0, [new_text_inputs.input_ids.dims[1], null]),
  { skip_special_tokens: true },
);
console.log(new_answer);
// The text "small but mighty" is likely a playful or humorous reference to the image of the blue mouse with the orange dumbbell. It could be used as a motivational phrase or a playful way to express the idea that even small things can be impressive or powerful.
```


<h3 id="vitpose">ViTPose for pose-estimation</h3>

A state-of-the-art pose estimation model which employs a standard, non-hierarchical vision transformer as a backbone for the task of keypoint estimation (combined with a simple decoder head to predict heatmaps from a given image).

**Example:** Pose estimation w/ `onnx-community/vitpose-base-simple`.
```js
import { AutoModel, AutoImageProcessor, RawImage } from '@huggingface/transformers';

// Load model and processor
const model_id = 'onnx-community/vitpose-base-simple';
const model = await AutoModel.from_pretrained(model_id);
const processor = await AutoImageProcessor.from_pretrained(model_id);

// Load image and prepare inputs
const url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/ryan-gosling.jpg';
const image = await RawImage.read(url);
const inputs = await processor(image);

// Predict heatmaps
const { heatmaps } = await model(inputs);

// Post-process heatmaps to get keypoints and scores
const boxes = [[[0, 0, image.width, image.height]]];
const results = processor.post_process_pose_estimation(heatmaps, boxes)[0][0];
console.log(results);
```


<details>

<summary>Optionally, visualize the outputs (Node.js usage shown here, using the node-canvas library):</summary>

```js
import { createCanvas, createImageData } from 'canvas';

// Create canvas and draw image
const canvas = createCanvas(image.width, image.height);
const ctx = canvas.getContext('2d');
const imageData = createImageData(image.rgba().data, image.width, image.height);
ctx.putImageData(imageData, 0, 0);

// Draw edges between keypoints
const points = results.keypoints;
ctx.lineWidth = 4;
ctx.strokeStyle = 'blue';
for (const [i, j] of model.config.edges) {
  const [x1, y1] = points[i];
  const [x2, y2] = points[j];
  ctx.beginPath();
  ctx.moveTo(x1, y1);
  ctx.lineTo(x2, y2);
  ctx.stroke();
}

// Draw circle at each keypoint
ctx.fillStyle = 'red';
for (const [x, y] of points) {
  ctx.beginPath();
  ctx.arc(x, y, 8, 0, 2 * Math.PI);
  ctx.fill();
}

// Save image to file
import fs from 'fs';
const out = fs.createWriteStream('pose.png');
const stream = canvas.createPNGStream();
stream.pipe(out);
out.on('finish', () => console.log('The PNG file was created.'));
```


</details>

| Input image | Output image |
| :----------:|:------------:|
| ![image/jpeg](https://cdn-uploads.huggingface.co/production/uploads/61b253b7ac5ecaae3d1efe0c/QpXlLNyLDKZUxXjokbUyy.jpeg) | ![image/png](https://cdn-uploads.huggingface.co/production/uploads/61b253b7ac5ecaae3d1efe0c/xj0jaKo9aAOux-NSU8U7S.png) |

<h3 id="mgp-str">MGP-STR for Optical Character Recognition (OCR)</h3>

A simple yet powerful vision scene text recognition model, built upon the vision transformer (ViT).

**Example:** Optical Character Recognition (OCR) w/ `onnx-community/mgp-str-base`

```js
import { MgpstrForSceneTextRecognition, MgpstrProcessor, RawImage } from '@huggingface/transformers';

const model_id = 'onnx-community/mgp-str-base';
const model = await MgpstrForSceneTextRecognition.from_pretrained(model_id);
const processor = await MgpstrProcessor.from_pretrained(model_id);

// Load image from the IIIT-5k dataset
const url = "https://i.postimg.cc/ZKwLg2Gw/367-14.png";
const image = await RawImage.read(url);

// Preprocess the image
const result = await processor(image);

// Perform inference
const outputs = await model(result);

// Decode the model outputs
const generated_text = processor.batch_decode(outputs.logits).generated_text;
console.log(generated_text); // [ 'ticket' ]
```


<h3 id="patchtst-and-patchtsmixer">PatchTST and PatchTSMixer for time series forecasting.</h3>


Models which can be used for multivariate time series forecasting.

**Example:** Time series forecasting w/ `onnx-community/granite-timeseries-patchtst`

```js
import { PatchTSTForPrediction, Tensor } from "@huggingface/transformers";

const model_id = "onnx-community/granite-timeseries-patchtst";
const model = await PatchTSTForPrediction.from_pretrained(model_id, { dtype: "fp32" });

const dims = [64, 512, 7];
const prod = dims.reduce((a, b) => a * b, 1);
const past_values = new Tensor('float32',
  Float32Array.from({ length: prod }, (_, i) => i / prod),
  dims,
);
const { prediction_outputs } = await model({ past_values });
console.log(prediction_outputs);
```


**Example:** Time series forecasting w/ `onnx-community/granite-timeseries-patchtsmixer`

```js
import { PatchTSMixerForPrediction, Tensor } from "@huggingface/transformers";

const model_id = "onnx-community/granite-timeseries-patchtsmixer";
const model = await PatchTSMixerForPrediction.from_pretrained(model_id, { dtype: "fp32" });

const dims = [64, 512, 7];
const prod = dims.reduce((a, b) => a * b, 1);
const past_values = new Tensor('float32',
  Float32Array.from({ length: prod }, (_, i) => i / prod),
  dims,
);
const { prediction_outputs } = await model({ past_values });
console.log(prediction_outputs);
```


<h2 id="bug-fixes">🐛 Bug fixes</h2>

* When padding an image, the dimensions get stretched by BritishWerewolf in https://github.com/huggingface/transformers.js/pull/1015
* fix(scale): add missing scale element by tosinamuda in https://github.com/huggingface/transformers.js/pull/1017

<h2 id="documentation-improvements">📝 Documentation improvements</h2>

* Updated link to sentence similarity models. by uzyn in https://github.com/huggingface/transformers.js/pull/893
* fix(docs): fixed a broken link to quantization guide by ThomasWT in https://github.com/huggingface/transformers.js/pull/1014
* fix(docs): Fixed Typos in README and docs/snippets/6_supported-models.snippet by hitchhiker3010 in https://github.com/huggingface/transformers.js/pull/1030

<h2 id="other-improvements">🛠️ Other improvements</h2>

* Add option to maintain aspect ratio on resize by BritishWerewolf in https://github.com/huggingface/transformers.js/pull/971
* Add functionality to split RawImage into channels; Update slice documentation and tests by BritishWerewolf in https://github.com/huggingface/transformers.js/pull/978
* Avoid resizing images when they already have the desired size by nemphys in https://github.com/huggingface/transformers.js/pull/1027
* Add support for Split pretokenizer w/ `behavior=removed` & `invert=false` by xenova in https://github.com/huggingface/transformers.js/pull/1033
* Add type declaration for `progress_callback` by ocavue in https://github.com/huggingface/transformers.js/pull/1034
* Add support for op_block_list by pdufour in https://github.com/huggingface/transformers.js/pull/1036

<h2 id="new-contributors">🤗 New contributors</h2>

* uzyn made their first contribution in https://github.com/huggingface/transformers.js/pull/893
* ThomasWT made their first contribution in https://github.com/huggingface/transformers.js/pull/1014
* tosinamuda made their first contribution in https://github.com/huggingface/transformers.js/pull/1017
* nemphys made their first contribution in https://github.com/huggingface/transformers.js/pull/1027
* hitchhiker3010 made their first contribution in https://github.com/huggingface/transformers.js/pull/1030
* pdufour made their first contribution in https://github.com/huggingface/transformers.js/pull/1036

**Full Changelog**: https://github.com/huggingface/transformers.js/compare/3.0.2...3.1.0
