Tokeniser-py

Latest version: v0.1.2

The latest version of tokeniser-py with no known security vulnerabilities is 0.1.2. We recommend installing version 0.1.2.

Latest release: v0.1.2 (March 22, 2025)
License: MIT

Description

A custom tokeniser with a 131,072-token vocabulary derived from SlimPajama: 0.5B tokens from the validation split and 1B tokens from the validation and test splits combined. It uses a novel token generation algorithm and a dynamic-programming-based segmentation method for fast, interpretable tokenisation, and the segmentation method can also be applied to custom token maps.
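The page does not show tokeniser-py's own segmentation code. As a rough illustration of what dynamic-programming segmentation over a custom token map can look like, the sketch below splits a string into the fewest tokens drawn from a given vocabulary. The function name `segment`, its signature, and the fewest-tokens objective are assumptions made for this example; they are not tokeniser-py's API and may differ from its actual algorithm.

```python
from typing import List, Optional, Set


def segment(text: str, vocab: Set[str], max_token_len: Optional[int] = None) -> Optional[List[str]]:
    """Hypothetical helper: split `text` into the fewest tokens from `vocab`.

    Returns None when `text` cannot be covered by tokens in `vocab`.
    """
    if max_token_len is None:
        # Only substrings up to the longest vocabulary entry need checking.
        max_token_len = max((len(t) for t in vocab), default=0)
    n = len(text)
    inf = float("inf")
    # best[i] = minimum number of tokens needed to cover text[:i]
    best = [inf] * (n + 1)
    # back[i] = start index of the last token in the best split of text[:i]
    back = [-1] * (n + 1)
    best[0] = 0
    for i in range(1, n + 1):
        for j in range(max(0, i - max_token_len), i):
            if best[j] + 1 < best[i] and text[j:i] in vocab:
                best[i] = best[j] + 1
                back[i] = j
    if best[n] == inf:
        return None
    # Walk the back-pointers from the end to recover the tokens.
    tokens: List[str] = []
    i = n
    while i > 0:
        j = back[i]
        tokens.append(text[j:i])
        i = j
    tokens.reverse()
    return tokens


vocab = {"un", "believ", "able", "unbeliev", "a", "b", "l", "e"}
print(segment("unbelievable", vocab))  # ['unbeliev', 'able']
```

Limiting the inner loop to the longest token keeps the cost at O(n × L) for a string of length n and maximum token length L. Minimising the token count is only one possible scoring rule; a frequency- or probability-weighted score would slot into the same recurrence.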

Vulnerabilities

No known vulnerabilities found

Versions (3)

  • 0.1.2
  • 0.1.1
  • 0.1.0