Tinygrad

Latest version: v0.10.1

0.10.1

LazyBuffers are gone!
At 10941 lines.

Release Highlights

- No LazyBuffer, just immutable UOp + Tensor
- New multi and gradient using graph_rewrite
- Many scheduler upgrades, try VIZ=1
- AM driver for a fully AMD-free experience!
- llvmlite no longer a dependency
- DSP simulator
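
As a rough illustration of what "immutable nodes plus graph rewrite" can look like, here is a toy sketch in plain Python. It is not tinygrad's `UOp` or its `graph_rewrite`; the `Node` class, the `simplify` rule set, and `toy_graph_rewrite` are all hypothetical names made up for this example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Node:
    """Hypothetical immutable expression node (a stand-in, not tinygrad's UOp)."""
    op: str
    src: tuple = ()
    arg: object = None

def const(value):
    return Node("CONST", arg=value)

def simplify(n):
    """Toy rule set: return a simpler node, or None if no rule matches."""
    if n.op == "MUL":
        a, b = n.src
        if b == const(1): return a
        if a == const(1): return b
        if const(0) in (a, b): return const(0)
    if n.op == "ADD":
        a, b = n.src
        if b == const(0): return a
        if a == const(0): return b
        if a.op == "CONST" and b.op == "CONST": return const(a.arg + b.arg)
    return None

def toy_graph_rewrite(n, rule):
    """Rewrite bottom-up until no rule applies. Nodes are never mutated;
    a changed subtree is rebuilt as a new Node."""
    new_src = tuple(toy_graph_rewrite(s, rule) for s in n.src)
    if new_src != n.src:
        n = Node(n.op, new_src, n.arg)
    out = rule(n)
    return n if out is None else toy_graph_rewrite(out, rule)

x = Node("VAR", arg="x")
expr = Node("ADD", (Node("MUL", (x, const(1))), Node("ADD", (const(2), const(-2)))))
simplified = toy_graph_rewrite(expr, simplify)
```

Because the nodes are frozen, `expr` is left untouched and `simplified` is a separate value, which is the property that makes rewrites easy to inspect step by step.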

See the full changelog: https://github.com/tinygrad/tinygrad/compare/v0.10.0...v0.10.1

Join the [Discord](https://discord.gg/beYbxwxVdx)!

0.10.0

A significant under-the-hood update.
Over 1200 commits since `0.9.2`.
At 9937 lines.

Release Highlights

- `VIZ=1` to show how rewrites are happening, try it
- 0 python dependencies!
  - Switch from numpy random to threefry, removing numpy [#6116]
  - Switch from pyobjc to ctypes for metal, removing pyobjc [#6545]
- 3 new backends
  - `QCOM=1` HCQ backend for runtime speed on Adreno 630 [#5213]
  - `CLOUD=1` for remote tinygrad [#6964]
  - `DSP=1` backend on Qualcomm devices (alpha) [#6112]
- More Tensor Cores
  - Apple AMX support [#5693]
  - Intel XMX tensor core support [#5622]
- Core refactors
  - Removal of symbolic, it's just UOp rewrite now
  - Many refactors with EXPAND, VECTORIZE, and INDEX
  - Progress toward the replacement of `LazyBuffer` with `UOp`
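
For readers wondering what "threefry" refers to in the highlights above: it is a counter-based random number generator, so the same key and counter always give the same output and no mutable generator state is needed. The following is a minimal pure-Python sketch, assuming the commonly described 20-round Threefry-2x32 layout (rotation constants, key-schedule parity constant, key injection every four rounds). It is not tinygrad's code, and the `uniform01` helper is a hypothetical addition for illustration.

```python
MASK32 = 0xFFFFFFFF
ROTATIONS = (13, 15, 26, 6, 17, 29, 16, 24)  # assumed Threefry-2x32 rotation schedule
PARITY = 0x1BD11BDA                          # assumed key-schedule parity constant

def rotl32(x, r):
    """Rotate a 32-bit value left by r bits."""
    return ((x << r) | (x >> (32 - r))) & MASK32

def threefry2x32(key, counter, rounds=20):
    """Map a (key, counter) pair of 32-bit word pairs to two 32-bit words."""
    k0, k1 = key
    ks = (k0, k1, PARITY ^ k0 ^ k1)
    x0 = (counter[0] + ks[0]) & MASK32
    x1 = (counter[1] + ks[1]) & MASK32
    for r in range(rounds):
        x0 = (x0 + x1) & MASK32
        x1 = rotl32(x1, ROTATIONS[r % 8]) ^ x0
        if r % 4 == 3:  # inject the key schedule after every fourth round
            s = r // 4 + 1
            x0 = (x0 + ks[s % 3]) & MASK32
            x1 = (x1 + ks[(s + 1) % 3] + s) & MASK32
    return x0, x1

def uniform01(key, counter):
    """Hypothetical helper: use the top 24 bits of one word as a float in [0, 1)."""
    x0, _ = threefry2x32(key, counter)
    return (x0 >> 8) / float(1 << 24)

samples = [uniform01((42, 0), (0, i)) for i in range(4)]
```

Since every step is plain 32-bit integer arithmetic, a generator of this shape can be expressed with ordinary tensor ops, which is what lets a framework drop an external random number dependency.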

See the full changelog: https://github.com/tinygrad/tinygrad/compare/v0.9.2...v0.10.0

See the known issues: https://github.com/tinygrad/tinygrad/issues?q=is%3Aissue+is%3Aopen+label%3Abug+sort%3Aupdated-desc

Join the [Discord](https://discord.gg/beYbxwxVdx)!

0.9.2

Small changes.
Over 700 commits since `0.9.1`.

Release Highlights

- experimental [Monte Carlo Tree Search](https://en.wikipedia.org/wiki/Monte_Carlo_tree_search) when `BEAM>=100` [#5598]
- `sin`, `log2`, and `exp2` approximations, enabled with `TRANSCENDENTAL>=2` or by default on `CLANG` and `LLVM`. [#5187]
- when running with `DEBUG>=2`, you now see the tensor ops that are part of a kernel [#5271]
![image](https://github.com/user-attachments/assets/0c583295-d66d-4475-a86d-310a9a246aa3)
- `PROFILE=1` for a profiler when using HCQ backends (`AMD`, `NV`)
![image](https://github.com/user-attachments/assets/a7b07962-31b7-4154-9ba1-3b0320b1e1e5)
- Refactor `Linearizer` to `Lowerer` [#4957]
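
Several highlights are written as thresholds on environment variables (`BEAM>=100`, `TRANSCENDENTAL>=2`, `DEBUG>=2`). A minimal sketch of how such integer flags can be read and compared is below. The `getenv_int` helper and its `environ` parameter are hypothetical and written for this example; they are not claimed to match tinygrad's own helper.

```python
import os

def getenv_int(name, default=0, environ=None):
    """Hypothetical helper: read an integer flag, falling back to `default`
    when the variable is unset or not a valid integer."""
    env = os.environ if environ is None else environ
    try:
        return int(env.get(name, default))
    except ValueError:
        return default

# Simulated environment, as if the script was launched with `DEBUG=2 BEAM=100`.
fake_env = {"DEBUG": "2", "BEAM": "100"}

show_tensor_ops = getenv_int("DEBUG", environ=fake_env) >= 2
use_tree_search = getenv_int("BEAM", environ=fake_env) >= 100
profiling = getenv_int("PROFILE", environ=fake_env) >= 1  # unset, so this is False
```

Treating flags as integers rather than booleans is what allows a single variable to select increasing levels of behaviour.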

See the full changelog: https://github.com/tinygrad/tinygrad/compare/v0.9.1...v0.9.2

See the known issues: https://github.com/tinygrad/tinygrad/issues?q=is%3Aissue+is%3Aopen+label%3Abug+sort%3Aupdated-desc

Join the [Discord](https://discord.gg/beYbxwxVdx)!

0.9.1

Now sitting at 7844 lines, fewer than the last release.
Looking to tag releases more often.

Over 320 commits since `0.9.0`.

Release Highlights

- Removal of the HSA backend, defaulting to AMD. [#4885]
- [tinychat](https://github.com/tinygrad/tinygrad/tree/master/examples/tinychat), a pretty simple llm web ui. [#4869]
- [SDXL example](https://github.com/tinygrad/tinygrad/blob/master/examples/sdxl.py). [#5206]
- A small tqdm [replacement](https://github.com/tinygrad/tinygrad/blob/7f46bfa58780dced48bba30f9e003a2561b48a50/tinygrad/helpers.py#L280). [4846]
- NV/AMD profiler using [perfetto](https://ui.perfetto.dev/). [#4718]

Known Issues

- Using tinygrad in a conda env on macOS is known to cause problems with the `METAL` backend. See #2226.

See the full changelog: https://github.com/tinygrad/tinygrad/compare/v0.9.0...v0.9.1

See the known issues: https://github.com/tinygrad/tinygrad/issues?q=is%3Aissue+is%3Aopen+label%3Abug+sort%3Aupdated-desc

Join the [Discord](https://discord.gg/beYbxwxVdx)!

0.9.0

Close to the new line limit of 8000 lines, sitting at 7958 lines.
tinygrad is ***much*** more usable now.

Just over 1200 commits since `0.8.0`.

Release Highlights

- New documentation: [https://docs.tinygrad.org](https://docs.tinygrad.org)
- `gpuctypes` has been brought in tree and is no longer an external dependency. [#3253]
- `AMD=1` and `NV=1` experimental backends that do not require any userspace runtime components such as ROCm or CUDA.
  - These backends should reduce the amount of Python time, especially in multi-GPU use cases.
- `PTX=1` for rendering directly to ptx instead of cuda. [#3139] [#3623] [#3775]
- Nvidia tensor core support. [#3544]
- `THREEFRY=1` for numpy-less random number generation using threefry2x32. [#2601] [#3785]
- More stabilized [multi-tensor API](/tinygrad/multi.py).
  - With ring all-reduce: [#3000] [#3852]
- Core tinygrad has been refactored into 4 pieces, read more about it [here](https://docs.tinygrad.org/developer/).
- Linearizer and codegen have support for generating kernels with multiple outputs.
- Lots of progress towards greater kernel fusion in the scheduler.
  - Fusing of ReduceOps with their elementwise children. This trains mnist and gpt2 with ~20% fewer kernels and makes llama inference faster.
  - New LoadOps.ASSIGN allows fusing optimizer updates with grad.
  - Schedule kernels in BFS order. This improves resnet and llama speed.
  - W.I.P. for fusing multiple reduces: [#4259] [#4208]
- MLPerf [ResNet](https://github.com/tinygrad/tinygrad/blob/0b58203cbe9ac67de3ae598c8e6552c2935fcb1e/examples/mlperf/model_train.py#L14) and [BERT](https://github.com/tinygrad/tinygrad/blob/0b58203cbe9ac67de3ae598c8e6552c2935fcb1e/examples/mlperf/model_train.py#L392) with a W.I.P. [UNet3D](https://github.com/tinygrad/tinygrad/pull/3470)
- Llama 3 support with a new [`llama3.py`](/examples/llama3.py) that provides an OpenAI compatible API. [#4576]
- [NF4](https://arxiv.org/pdf/2305.14314) quantization support in Llama examples. [#4540]
- `label_smoothing` has been added to `sparse_categorical_crossentropy`. [#3568]
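
The ring all-reduce mentioned in the multi-tensor item can be pictured with a small simulation. This is a generic sketch of the textbook algorithm on plain Python lists, assuming each buffer length is divisible by the number of devices; it is not tinygrad's implementation, and `ring_allreduce` is a name chosen for this example.

```python
def ring_allreduce(buffers):
    """Simulate a sum all-reduce over a ring of len(buffers) devices.
    Returns one fully reduced copy per device; inputs are not modified."""
    n = len(buffers)
    size = len(buffers[0])
    assert all(len(b) == size for b in buffers), "buffers must have equal length"
    assert size % n == 0, "sketch assumes length divisible by device count"
    c = size // n
    data = [list(b) for b in buffers]

    def chunk(k):
        return slice(k * c, (k + 1) * c)

    # Phase 1, reduce-scatter: after n-1 steps device i owns the full sum of chunk (i+1) % n.
    for step in range(n - 1):
        # Snapshot outgoing chunks first so all sends in a step happen "at once".
        sends = [(i, (i - step) % n, data[i][chunk((i - step) % n)]) for i in range(n)]
        for src, k, payload in sends:
            dst = (src + 1) % n
            data[dst][chunk(k)] = [a + b for a, b in zip(data[dst][chunk(k)], payload)]

    # Phase 2, all-gather: pass the finished chunks around the ring.
    for step in range(n - 1):
        sends = [(i, (i + 1 - step) % n, data[i][chunk((i + 1 - step) % n)]) for i in range(n)]
        for src, k, payload in sends:
            data[(src + 1) % n][chunk(k)] = payload
    return data

grads = [[1, 2, 3, 4, 5, 6], [10, 20, 30, 40, 50, 60], [100, 200, 300, 400, 500, 600]]
reduced = ring_allreduce(grads)
```

Each device only ever talks to its ring neighbour and sends one chunk per step, so the per-device traffic stays roughly constant as devices are added, which is the usual reason to prefer a ring over gathering everything on one device.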

Known Issues

- Using tinygrad in a conda env on macOS is known to cause problems with the `METAL` backend. See #2226.

See the full changelog: https://github.com/tinygrad/tinygrad/compare/v0.8.0...v0.9.0

See the known issues: https://github.com/tinygrad/tinygrad/issues?q=is%3Aissue+is%3Aopen+label%3Abug+sort%3Aupdated-desc

Join the [Discord](https://discord.gg/beYbxwxVdx)!

0.8.0

Close to the new limit of 5000 lines at 4981.

Release Highlights

- Real dtype support within kernels!
- New `.schedule()` API to separate concerns of scheduling and running
- New lazy.py implementation doesn't reorder at build time. `GRAPH=1` is usable to debug issues
- 95 TFLOP FP16->FP32 matmuls on 7900XTX
- GPT2 runs (jitted) in 2 ms on NVIDIA 3090
- Powerful and fast kernel beam search with `BEAM=2`
- GPU/CUDA/HIP backends switched to `gpuctypes`
- New (alpha) multigpu sharding API with `.shard`
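
To give a feel for what a beam search with width 2 does, here is a generic sketch over a toy problem. It is not tinygrad's kernel search: the state (a pair of tile sizes), the `expand` moves, and the `cost` function are all made up for illustration, whereas a real kernel search would score candidates by something like measured runtime.

```python
def beam_search(start, expand, cost, width=2, max_steps=10):
    """Keep the `width` cheapest candidates at each step; stop when nothing improves."""
    beam = [start]
    best = start
    for _ in range(max_steps):
        candidates = {c for state in beam for c in expand(state)}
        if not candidates:
            break
        # Sort by cost, then by state, so ties are broken deterministically.
        beam = sorted(candidates, key=lambda s: (cost(s), s))[:width]
        if cost(beam[0]) >= cost(best):
            break
        best = beam[0]
    return best

def expand(state):
    """Toy move set: double one of the two tile sizes."""
    x, y = state
    return [(2 * x, y), (x, 2 * y)]

def cost(state):
    """Toy cost: prefer tiles covering 64 elements, as square as possible."""
    x, y = state
    return 2 * abs(x * y - 64) + abs(x - y)

best_tiles = beam_search((1, 1), expand, cost, width=2)
```

A wider beam explores more candidates per step at a proportionally higher search cost, which is the trade-off a setting like `BEAM=2` controls.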

See the full changelog: https://github.com/tinygrad/tinygrad/compare/v0.7.0...v0.8.0

Join the [Discord](https://discord.gg/beYbxwxVdx)!
