Tinygrad

Latest version: v0.10.0


0.10.0

A significant under-the-hood update.
Over 1200 commits since `0.9.2`.
At 9937 lines.

Release Highlights

- `VIZ=1` to show how rewrites are happening, try it
- 0 Python dependencies!
- Switch from numpy random to threefry, removing numpy [#6116]
- Switch from pyobjc to ctypes for Metal, removing pyobjc [#6545]
- 3 new backends
  - `QCOM=1` HCQ backend for runtime speed on Adreno 630 [#5213]
  - `CLOUD=1` for remote tinygrad [#6964]
  - `DSP=1` backend on Qualcomm devices (alpha) [#6112]
- More Tensor Cores
  - Apple AMX support [#5693]
  - Intel XMX tensor core support [#5622]
- Core refactors
  - Removal of symbolic; it's just UOp rewrite now
  - Many refactors around EXPAND, VECTORIZE, and INDEX
  - Progress toward replacing `LazyBuffer` with `UOp`
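The threefry switch above trades numpy's stateful RNG for a counter-based generator: the same (key, counter) pair always produces the same bits, which plays well with lazy and parallel execution. As a rough pure-Python sketch of how threefry2x32 works, following the round structure and rotation constants from the Random123 paper (illustrative only, not tinygrad's implementation):

```python
# Sketch of the threefry2x32 counter-based PRNG (Random123 family).
# Constants and round structure follow Salmon et al.; this is an
# illustration of the algorithm, not tinygrad's actual code.
MASK = 0xFFFFFFFF
ROTATIONS = [13, 15, 26, 6, 17, 29, 16, 24]  # threefry-2x32 rotation schedule

def rotl(x, n):
    return ((x << n) | (x >> (32 - n))) & MASK

def threefry2x32(key, counter, rounds=20):
    k0, k1 = key
    k2 = k0 ^ k1 ^ 0x1BD11BDA  # key-schedule parity constant
    ks = [k0, k1, k2]
    x0 = (counter[0] + k0) & MASK
    x1 = (counter[1] + k1) & MASK
    for r in range(rounds):
        x0 = (x0 + x1) & MASK
        x1 = rotl(x1, ROTATIONS[r % 8]) ^ x0
        if (r + 1) % 4 == 0:  # inject key material every 4 rounds
            i = (r + 1) // 4
            x0 = (x0 + ks[i % 3]) & MASK
            x1 = (x1 + ks[(i + 1) % 3] + i) & MASK
    return x0, x1
```

For a fixed key the function is a bijection over counters, so distinct counters give distinct 64-bit outputs with no shared state to synchronize.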

See the full changelog: https://github.com/tinygrad/tinygrad/compare/v0.9.2...v0.10.0

See the known issues: https://github.com/tinygrad/tinygrad/issues?q=is%3Aissue+is%3Aopen+label%3Abug+sort%3Aupdated-desc

Join the [Discord](https://discord.gg/beYbxwxVdx)!

0.9.2

Small changes.
Over 700 commits since `0.9.1`.

Release Highlights

- experimental [Monte Carlo Tree Search](https://en.wikipedia.org/wiki/Monte_Carlo_tree_search) when `BEAM>=100` [#5598]
- `TRANSCENDENTAL>=2`, on by default for `CLANG` and `LLVM`, provides `sin`, `log2`, and `exp2` approximations. [#5187]
- when running with `DEBUG>=2` you now see the tensor ops that are part of a kernel [#5271]
![image](https://github.com/user-attachments/assets/0c583295-d66d-4475-a86d-310a9a246aa3)
- `PROFILE=1` for a profiler when using HCQ backends (`AMD`, `NV`)
![image](https://github.com/user-attachments/assets/a7b07962-31b7-4154-9ba1-3b0320b1e1e5)
- Refactor `Linearizer` to `Lowerer` [#4957]
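Transcendental approximations like those above typically combine argument reduction with a truncated series. A hedged sketch of the general idea for `exp2` (split off the integer part, approximate 2**f on the fractional part; this illustrates the technique, not tinygrad's actual kernels):

```python
import math

def exp2_approx(x, terms=12):
    # argument reduction: 2**x = 2**i * 2**f, with f in [0, 1)
    i = math.floor(x)
    f = x - i
    # 2**f = e**(f*ln2), approximated by a truncated Taylor series
    t = f * math.log(2.0)
    acc, term = 1.0, 1.0
    for k in range(1, terms):
        term *= t / k
        acc += term
    return math.ldexp(acc, i)  # scale by 2**i exactly
```

Because f is reduced to [0, 1), the series argument never exceeds ln 2, so a short polynomial already gives near machine precision.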

See the full changelog: https://github.com/tinygrad/tinygrad/compare/v0.9.1...v0.9.2

See the known issues: https://github.com/tinygrad/tinygrad/issues?q=is%3Aissue+is%3Aopen+label%3Abug+sort%3Aupdated-desc

Join the [Discord](https://discord.gg/beYbxwxVdx)!

0.9.1

Now sitting at 7844 lines, fewer than the last release.
Looking to tag releases more often.

Over 320 commits since `0.9.0`.

Release Highlights

- Removal of the HSA backend, defaulting to AMD. [#4885]
- [tinychat](https://github.com/tinygrad/tinygrad/tree/master/examples/tinychat), a pretty simple llm web ui. [#4869]
- [SDXL example](https://github.com/tinygrad/tinygrad/blob/master/examples/sdxl.py). [#5206]
- A small tqdm [replacement](https://github.com/tinygrad/tinygrad/blob/7f46bfa58780dced48bba30f9e003a2561b48a50/tinygrad/helpers.py#L280). [#4846]
- NV/AMD profiler using [perfetto](https://ui.perfetto.dev/). [#4718]
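The core of a tqdm-style replacement like the one above fits in a few lines: render a bar string, rewrite the line with a carriage return. A minimal sketch (illustrative names, not tinygrad's `helpers.py` API):

```python
import sys

def render_bar(i, total, width=20):
    # render a textual progress bar like "[##########..........] 50%"
    filled = int(width * i / total)
    return f"[{'#' * filled}{'.' * (width - filled)}] {100 * i // total}%"

def progress(iterable, total=None, out=sys.stderr):
    # wrap an iterable, redrawing the bar in place with "\r" on each step
    total = total if total is not None else len(iterable)
    for i, item in enumerate(iterable, 1):
        out.write("\r" + render_bar(i, total))
        out.flush()
        yield item
    out.write("\n")
```

Writing to stderr and redrawing with `\r` keeps the bar out of piped stdout, the same trick tqdm itself relies on.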

Known Issues

- Using tinygrad in a conda env on macOS is known to cause problems with the `METAL` backend. See #2226.

See the full changelog: https://github.com/tinygrad/tinygrad/compare/v0.9.0...v0.9.1

See the known issues: https://github.com/tinygrad/tinygrad/issues?q=is%3Aissue+is%3Aopen+label%3Abug+sort%3Aupdated-desc

Join the [Discord](https://discord.gg/beYbxwxVdx)!

0.9.0

Close to the new line limit of 8000 lines, sitting at 7958 lines.
tinygrad is ***much*** more usable now.

Just over 1200 commits since `0.8.0`.

Release Highlights

- New documentation: [https://docs.tinygrad.org](https://docs.tinygrad.org)
- `gpuctypes` has been brought in tree and is no longer an external dependency. [#3253]
- `AMD=1` and `NV=1` experimental backends that don't require userspace runtime components like ROCm or CUDA.
  - These backends should reduce Python overhead, especially in multi-GPU use cases.
- `PTX=1` for rendering directly to PTX instead of CUDA. [#3139] [#3623] [#3775]
- Nvidia tensor core support. [#3544]
- `THREEFRY=1` for numpy-less random number generation using threefry2x32. [#2601] [#3785]
- A more stable [multi-tensor API](/tinygrad/multi.py).
  - With ring all-reduce: [#3000] [#3852]
- Core tinygrad has been refactored into 4 pieces; read more about it [here](https://docs.tinygrad.org/developer/).
- The linearizer and codegen now support generating kernels with multiple outputs.
- Lots of progress towards greater kernel fusion in the scheduler:
  - Fusing ReduceOps with their elementwise children. This trains mnist and gpt2 with ~20% fewer kernels and makes llama inference faster.
  - The new LoadOps.ASSIGN allows fusing optimizer updates with grad.
  - Kernels are scheduled in BFS order, improving resnet and llama speed.
  - W.I.P. on fusing multiple reduces: [#4259] [#4208]
- MLPerf [ResNet](https://github.com/tinygrad/tinygrad/blob/0b58203cbe9ac67de3ae598c8e6552c2935fcb1e/examples/mlperf/model_train.py#L14) and [BERT](https://github.com/tinygrad/tinygrad/blob/0b58203cbe9ac67de3ae598c8e6552c2935fcb1e/examples/mlperf/model_train.py#L392) with a W.I.P. [UNet3D](https://github.com/tinygrad/tinygrad/pull/3470)
- Llama 3 support with a new [`llama3.py`](/examples/llama3.py) that provides an OpenAI-compatible API. [#4576]
- [NF4](https://arxiv.org/pdf/2305.14314) quantization support in the Llama examples. [#4540]
- `label_smoothing` has been added to `sparse_categorical_crossentropy`. [#3568]
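Label smoothing follows a standard recipe: blend the one-hot target with a uniform distribution before taking cross-entropy. A pure-Python sketch of one common formulation (illustrative; check tinygrad's `sparse_categorical_crossentropy` for its exact definition):

```python
import math

def log_softmax(logits):
    # numerically stable log-softmax via the log-sum-exp trick
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]

def smoothed_ce(logits, label, smoothing=0.0):
    # target = (1 - eps) * one-hot + eps * uniform over C classes
    C = len(logits)
    logp = log_softmax(logits)
    uniform = -sum(logp) / C  # mean negative log-prob over all classes
    return (1 - smoothing) * (-logp[label]) + smoothing * uniform
```

With `smoothing=0` this reduces to plain negative log-likelihood; a small epsilon keeps the model from driving the correct-class probability all the way to 1.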

Known Issues

- Using tinygrad in a conda env on macOS is known to cause problems with the `METAL` backend. See #2226.

See the full changelog: https://github.com/tinygrad/tinygrad/compare/v0.8.0...v0.9.0

See the known issues: https://github.com/tinygrad/tinygrad/issues?q=is%3Aissue+is%3Aopen+label%3Abug+sort%3Aupdated-desc

Join the [Discord](https://discord.gg/beYbxwxVdx)!

0.8.0

Close to the new limit of 5000 lines at 4981.

Release Highlights

- Real dtype support within kernels!
- New `.schedule()` API to separate concerns of scheduling and running
- New lazy.py implementation doesn't reorder at build time. `GRAPH=1` is usable to debug issues
- 95 TFLOP FP16->FP32 matmuls on 7900XTX
- GPT2 runs (jitted) in 2 ms on NVIDIA 3090
- Powerful and fast kernel beam search with `BEAM=2`
- GPU/CUDA/HIP backends switched to `gpuctypes`
- New (alpha) multigpu sharding API with `.shard`
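The `.schedule()` split above can be pictured with a toy lazy graph: operations only record nodes, scheduling linearizes the graph into an ordered list of "kernels", and a separate step executes that list. A conceptual sketch only, with illustrative names that are not tinygrad's API:

```python
# Toy lazy-evaluation graph separating scheduling from running.
class Lazy:
    def __init__(self, op, srcs=(), value=None):
        self.op, self.srcs, self.value = op, srcs, value

    @staticmethod
    def const(v):
        return Lazy("CONST", value=v)

    def __add__(self, other):
        return Lazy("ADD", (self, other))   # record the op, don't compute

    def __mul__(self, other):
        return Lazy("MUL", (self, other))

    def schedule(self):
        # topological sort: every node appears after all of its sources
        order, seen = [], set()
        def visit(n):
            if id(n) in seen:
                return
            seen.add(id(n))
            for s in n.srcs:
                visit(s)
            order.append(n)
        visit(self)
        return order

    def realize(self):
        # run the scheduled "kernels" in order, then read the result
        for n in self.schedule():
            if n.op == "ADD":
                n.value = n.srcs[0].value + n.srcs[1].value
            elif n.op == "MUL":
                n.value = n.srcs[0].value * n.srcs[1].value
        return self.value
```

Separating the two phases is what lets a scheduler reorder, fuse, or cache kernels before anything runs.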

See the full changelog: https://github.com/tinygrad/tinygrad/compare/v0.7.0...v0.8.0

Join the [Discord](https://discord.gg/beYbxwxVdx)!

0.7.0

Bigger again at 4311 lines :( But tons of new features this time!

Just over 500 commits since `0.6.0`.

Release Highlights

- Windows support has been dropped to focus on Linux and macOS.
  - Some functionality may work on Windows, but no support will be provided; use WSL instead.
- [DiskTensors](/tinygrad/runtime/ops_disk.py): a way to store tensors on disk has been added.
  - This is coupled with functionality in [`state.py`](/tinygrad/nn/state.py), which supports saving/loading safetensors and loading torch weights.
- Tensor Cores are supported on M1/Apple Silicon and on the 7900 XTX (WMMA).
  - Support on the 7900 XTX requires weights and data to be in float16; full float16 compute support will come in a later release.
  - Tensor Core behaviour/usage is controlled by the `TC` envvar.
- Kernel optimization with nevergrad.
  - This optimizes the shapes going into the kernel, gated by the `KOPT` envvar.
- P2P buffer transfers are supported on *most* AMD GPUs when using a single python process.
  - This is controlled by the `P2P` envvar.
- LLaMA 2 support.
  - A requirement of this is bfloat16 support for loading the weights, which is semi-supported by casting them to float16; proper bfloat16 support is tracked in #1290.
  - The LLaMA example now also supports 8-bit quantization using the flag `--quantize`.
- Most MLPerf models have working inference examples. Training these models is currently being worked on.
- Initial multigpu training support.
  - *slow* multigpu training by copying through host shared memory.
  - Somewhat follows torch's multiprocessing and DistributedDataParallel high-level design.
  - See the [hlb_cifar10.py](/examples/hlb_cifar10.py) example.
- SymbolicShapeTracker and Symbolic JIT.
  - These two things combined allow models with changing shapes, like transformers, to be jitted.
  - This means that LLaMA can now be jitted for a massive increase in performance.
  - Be warned that the API for this is very WIP and may change in the future, as with the rest of the tinygrad API.
- [aarch64](/tinygrad/renderer/assembly_arm64.py) and [ptx](/tinygrad/renderer/assembly_ptx.py) assembly backends.
- WebGPU backend; see the [`compile_efficientnet.py`](/examples/compile_efficientnet.py) example.
- Support for torch-like tensor indexing by other tensors.
- Some more `nn` layers were promoted, namely `Embedding` and various `Conv` layers.
- [VITS](/examples/vits.py) and [so-vits-svc](/examples/so_vits_svc.py) examples added.
- Initial documentation work.
  - Quickstart guide: [`/docs/quickstart.md`](/docs/quickstart.md)
  - Environment variable reference: [`/docs/env_vars.md`](/docs/env_vars.md)

And lots of small optimizations all over the codebase.
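The safetensors format that `state.py` handles is simple enough to sketch: an 8-byte little-endian header length, a JSON header describing each tensor's dtype, shape, and byte offsets, then the raw tensor bytes. A minimal round-trip sketch of the container layout (illustrative; a real loader should validate offsets against the payload):

```python
import json
import struct

def parse_safetensors_header(data: bytes) -> dict:
    # safetensors layout: u64-LE header length, JSON header, raw tensor data
    (n,) = struct.unpack("<Q", data[:8])
    return json.loads(data[8:8 + n].decode("utf-8"))

def make_safetensors(header: dict, payload: bytes = b"") -> bytes:
    # serialize the header and prepend its length, then append raw bytes
    h = json.dumps(header).encode("utf-8")
    return struct.pack("<Q", len(h)) + h + payload
```

Because the header is plain JSON, tensor metadata can be read (and individual tensors sliced out by their `data_offsets`) without loading the whole file, which is what makes the format convenient for disk-backed tensors.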

See the full changelog: https://github.com/tinygrad/tinygrad/compare/v0.6.0...v0.7.0

See the known issues: https://github.com/tinygrad/tinygrad/issues?q=is%3Aissue+is%3Aopen+label%3Abug+sort%3Aupdated-desc

Join the [Discord](https://discord.gg/beYbxwxVdx)!
