Packed Mode
Previously in Taichi, every non-power-of-two dimension of a field was automatically padded to the next power of two. For instance, a field of shape `(18, 65)` would have the internal shape `(32, 128)`. The padding has benefits, such as allowing fast and convenient bitwise operations for coordinate handling, but it can consume far more memory than users expect.
For users who want a smaller memory footprint, we now introduce an optional packed mode. In packed mode, no padding is applied, so a field's internal shape is never larger than its declared shape, even when some of its dimensions are not powers of two. The downside is a slight regression in runtime performance.
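To make the trade-off above more concrete, here is a minimal pure-Python sketch of why power-of-two widths allow bitwise coordinate handling, while packed widths need a multiply and a division. The row-major layout and all function names here are assumptions made for illustration; this is not Taichi's actual internal indexing code.

```python
# Illustrative only: a row-major 2D layout, not Taichi's real implementation.

def linear_index_padded(i, j, log2_width):
    """Index into a row whose width is 2**log2_width, using shift and OR."""
    return (i << log2_width) | j


def coords_from_index_padded(idx, log2_width):
    """Recover (i, j) with a shift and a mask; no division needed."""
    return idx >> log2_width, idx & ((1 << log2_width) - 1)


def linear_index_packed(i, j, width):
    """Index into a row of arbitrary width; needs a multiplication."""
    return i * width + j


def coords_from_index_packed(idx, width):
    """Recover (i, j) with an integer division and a remainder."""
    return divmod(idx, width)


# Last cell of an (18, 65) field: padded width is 128 = 2**7, packed width is 65.
padded_idx = linear_index_padded(17, 64, 7)
packed_idx = linear_index_packed(17, 64, 65)
print(padded_idx, coords_from_index_padded(padded_idx, 7))
print(packed_idx, coords_from_index_packed(packed_idx, 65))
```

Shifts and masks are generally cheaper than integer division, which is one plausible reason packed mode runs slightly slower.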
A switch named `packed` for `ti.init()` decides whether to use packed mode:
```python
import taichi as ti

ti.init()  # default: packed=False
a = ti.field(ti.i32, shape=(18, 65))  # padded to (32, 128)
```
```python
import taichi as ti

ti.init(packed=True)
a = ti.field(ti.i32, shape=(18, 65))  # no padding
```
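The memory cost of padding for the `(18, 65)` example can be estimated with a short pure-Python calculation. This is a rough sketch: the helper names are hypothetical, and it counts only logical `i32` cells (4 bytes each), so the memory Taichi actually allocates may differ.

```python
def next_power_of_two(n):
    """Smallest power of two that is >= n (for n >= 1)."""
    return 1 if n <= 1 else 1 << (n - 1).bit_length()


def padded_shape(shape):
    """Shape after rounding every dimension up to a power of two."""
    return tuple(next_power_of_two(d) for d in shape)


def num_elements(shape):
    """Total number of cells in a dense field of the given shape."""
    total = 1
    for d in shape:
        total *= d
    return total


shape = (18, 65)
padded = padded_shape(shape)

BYTES_PER_I32 = 4
packed_bytes = num_elements(shape) * BYTES_PER_I32
padded_bytes = num_elements(padded) * BYTES_PER_I32

print("padded shape:", padded)
print("packed bytes:", packed_bytes)
print("padded bytes:", padded_bytes)
print("overhead factor: %.2f" % (padded_bytes / packed_bytes))
```

For this shape the padded layout holds 4096 cells against 1170 in the packed one, roughly 3.5 times as many.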
GGUI
A new GUI system, codenamed GGUI, has been added to Taichi. GGUI uses the GPU for rendering, which makes it much faster than the original `ti.gui` and lets it render 3D meshes and particles. It also comes with a brand new set of immediate-mode widget APIs.
Sample 3D code:
```python
window = ti.ui.Window("Hello Taichi", (1920, 1080))
canvas = window.get_canvas()
scene = ti.ui.Scene()
camera = ti.ui.make_camera()

while window.running:
    camera.position(...)
    camera.lookat(...)
    scene.set_camera(camera)

    scene.point_light(pos=(...), color=(...))

    # vertices, centers, etc. are taichi fields
    scene.mesh(vertices, ...)
    scene.particles(centers, radius, ...)

    canvas.scene(scene)
    window.show()
```
Sample IMGUI code:
```python
window = ti.ui.Window("Hello Taichi", (500, 500))
canvas = window.get_canvas()

gx, gy, gz = (0, -9.8, 0)

while window.running:
    window.GUI.begin("Greetings", 0.1, 0.1, 0.8, 0.15)
    window.GUI.text("Welcome to TaichiCon !")
    if window.GUI.button("Bye"):
        window.running = False
    window.GUI.end()

    window.GUI.begin("Gravity", 0.1, 0.3, 0.8, 0.3)
    gx = window.GUI.slider_float("x", gx, -10, 10)
    gy = window.GUI.slider_float("y", gy, -10, 10)
    gz = window.GUI.slider_float("z", gz, -10, 10)
    window.GUI.end()

    # color is assumed to be defined elsewhere
    canvas.set_background_color(color)
    window.show()
```
For more examples, please check out `examples/ggui_examples` in the taichi repo.
Dynamic SNode Allocation
Previously in Taichi, new fields could not be allocated once a kernel had been executed. Now a new class, `FieldsBuilder`, supports dynamic allocation.
`FieldsBuilder` has the same data structure declaration API as the previous `root`, such as `dense()`, `pointer()` etc. After declaration, we need to call the `finalize()` function to compile the `FieldsBuilder` to an `SNodeTree` object.
Example usage for `FieldsBuilder`:
```py
import taichi as ti
ti.init()

@ti.kernel
def func(v: ti.template()):
    for I in ti.grouped(v):
        v[I] += 1

fb = ti.FieldsBuilder()
x = ti.field(dtype=ti.f32)
fb.dense(ti.ij, (5, 5)).place(x)
fb_snode_tree = fb.finalize()  # Finalizes the FieldsBuilder and returns an SNodeTree
func(x)

fb2 = ti.FieldsBuilder()
y = ti.field(dtype=ti.f32)
fb2.dense(ti.i, 5).place(y)
fb2_snode_tree = fb2.finalize()  # Finalizes the FieldsBuilder and returns an SNodeTree
func(y)
```
Additionally, `root` is now implicitly implemented by a `FieldsBuilder`, so we can allocate fields directly under `root`.
```py
import taichi as ti
ti.init()  # implicitly: ti.root = ti.FieldsBuilder()

@ti.kernel
def func(v: ti.template()):
    for I in ti.grouped(v):
        v[I] += 1

x = ti.field(dtype=ti.f32)
ti.root.dense(ti.ij, (5, 5)).place(x)
func(x)  # automatically calls ti.root.finalize()
# ti.root is now a new ti.FieldsBuilder()

y = ti.field(dtype=ti.f32)
ti.root.dense(ti.i, 5).place(y)
func(y)  # automatically calls ti.root.finalize()
```
Furthermore, calling `finalize()` on a `FieldsBuilder` returns a finalized `SNodeTree` object. If we no longer need the fields under this `SNodeTree`, we can call `destroy()` manually to return its memory to the memory pool.
For example:
```py
import taichi as ti
ti.init()

@ti.kernel
def func(v: ti.template()):
    for I in ti.grouped(v):
        v[I] += 1

fb = ti.FieldsBuilder()
x = ti.field(dtype=ti.f32)
fb.dense(ti.ij, (5, 5)).place(x)
fb_snode_tree = fb.finalize()  # Finalizes the FieldsBuilder and returns an SNodeTree
func(x)

fb_snode_tree.destroy()
# func(x) cannot be used anymore
```
Full changelog:
- [doc] Fix several typos in doc (2972) (by **Ziyi Wu**)
- [opengl] Runtime refactor 1/n (2965) (by **Bob Cao**)
- [refactor] Avoid passing device strings into torch (2968) (by **Yi Xu**)
- [misc] Fix typos in examples/simulation/fractal.py (2882) (by **Yilong Li**)
- [opt] Support atomic min/max in warp reduction optimization (2956) (by **Yi Xu**)
- [Bug] Add GIL that was accidentally removed in PR 2939 back (2964) (by **lin-hitonami**)
- [misc] Support clean command to setup.py. (by **Ailing Zhang**)
- [misc] Fix some build warnings. (by **Ailing Zhang**)
- [doc] Add docstring for GGUI python API (2958) (by **Dunfan Lu**)
- [gui] Move all ggui kernels to python by using taichi fields as staging buffers (2957) (by **Dunfan Lu**)
- [opt] Add conservative alias analysis for ExternalPtrStmt (2952) (by **Yi Xu**)
- [opengl] Move old runtime onto Device API (2945) (by **Bob Cao**)
- [Lang] Remove deprecated usage of ti.Matrix.__init__ (2950) (by **Yi Xu**)
- [Lang] Add data_handle property to Ndarray (2947) (by **Yi Xu**)
- [misc] Throw proper error if real function is not properly annotated. (2943) (by **Ailing**)
- [gui] Fix normal bug when default fp is not f32. (2944) (by **Dunfan Lu**)
- [opengl] Device API: Adding GL error checks & correct memory mapping flags (2941) (by **Bob Cao**)
- [Lang] Support configure sparse solver ordering (2907) (by **FantasyVR**)
- [refactor] remove Program::KernelProxy (2939) (by **lin-hitonami**)
- [doc] Update README.md (2940) (by **Yuanming Hu**)
- [opengl] Initial Device API work (2925) (by **Bob Cao**)
- [Lang] Support ti_print for wasm (2910) (by **squarefk**)
- [Lang] Fix ti func with template and add corresponding tests (2871) (by **squarefk**)
- [doc] Update README.md (2937) (by **Yuanming Hu**)
- [metal] Fix metal codegen to make OSX 10.14 work (2935) (by **Ye Kuang**)
- [Doc] Add developer installation to README.md (2933) (by **Ye Kuang**)
- [misc] Edit preset indices (2932) (by **ljcc0930**)
- fratal example (2931) (by **Dunfan Lu**)
- [refactor] Exchange compiled_grad_functions and compiled_functions in kernel_impl.py (2930) (by **Yi Xu**)
- [Misc] Update doc links (2928) (by **FantasyVR**)
- Disable a few vulkan flaky tests. (2926) (by **Ailing**)
- [llvm] Remove duplicated set dim attribute for GlobalVariableExpression (2929) (by **Ailing**)
- [ci] Artifact uploading before test in release.yml (2921) (by **Jiasheng Zhang**)
- [bug] Fix the Bug that cannot assign a value to a scalar member in a struct from python scope (2894) (by **JeffreyXiang**)
- [misc] Update examples (2924) (by **Taichi Gardener**)
- [ci] Enable tmate session if release test fails. (2919) (by **Ailing**)
- [refactor] [CUDA] Wrap the default profiling tool as EventToolkit , add a new class for CUPTI toolkit (2916) (by **rocket**)
- [metal] Fix upperbound for list-gen and struct-for (2915) (by **Ye Kuang**)
- [ci] Fix linux release forgot to remove old taichi (2914) (by **Jiasheng Zhang**)
- [Doc] Add docstring for indices() and axes() (2917) (by **Ye Kuang**)
- [refactor] Rename SNode::n to SNode::num_cells_per_container (2911) (by **Ye Kuang**)
- Enable deploy preview if changes are detected in docs. (2913) (by **Ailing**)
- [refactor] [CUDA] Add traced_records_ for KernelProfilerBase, refactoring KernelProfilerCUDA::sync() (2909) (by **rocket**)
- [ci] Moved linux release to github action (2905) (by **Jiasheng Zhang**)
- [refactor] [CUDA] Move KernelProfilerCUDA from program/kernel_profiler.cpp to backends/cuda/cuda_profiler.cpp (2902) (by **rocket**)
- [wasm] Fix WASM AOT module builder order (2904) (by **Ye Kuang**)
- [CUDA] Add a compilation option for CUDA toolkit (2899) (by **rocket**)
- [vulkan] Support for multiple SNode trees in Vulkan (2903) (by **Dunfan Lu**)
- add destory snode tree api (2898) (by **Dunfan Lu**)