## `Tube`
`Tube` is more flexible than `Tin`, but slower, because it creates the necessary fields for you and performs automatic batching.
### Registrations
All you need to do is to register:
* Input/intermediate/output **tensor shapes** instead of fields
* At least one kernel that takes the following as arguments:
    * Taichi fields: correspond to tensors (may or may not require gradients)
    * (Optional) Extra arguments: will NOT receive gradients
Acceptable dimensions of tensors to be registered (combined in the sketch after this list):
* `None`: the flexible batch dimension; it must be the first dimension, e.g. `(None, 2, 3, 4)`
* Positive integers: fixed dimensions of exactly the given size
* Negative integers:
    * `-1`: any size in `[1, +inf)`; only usable when registering input tensors
    * Negative integers smaller than `-1`: indices of dimensions that must all have the same size
        * Restriction: such an index must be "declared" in the registration of input tensors first, and only then used in the registration of intermediate fields and output tensors
    * Example 1: tensors `a` and `b` registered with shapes `a: (2, -2, 3)` and `b: (-2, 5, 6)` mean the `-2` dimensions must have the same size
    * Example 2: tensors `a` and `b` registered with shapes `a: (-1, 2, 3)` and `b: (-1, 5, 6)` put no restriction on their first dimensions
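For instance, here is a minimal sketch that combines these dimension kinds in one registration. The kernel `scale_rows` and the names `mat`, `scale`, and `scaled` are illustrative only (not part of the API), and it assumes the batch dimension is stripped before the kernel sees its per-item fields, as in the batched example later in this section:

```python
# Illustrative sketch only: combining None, -2, and a fixed size in one Tube.
#   None -> flexible batch dimension (must come first)
#   -2   -> all dimensions registered as -2 must have the same size
#   3    -> a fixed size of exactly 3
# (-1 would instead accept any size with no cross-tensor constraint.)
@ti.kernel
def scale_rows(mat: ti.template(), scale: ti.template(), scaled: ti.template()):
    for i, j in mat:                      # per-item field of shape (-2, 3)
        scaled[i, j] = mat[i, j] * scale[i]

tube = Tube(cpu) \
    .register_input_tensor((None, -2, 3), torch.float32, "mat", False) \
    .register_input_tensor((-2,), torch.float32, "scale", False) \
    .register_output_tensor((None, -2, 3), torch.float32, "scaled", False) \
    .register_kernel(scale_rows, ["mat", "scale", "scaled"]) \
    .finish()
# tube(torch.ones(4, 7, 3), torch.ones(7)) batches over 4 and binds -2 to 7;
# tensors whose -2 dimensions disagree would be rejected.
```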
Registration order:
Input tensors, intermediate fields, and output tensors must be registered first, and only then the kernel.
```python
import taichi as ti
import torch

from stannum import Tube


@ti.kernel
def ti_add(arr_a: ti.template(), arr_b: ti.template(), output_arr: ti.template()):
    for i in arr_a:
        output_arr[i] = arr_a[i] + arr_b[i]


ti.init(ti.cpu)
cpu = torch.device("cpu")
a = torch.ones(10)
b = torch.ones(10)
tube = Tube(cpu) \
    .register_input_tensor((10,), torch.float32, "arr_a", False) \
    .register_input_tensor((10,), torch.float32, "arr_b", False) \
    .register_output_tensor((10,), torch.float32, "output_arr", False) \
    .register_kernel(ti_add, ["arr_a", "arr_b", "output_arr"]) \
    .finish()
out = tube(a, b)
```
When registering a kernel, a list of field/tensor names is required, for example `["arr_a", "arr_b", "output_arr"]` above.
This list must correspond, in order, to the field arguments of the kernel (here `ti_add()`).
The order of the input tensors passed to the `Tube` must likewise match the order of the kernel's input fields.
### Automatic batching
Automatic batching is done simply by running the kernels `batch` times, where the batch size is determined by the leading dimension of tensors registered with a shape of `(None, ...)`.
If any input tensors or intermediate fields are batched (i.e. their registered first dimension is `None`), then all output tensors must be registered as batched.
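As a sketch of what this means in practice: a tensor registered as `(None, 3)` accepts input of shape `(batch, 3)`, the kernel runs `batch` times on a per-item field of shape `(3,)`, and the output must also carry the leading `None`. The kernel `double_rows` and the names `rows`/`doubled` below are illustrative, not part of the API:

```python
# Sketch only: batched (None, 3) input mapped to a batched (None, 3) output.
@ti.kernel
def double_rows(rows: ti.template(), doubled: ti.template()):
    for i in rows:                 # rows is the per-item field of shape (3,)
        doubled[i] = rows[i] * 2.0

tube = Tube() \
    .register_input_tensor((None, 3), torch.float32, "rows", False) \
    .register_output_tensor((None, 3), torch.float32, "doubled", False) \
    .register_kernel(double_rows, ["rows", "doubled"]) \
    .finish()
out = tube(torch.ones(4, 3))       # kernel runs 4 times; out has shape (4, 3)
```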
### Examples
A simple example without negative indices or a batch dimension:
```python
@ti.kernel
def ti_add(arr_a: ti.template(), arr_b: ti.template(), output_arr: ti.template()):
    for i in arr_a:
        output_arr[i] = arr_a[i] + arr_b[i]


ti.init(ti.cpu)
cpu = torch.device("cpu")
a = torch.ones(10)
b = torch.ones(10)
tube = Tube(cpu) \
    .register_input_tensor((10,), torch.float32, "arr_a", False) \
    .register_input_tensor((10,), torch.float32, "arr_b", False) \
    .register_output_tensor((10,), torch.float32, "output_arr", False) \
    .register_kernel(ti_add, ["arr_a", "arr_b", "output_arr"]) \
    .finish()
out = tube(a, b)
```
With a negative dimension index:
```python
ti.init(ti.cpu)
cpu = torch.device("cpu")
tube = Tube(cpu) \
    .register_input_tensor((-2,), torch.float32, "arr_a", False) \
    .register_input_tensor((-2,), torch.float32, "arr_b", False) \
    .register_output_tensor((-2,), torch.float32, "output_arr", False) \
    .register_kernel(ti_add, ["arr_a", "arr_b", "output_arr"]) \
    .finish()
dim = 10
a = torch.ones(dim)
b = torch.ones(dim)
out = tube(a, b)
assert torch.allclose(out, torch.full((dim,), 2.))
dim = 100
a = torch.ones(dim)
b = torch.ones(dim)
out = tube(a, b)
assert torch.allclose(out, torch.full((dim,), 2.))
```
With a batch dimension:
```python
@ti.kernel
def int_add(a: ti.template(), b: ti.template(), out: ti.template()):
    out[None] = a[None] + b[None]


ti.init(ti.cpu)
b = torch.tensor(1., requires_grad=True)
batched_a = torch.ones(10, requires_grad=True)
tube = Tube() \
    .register_input_tensor((None,), torch.float32, "a") \
    .register_input_tensor((), torch.float32, "b") \
    .register_output_tensor((None,), torch.float32, "out", True) \
    .register_kernel(int_add, ["a", "b", "out"]) \
    .finish()
out = tube(batched_a, b)
loss = out.sum()
loss.backward()
assert torch.allclose(torch.ones_like(batched_a) + 1, out)
assert b.grad == 10.
assert torch.allclose(torch.ones_like(batched_a), batched_a.grad)
```
For more examples, including invalid uses, please see the tests in `tests/test_tube`.
### Advanced field construction with `FieldManager`
There is a way to tweak how fields are constructed in order to improve the performance of kernel calculations.
By supplying a customized `FieldManager` when registering a field, you can construct the field however you want.
Please refer to the `FieldManager` code in `src/stannum/auxiliary.py` for more information.
If you don't know why constructing fields differently can improve performance, don't use this feature.
If you don't know how to construct fields differently, please refer to [Taichi field documentation](https://docs.taichi.graphics/lang/articles/advanced/layout).
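For orientation only, the following is plain Taichi layout code, not the `FieldManager` API itself (see `src/stannum/auxiliary.py` for the actual interface); it shows the kind of layout choice a custom `FieldManager` could make:

```python
import taichi as ti

ti.init(ti.cpu)

# Default-style dense layout, equivalent to ti.field(ti.f32, shape=(256, 256))
x = ti.field(ti.f32)
ti.root.dense(ti.ij, (256, 256)).place(x)

# Blocked layout: 16x16 tiles of 16x16 cells; the same logical shape, but a
# memory layout that can improve locality for tile-wise access patterns.
y = ti.field(ti.f32)
ti.root.dense(ti.ij, (16, 16)).dense(ti.ij, (16, 16)).place(y)
```

Both fields have the same logical shape; only the memory layout differs, which is the degree of freedom a custom `FieldManager` exposes.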