Notable changes:
- **Automatic differentiation**
  - Fix floating-point type-cast gradients (687) (by **Yuanming Hu**)
- **CUDA backend**
  - PyPI package `taichi-nightly` now covers CUDA 10.X on Windows and Linux (756) (by **Yuanming Hu**)
- **Examples**
  - Add `game_of_life.py` (741) (by **彭于斌**)
  - Fix `examples/regression.py` (757) (by **Quan Wang**)
- **GUI**
  - Support SPACE key (749) (by **Ye Kuang**)
  - Fix blinking particles and random segmentation faults in `ti.GUI.circles` (755) (by **Yuanming Hu**)
- **Language and syntax**
  - Support `continue` on all backends (716) (by **Ye Kuang**)
- **LLVM backend (CPU and CUDA)**
  - Fix LLVM struct-for codegen crashing due to extra return 704 (707) (by **Yuanming Hu**)
- **Metal backend**
  - Support `ti.random()` on Metal (710) (by **Ye Kuang**)
- **OpenGL backend**
  - Support NVIDIA GLSL compiler (666) (by **彭于斌**)
  - 64-bit data type support (717) (by **彭于斌**)
  - Support more than one external array argument (694) (by **彭于斌**)
- **IR and Optimization**
  - More Taichi IR standardization and optimization (656) (by **xumingkuan**)
Full changelog:
- [CUDA] Limit memory allocation chunk size to 128 MB (758) (by **Yuanming Hu**)
- [CUDA] PyPI package `taichi-nightly` now covers CUDA 10.X on Windows and Linux (756) (by **Yuanming Hu**)
- [Example] Fix `examples/regression.py` (757) (by **Quan Wang**)
- [ir] [refactor] Move all passes that do not change IR into `irpass::analysis` (754) (by **xumingkuan**)
- [GUI] Fix blinking particles and random segmentation faults in `ti.GUI.circles` (755) (by **Yuanming Hu**)
- [misc] Fix dynamic node out-of-bound checker (752) (by **Yuanming Hu**)
- [doc] Update `syntax.rst` to include `ti.sqrt`, `ti.asin`, `ti.acos` and `x ** y` (753) (by **Quan Wang**)
- [refactor] Remove vprintf in runtime/llvm/runtime.cpp to avoid name conflicts (750) (by **Yuanming Hu**)
- [misc] Update Jenkinsfile (by **Yuanming Hu**)
- [GUI][Metal] Support SPACE key (749) (by **Ye Kuang**)
- [OpenGL] 64-bit data type support (717) (by **彭于斌**)
- [cli] Improve `ti test`: support testing a single C++ file, and skip C++ tests when testing Python files (724) (by **彭于斌**)
- [ir] [refactor] Simplify statement visitors (744) (by **xumingkuan**)
- [async] Benchmark infrastructure (743) (by **Yuanming Hu**)
- [opt] Move statements common to both the true and false branches out of `if` statements (727) (by **xumingkuan**)
- [opengl] Add `check_opengl_error` to prevent potential segfaults (728) (by **彭于斌**)
- [Example] Add `game_of_life.py` (741) (by **彭于斌**)
- [ir] [refactor] Add a `verify` pass to detect illegal IR, and remove `OffloadedStmt::begin_stmt`/`end_stmt` (731) (by **xumingkuan**)
- [cli] Improve `ti` header message (715) (by **Ye Kuang**)
- [metal][refactor] Use `compile_to_offloads()` in codegen (738) (by **Ye Kuang**)
- [ir] Fix adjoint alloca location in `make_adjoint` (734) (by **Yuanming Hu**)
- [ir] Fix out-of-scope operands during offloading (730) (by **Yuanming Hu**)
- [metal] Refactor the codegen to have multiple code sections (733) (by **Ye Kuang**)
- [ir] [refactor] BasicStmtVisitor includes Frontend statements (732) (by **彭于斌**)
- [infra] AppVeyor triggers the format server when a commit message contains `[format]` as a substring (725) (by **彭于斌**)
- [opengl] [refactor] Remove the global `no_gc` in `opengl_codegen.cpp` (723) (by **Ye Kuang**)
- [Lang] Support `continue` on all backends (716) (by **Ye Kuang**)
- [ir] Add assertions of `Alloca`s in the constructors of LocalAddress and LocalStoreStmt (by **xumingkuan**)
- [CUDA] Improve CUDA build portability with run-time driver loading (714) (by **Yuanming Hu**)
- [infra] Windows stack backtrace (720) (by **xumingkuan**)
- [ir][refactor] Use const & in function arguments to avoid copying (718) (by **xumingkuan**)
- [opengl] [refactor] Fix memory leaks using modern C++ memory management features (696) (by **彭于斌**)
- [ir] Fix compilation crash when passing global pointer to if statements (713) (by **Yuanming Hu**)
- [test] Make `test_struct_for_branching` run on archs with `pointer` (712) (by **Ye Kuang**)
- [Metal] Support `ti.random()` on Metal (710) (by **Ye Kuang**)
- [refactor] SNode now uses unique_ptr instead of shared_ptr for clearer children ownership (705) (by **Yuanming Hu**)
- [LLVM] Fix LLVM struct-for codegen crashing due to extra return 704 (707) (by **Yuanming Hu**)
- [metal] Skip `listgen` for leaf SNodes (699) (by **Ye Kuang**)
- [refactor] CUDA-related infrastructure (706) (by **Yuanming Hu**)
- [OpenGL] Support more than one external array argument (694) (by **彭于斌**)
- [ir] Add a function to test if two IRNodes are equivalent (683) (by **xumingkuan**)
- [refactor] Create `taichi/codegen/codegen_llvm.cpp` and outline class member definitions (702) (by **Taichi Gardener**)
- [refactor] Extract Taichi IR compilation from KernelCodeGen (700) (by **Yuanming Hu**)
- [async] AsyncEngine infrastructure (698) (by **Yuanming Hu**)
- [metal] Fix tests that require 64-bit data (697) (by **Ye Kuang**)
- [opengl] Improve randomness of PRNGs across launches (692) (by **彭于斌**)
- [OpenGL] Support NVIDIA GLSL compiler (666) (by **彭于斌**)
- [metal] Fix bug in Metal listgen where it goes beyond the capacity (691) (by **Ye Kuang**)
- [ir] Add statement field manager (690) (by **Yuanming Hu**)
- [misc] Enforce the use of include "taichi/..." (688) (by **Taichi Gardener**)
- [AutoDiff] Fix floating-point type-cast gradients (687) (by **Yuanming Hu**)
- [metal] Use grid-stride loop to implement `listgen` kernels (682) (by **Ye Kuang**)
- [refactor] Remove `llvm::Value *Stmt::value` (686) (by **Yuanming Hu**)
- [refactor] Remove `Stmt::adjoint` (685) (by **Yuanming Hu**)
- [lang] Fix ti.static(ti.grouped(...)) syntax checker (681) (by **xumingkuan**)