DiffTaichi: Differentiable Programming for Physical Simulation
Yuanming Hu, Luke Anderson, Tzu-Mao Li, Qi Sun, Nathan Carr, Jonathan Ragan-Kelley, Frédo Durand
TL;DR
DiffTaichi is a differentiable programming language built on Taichi that enables high-performance, end-to-end gradient computation for physical simulators. It combines megakernel fusion, imperative parallel programming, and flexible indexing with a two-scale automatic differentiation (AD) approach: gradients within each kernel come from source code transformation, and end-to-end gradients come from a lightweight tape that replays the gradient kernels in reverse order. The framework demonstrates productivity and speed gains across ten simulators, including elastic continua, incompressible fluids, and rigid bodies. For example, a differentiable elastic object simulator is 4.2x shorter than the hand-engineered CUDA version yet runs as fast, and is 188x faster than the TensorFlow implementation. A key contribution is the robust handling of gradient computation in irregular, collision-based simulations through time-of-impact (TOI) techniques and complex-kernel overrides. Overall, DiffTaichi lowers the barrier to building differentiable physics engines suitable for neural-control loops and optimization tasks in robotics and related domains.
Abstract
We present DiffTaichi, a new differentiable programming language tailored for building high-performance differentiable physical simulators. Based on an imperative programming language, DiffTaichi generates gradients of simulation steps using source code transformations that preserve arithmetic intensity and parallelism. A lightweight tape records the whole simulation program structure and replays the gradient kernels in reverse order for end-to-end backpropagation. We demonstrate the performance and productivity of our language in gradient-based learning and optimization tasks on 10 different physical simulators. For example, a differentiable elastic object simulator written in our language is 4.2x shorter than the hand-engineered CUDA version yet runs as fast, and is 188x faster than the TensorFlow implementation. Using our differentiable programs, neural network controllers are typically optimized within only tens of iterations.
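The tape idea described above can be illustrated with a minimal pure-Python sketch. It is not DiffTaichi's API: the `Tape` class, the `simulate` function, and the spring-like toy system are hypothetical names and choices made for illustration, and the gradient kernels here are written by hand, whereas DiffTaichi generates them by source code transformation. The sketch only shows the end-to-end part: record each kernel launch during the forward pass, then replay the matching gradient kernels in reverse order.

```python
# Illustrative sketch only; Tape and simulate are hypothetical names,
# not part of DiffTaichi. Gradient kernels are hand-written here.

class Tape:
    """Records kernel launches and replays their gradient kernels in reverse."""

    def __init__(self):
        self.records = []  # (grad_kernel, args) in launch order

    def launch(self, kernel, grad_kernel, *args):
        kernel(*args)                             # run the forward kernel
        self.records.append((grad_kernel, args))  # remember how to differentiate it

    def backward(self):
        for grad_kernel, args in reversed(self.records):
            grad_kernel(*args)


def simulate(v0, steps=100, dt=0.01, k=4.0, target=0.5):
    """Toy 1D spring simulation; returns (loss, d loss / d v0)."""
    # State is stored for every time step so gradient kernels can read it.
    x = [0.0] * (steps + 1)
    v = [0.0] * (steps + 1)
    gx = [0.0] * (steps + 1)  # adjoint of x
    gv = [0.0] * (steps + 1)  # adjoint of v
    loss = [0.0]
    x[0], v[0] = 1.0, v0

    def step(t):
        # Symplectic Euler update for acceleration a = -k * x
        v[t + 1] = v[t] - dt * k * x[t]
        x[t + 1] = x[t] + dt * v[t + 1]

    def step_grad(t):
        # Reverse of step: differentiate the second statement first.
        gx[t] += gx[t + 1]
        gv[t + 1] += dt * gx[t + 1]
        gv[t] += gv[t + 1]
        gx[t] += -dt * k * gv[t + 1]

    def compute_loss():
        loss[0] = (x[steps] - target) ** 2

    def compute_loss_grad():
        gx[steps] += 2.0 * (x[steps] - target)

    tape = Tape()
    for t in range(steps):
        tape.launch(step, step_grad, t)
    tape.launch(compute_loss, compute_loss_grad)
    tape.backward()
    return loss[0], gv[0]


if __name__ == "__main__":
    loss_value, grad_v0 = simulate(0.3)
    print("loss:", loss_value, "d loss / d v0:", grad_v0)
```

Because the tape stores only which kernels ran and with which arguments, its cost is small compared with the simulation state itself; in this sketch the state for every step is kept in plain lists so the gradient kernels can read it.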
