KITINet: Kinetics Theory Inspired Network Architectures with PDE Simulation Approaches

Mingquan Feng, Yifan Fu, Tongcheng Zhang, Yu Jiang, Yixin Huang, Junchi Yan

TL;DR

KITINet reframes residual learning as kinetic particle dynamics governed by the Boltzmann transport equation and simulated with a Direct Simulation Monte Carlo-style collision mechanism. It introduces a collision-based residual module that divides features into particles, and it shows that training with this module induces network parameter condensation. The module improves results on PDE operator learning, image classification, and NLP tasks with negligible FLOP overhead: gains over ResNet and Transformer baselines are consistent, including when the module is embedded in FNO and OFormer for PDE solving, on CIFAR image classification, and on BERT-based NLP tasks. The work offers a physics-informed approach to architecture design that links non-equilibrium dynamics to sparsity and generalization, and it outlines limitations and opportunities for scaling to larger benchmarks.

Abstract

Despite the widely recognized success of residual connections in modern neural networks, their design principles remain largely heuristic. This paper introduces KITINet (Kinetics Theory Inspired Network), a novel architecture that reinterprets feature propagation through the lens of non-equilibrium particle dynamics and partial differential equation (PDE) simulation. At its core, we propose a residual module that models feature updates as the stochastic evolution of a particle system, numerically simulated via a discretized solver for the Boltzmann transport equation (BTE). This formulation mimics particle collisions and energy exchange, enabling adaptive feature refinement via physics-informed interactions. Additionally, we reveal that this mechanism induces network parameter condensation during training, in which parameters progressively concentrate into a sparse subset of dominant channels. Experiments on scientific computation (PDE operator learning), image classification (CIFAR-10/100), and text classification (IMDb/SNLI) show consistent improvements over classic network baselines, with a negligible increase in FLOPs.
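To make the collision-based residual update more concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes the feature vector is split into `n_divide` particles, that the residual branch output plays the role of velocity, that particles are paired at random, and that a pair collides with a probability scaled by `coll_coef` and the relative speed. The function name `collision_residual_step`, the acceptance rule, and the random post-collision direction are all illustrative assumptions in the spirit of Direct Simulation Monte Carlo.

```python
import numpy as np

def collision_residual_step(x, v, n_divide=4, coll_coef=0.5, dt=1.0, rng=None):
    """One hypothetical collide-then-stream update (illustrative only).

    x: features, treated as particle positions, shape (d,)
    v: residual branch output F(x), treated as velocities, shape (d,)
    d must be divisible by n_divide.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Split the flat vectors into n_divide particles of equal dimension.
    pos = x.reshape(n_divide, -1)
    vel = v.reshape(n_divide, -1).copy()

    # Pair particles at random (an unpaired leftover particle never collides).
    order = rng.permutation(n_divide)
    for i, j in zip(order[0::2], order[1::2]):
        v_rel = vel[i] - vel[j]
        speed = np.linalg.norm(v_rel)
        # Assumed acceptance rule: faster relative motion collides more often.
        p_collide = min(1.0, coll_coef * speed * dt)
        if speed == 0.0 or rng.random() >= p_collide:
            continue
        # Elastic collision of equal masses: keep the centre-of-mass velocity
        # and the relative speed, but scatter into a random direction.
        v_cm = 0.5 * (vel[i] + vel[j])
        direction = rng.normal(size=v_rel.shape)
        direction /= np.linalg.norm(direction)
        vel[i] = v_cm + 0.5 * speed * direction
        vel[j] = v_cm - 0.5 * speed * direction

    # Straight-line motion: positions advance along post-collision velocities.
    return (pos + dt * vel).reshape(x.shape)

rng = np.random.default_rng(0)
x = rng.normal(size=16)
v = rng.normal(size=16)  # stand-in for a learned residual branch F(x)
y = collision_residual_step(x, v, n_divide=4, coll_coef=0.5, rng=rng)
```

With `coll_coef=0` no pair ever collides and the update reduces to the ordinary residual connection `x + dt * v`. With collisions enabled, each pairwise exchange in this sketch conserves total momentum and kinetic energy, which is the sense in which it mimics particle collisions and energy exchange.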

Paper Structure

This paper contains 24 sections, 15 equations, 10 figures, 4 tables, 1 algorithm.

Figures (10)

  • Figure 1: Overview of the proposed architecture: KITINet. It modifies the residual connection by viewing the feature and residual as the position and velocity of particles, respectively. The feature updating process is simulated by the random collision and straight-line motion of particles.
  • Figure 2: FNO's performance on NS equation, both vanilla and with KITINet applied. Left two: FNO's predictions at the final time step; Right two: their corresponding absolute error maps.
  • Figure 3: The performance of KITINet-FNO with different hyper-parameters $\text{n}_{\textrm{divide}}$ and $\text{coll}_{\textrm{coef}}$ on Burgers' equation and Heat equation respectively. The red and blue dashed lines show the performance of vanilla FNO as baselines.
  • Figure 4: Results of parameter condensation across network configurations on synthetic data. (a) Top: Condensation patterns in 3-layer FC-ReLU networks; Bottom: Enhanced condensation after replacing the final layer with the KITINet architecture. (b) Evolution of parameter condensation on a six-layer skip-connected network with the LeakyReLU activation function. (Row I) Without KITINet. (Row II) KITINet applied to the last layer. (Row III) KITINet applied to the last two layers. Networks are trained for 100 epochs, and we show the evolutionary trajectories at four critical checkpoints ($t \in \{1, 10, 50, 100\}$) to characterize the phase transitions. These observations demonstrate that the KITINet structure facilitates faster and more effective parameter condensation.
  • Figure 5: Evolution of parameter condensation on a three-layer fully-connected network. (Row 1) linear networks versus (Row 2) KITINet-incorporated networks. Systematic validation is performed across four activation functions: ReLU, LeakyReLU, Sigmoid, and Tanh.
  • ...and 5 more figures
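
The condensation shown in Figures 4 and 5 can be probed numerically. The captions here do not state which metric the figures plot, so the sketch below assumes a common choice: the cosine similarity between the weight vectors of neurons in one layer, where blocks of values near 1 indicate neurons that have aligned into a few shared directions. The helper name `neuron_cosine_similarity` and the toy weights are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def neuron_cosine_similarity(weights):
    """Pairwise cosine similarity between neuron weight vectors.

    weights: array of shape (n_neurons, n_inputs), one row per neuron.
    Returns an (n_neurons, n_neurons) matrix with entries in [-1, 1].
    """
    norms = np.linalg.norm(weights, axis=1, keepdims=True)
    unit = weights / np.clip(norms, 1e-12, None)  # guard against zero rows
    return unit @ unit.T

# Toy layer (assumed data): six neurons that share only two directions,
# each with a different positive scale, i.e. a fully condensed layer.
rng = np.random.default_rng(0)
base = rng.normal(size=(2, 8))
scales = rng.uniform(0.5, 2.0, size=(6, 1))
condensed = np.repeat(base, 3, axis=0) * scales

sim = neuron_cosine_similarity(condensed)
# Neurons 0-2 share one direction and neurons 3-5 share the other,
# so the two 3x3 diagonal blocks of `sim` are all (numerically) 1.
```

Applying such a helper to a layer's weight matrix at several training checkpoints would give one way to track whether neurons align over time.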