BCAT: A Block Causal Transformer for PDE Foundation Models for Fluid Dynamics
Yuxuan Liu, Jingmin Sun, Hayden Schaeffer
TL;DR
BCAT is a PDE foundation model built on a block causal transformer for autoregressive prediction of 2D fluid dynamics. It reframes forecasting as next-frame prediction to better capture spatiotemporal dependencies, achieving substantial speedups and accuracy gains over next-token approaches. Trained on six PDE families from PDEBench, PDEArena, and CFDBench, it attains an average relative L2 error of about 1.18% and shows strong zero-shot and transfer performance, including turbulence fine-tuning that surpasses prior methods by over 40%. Its architectural and optimization choices, including patch-based tokenization and the Muon optimizer, support scalable, efficient learning for complex fluid dynamics tasks.
Abstract
We introduce BCAT, a PDE foundation model designed for autoregressive prediction of solutions to two-dimensional fluid dynamics problems. Our approach uses a block causal transformer architecture to model next-frame predictions, leveraging previous frames as contextual priors rather than relying solely on sub-frames or pixel-based inputs commonly used in image generation methods. This block causal framework more effectively captures the spatial dependencies inherent in nonlinear spatiotemporal dynamics and physical phenomena. In an ablation study, next-frame prediction demonstrated a 3.5x accuracy improvement over next-token prediction. BCAT is trained on a diverse range of fluid dynamics datasets, including incompressible and compressible Navier-Stokes equations across various geometries and parameter regimes, as well as the shallow-water equations. The model's performance was evaluated on 6 distinct downstream prediction tasks and tested on about 8K trajectories to measure robustness on a variety of fluid dynamics simulations. BCAT achieved an average relative error of 1.18% across all evaluation tasks, outperforming prior approaches on standard benchmarks. With fine-tuning on a turbulence dataset, we show that the method adapts to new settings with more than 40% better accuracy than prior methods.
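To make the idea of block causal attention concrete, the following is a minimal sketch of one way such an attention mask could be built. It is not taken from the BCAT code. It assumes each frame is flattened into a fixed number of patch tokens and that frames are concatenated in time order. The function name `block_causal_mask` and the parameters `num_frames` and `tokens_per_frame` are hypothetical, chosen only for illustration.

```python
import numpy as np


def block_causal_mask(num_frames: int, tokens_per_frame: int) -> np.ndarray:
    """Return a boolean mask of shape (T*P, T*P).

    Entry [i, j] is True when query token i may attend to key token j.
    Hypothetical helper for illustration; not the authors' implementation.
    """
    # Frame index of every token in the flattened (time-major) sequence.
    frame_id = np.repeat(np.arange(num_frames), tokens_per_frame)
    # A token may attend to every token in its own frame and in earlier frames,
    # but to no token in a later frame.
    return frame_id[:, None] >= frame_id[None, :]


# Toy example: 3 frames with 2 patch tokens each.
mask = block_causal_mask(num_frames=3, tokens_per_frame=2)
print(mask.astype(int))
```

Under these assumptions the mask is block lower triangular: tokens within a frame see each other fully, whereas an ordinary token-level causal mask (for example `np.tril` over the same sequence) would also hide later tokens of the same frame.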
