VICON: Vision In-Context Operator Networks for Multi-Physics Fluid Dynamics Prediction
Yadi Cao, Yuxuan Liu, Liu Yang, Rose Yu, Hayden Schaeffer, Stanley Osher
TL;DR
VICON addresses the computational bottleneck of applying in-context operator networks to dense, higher-dimensional fluid data by introducing a patch-based vision transformer that preserves few-shot generalization across multi-physics PDEs. The method encodes input–output function pairs with patch and function positional embeddings, uses a specialized attention mask for autoregressive next-function prediction, and normalizes prompts to maintain operator consistency. Empirical results across three fluid benchmarks show significant accuracy and efficiency gains over the state-of-the-art baselines DPOT and MPP, along with robust performance under variable timesteps and imperfect measurements, indicating practical potential for deployment in real-world sensing and control settings. The work lays a foundation for scalable, flexible operator learning in physics-informed contexts and points to future extensions to 3D domains and irregular geometries.
Abstract
In-Context Operator Networks (ICONs) have demonstrated the ability to learn operators across diverse partial differential equations using few-shot, in-context learning. However, existing ICONs process each spatial point as an individual token, severely limiting computational efficiency when handling dense data in higher spatial dimensions. We propose Vision In-Context Operator Networks (VICON), which integrates vision transformer architectures to efficiently process 2D data through patch-wise operations while preserving ICON's adaptability to multiphysics systems and varying timesteps. Evaluated across three fluid dynamics benchmarks, VICON significantly outperforms the state-of-the-art baselines DPOT and MPP, reducing the averaged last-step rollout error by 37.9% compared to DPOT and 44.7% compared to MPP, while requiring only 72.5% and 34.8% of their respective inference times. VICON naturally supports flexible rollout strategies with varying timestep strides, so it can be deployed immediately, without retraining or interpolation, in imperfect measurement systems where sampling frequencies may differ or frames may be dropped, both common challenges in real-world settings. In these realistic scenarios, VICON exhibits remarkable robustness, with only 24.41% relative performance degradation compared to 71.37%-74.49% for the baseline methods, demonstrating its versatility for deployment in realistic applications. Our code and dataset-processing scripts are publicly available at https://github.com/Eydcao/VICON.
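To make the efficiency argument concrete, the following is a minimal sketch of generic vision-transformer-style patch tokenization, not the VICON implementation. The function names `patchify` and `unpatchify`, the 3-channel 128×128 field, and the patch size of 16 are all illustrative assumptions that do not come from the paper. It shows how grouping grid points into non-overlapping patches shrinks the token count compared with one token per spatial point.

```python
import numpy as np


def patchify(field, patch):
    """Split a (C, H, W) field into non-overlapping patch tokens.

    Returns an array of shape (N, C * patch * patch), where
    N = (H // patch) * (W // patch). Hypothetical helper, for illustration only.
    """
    c, h, w = field.shape
    if h % patch or w % patch:
        raise ValueError("H and W must be divisible by the patch size")
    gh, gw = h // patch, w // patch
    # (C, gh, p, gw, p) -> (gh, gw, C, p, p) -> (gh * gw, C * p * p)
    tokens = field.reshape(c, gh, patch, gw, patch).transpose(1, 3, 0, 2, 4)
    return tokens.reshape(gh * gw, c * patch * patch)


def unpatchify(tokens, channels, height, width, patch):
    """Inverse of patchify: rebuild the (C, H, W) field from patch tokens."""
    gh, gw = height // patch, width // patch
    x = tokens.reshape(gh, gw, channels, patch, patch).transpose(2, 0, 3, 1, 4)
    return x.reshape(channels, height, width)


# Assumed example sizes: 3 channels on a 128 x 128 grid, 16 x 16 patches.
C, H, W, P = 3, 128, 128, 16
field = np.random.default_rng(0).standard_normal((C, H, W))

tokens = patchify(field, P)
point_tokens = H * W            # one token per spatial point
patch_tokens = tokens.shape[0]  # one token per patch

print(f"point-wise tokens: {point_tokens}, patch-wise tokens: {patch_tokens}")
```

With these assumed sizes, a single frame drops from 16,384 point-wise tokens to 64 patch tokens. Because self-attention cost grows quadratically with sequence length, that reduction matters most when several input–output function pairs are concatenated into one prompt.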
