VICON: Vision In-Context Operator Networks for Multi-Physics Fluid Dynamics Prediction

Yadi Cao, Yuxuan Liu, Liu Yang, Rose Yu, Hayden Schaeffer, Stanley Osher

TL;DR

VICON addresses the bottleneck of applying in-context operator networks to dense, higher-dimensional fluid data by introducing a patch-based vision transformer that preserves few-shot generalization across multi-physics PDEs. The method encodes input–output function pairs with patch and function positional embeddings, uses a specialized attention mask for autoregressive next-function prediction, and normalizes prompts to maintain operator consistency. Empirical results across three fluid benchmarks show significant accuracy and efficiency gains over state-of-the-art baselines, along with robust performance under variable timesteps and imperfect measurements, demonstrating practical potential for deployment in real-world sensing and control settings. The work lays a foundation for scalable, flexible operator learning in physics-informed contexts and points to future extensions to 3D domains and irregular geometries.
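The patch-based tokenization mentioned above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `patchify` and `unpatchify`, the plain nested-list frame representation, and the row-major patch ordering are all assumptions made for illustration. It shows only how a 2D frame might be cut into non-overlapping square patches, flattened into token vectors, and reassembled.

```python
def patchify(frame, p):
    """Split an H x W frame (list of rows) into flattened p x p patches.

    Hypothetical helper: patches are emitted in row-major order, and each
    patch is flattened row by row into a list of p * p values.
    """
    h, w = len(frame), len(frame[0])
    if h % p or w % p:
        raise ValueError("frame size must be divisible by the patch size")
    patches = []
    for i in range(0, h, p):
        for j in range(0, w, p):
            patches.append(
                [frame[i + di][j + dj] for di in range(p) for dj in range(p)]
            )
    return patches


def unpatchify(patches, h, w, p):
    """Inverse of patchify: rebuild the H x W frame from flattened patches."""
    frame = [[None] * w for _ in range(h)]
    per_row = w // p  # number of patches along the width
    for k, patch in enumerate(patches):
        i0, j0 = (k // per_row) * p, (k % per_row) * p
        for di in range(p):
            for dj in range(p):
                frame[i0 + di][j0 + dj] = patch[di * p + dj]
    return frame


# Toy 4 x 4 frame with values 0..15, cut into 2 x 2 patches.
frame = [[4 * r + c for c in range(4)] for r in range(4)]
tokens = patchify(frame, 2)
restored = unpatchify(tokens, 4, 4, 2)
```

With a patch size of `p`, an H x W frame yields (H / p) * (W / p) tokens instead of H * W point-wise tokens, which is the source of the efficiency gain over processing each spatial point individually.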

Abstract

In-Context Operator Networks (ICONs) have demonstrated the ability to learn operators across diverse partial differential equations using few-shot, in-context learning. However, existing ICONs process each spatial point as an individual token, severely limiting computational efficiency when handling dense data in higher spatial dimensions. We propose Vision In-Context Operator Networks (VICON), which integrates vision transformer architectures to efficiently process 2D data through patch-wise operations while preserving ICON's adaptability to multiphysics systems and varying timesteps. Evaluated across three fluid dynamics benchmarks, VICON significantly outperforms the state-of-the-art baselines DPOT and MPP, reducing the averaged last-step rollout error by 37.9% compared to DPOT and 44.7% compared to MPP, while requiring only 72.5% and 34.8% of their respective inference times. VICON naturally supports flexible rollout strategies with varying timestep strides, enabling immediate deployment in imperfect measurement systems where sampling frequencies may differ or frames might be dropped, both common challenges in real-world settings, without requiring retraining or interpolation. In these realistic scenarios, VICON exhibits remarkable robustness, experiencing only 24.41% relative performance degradation compared to 71.37%-74.49% degradation in baseline methods, demonstrating its versatility for deployment in realistic applications. Our code and dataset-processing scripts are publicly available at https://github.com/Eydcao/VICON.

Paper Structure

This paper contains 51 sections, 11 equations, 13 figures, 14 tables, 5 algorithms.

Figures (13)

  • Figure 1: VICON model overview. (a) The formation process for conditions (COND) and quantities of interest (QOI) pairs. $\Delta t$ is randomly sampled during training. (b) Model illustration. The inputs to the model are pairs of COND and QOI, which are patchified and flattened before feeding into the transformer layers. The outputs, which represent different patches in the output frame, are transformed back to obtain the final predictions. (c) With imperfect temporal measurements, VICON forms pairs using only clean data, and does not need to fill missing frames.
  • Figure 2: Main experiment results. (a) Last-step rollout errors on 3 datasets. VICON outperforms MPP on all datasets and outperforms DPOT on 2 datasets. (b) VICON allows flexible rollout strategies to reduce error accumulation and demonstrates stride extrapolation. (c) VICON is robust to imperfect temporal measurements, while MPP and DPOT suffer from performance degradation. (d) VICON is smaller in size and has a faster rollout time per step.
  • Figure 3: Comparison of rollout errors (scaled by std) across different datasets and models. We show errors for two VICON rollout strategies: single-step rollout and flexible-step rollout. For the flexible-step strategy, a step size of 3 works best for the PDEArena-Incomp dataset, while a step size of 1 works best for PDEBench-Comp-LowVis and PDEBench-Comp-HighVis.
  • Figure 4: Ablation studies across the three datasets. Top row (a-c): Impact of patch resolutions ($8, 16, 32, 64$) showing optimal performance at patch size 16. Middle row (d-f): Effect of different positional encoding combinations. Bottom row (g-i): Performance variation with different context lengths (6, 8, 10 pairs).
  • Figure 5: Comparison of rollout errors across different datasets, using single-step and flexible-step strategies with varying maximum step sizes ($s_{\text{max}} = 1,3,5,7$).
  • ...and 8 more figures
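
The pair formation described in the Figure 1 caption, panels (a) and (c), can be sketched as follows. This is a hedged illustration rather than the authors' code: the helper name `form_pairs` is hypothetical, each COND and QOI is treated as a single frame index, and the stride stands in for $\Delta t$. The point it shows is that when frames are dropped, pairs are built only from frames that were actually observed, so no missing frame has to be interpolated.

```python
def form_pairs(observed_frames, stride):
    """Return (cond_index, qoi_index) pairs separated by `stride` steps.

    Hypothetical helper: a pair is kept only if both of its frames were
    observed, so dropped frames are skipped rather than filled in.
    """
    observed = set(observed_frames)
    return [(t, t + stride) for t in sorted(observed) if t + stride in observed]


# Frames 3 and 6 were dropped by the (assumed) measurement system.
available = [0, 1, 2, 4, 5, 7]

pairs_stride_1 = form_pairs(available, 1)
pairs_stride_2 = form_pairs(available, 2)
pairs_stride_3 = form_pairs(available, 3)
```

In this toy case each stride still yields usable pairs from the clean frames alone, which is consistent with the caption's statement that VICON does not need to fill missing frames.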