
OMNIFLOW: A Physics-Grounded Multimodal Agent for Generalized Scientific Reasoning

Hao Wu, Yongheng Zhang, Yuan Gao, Fan Xu, Fan Zhang, Ruobing Xie, Ruijian Gou, Yuxuan Liang, Xiaomeng Huang, Xian Wu

Abstract

Large Language Models (LLMs) have demonstrated exceptional logical reasoning capabilities but frequently struggle with the continuous spatiotemporal dynamics governed by Partial Differential Equations (PDEs), often resulting in non-physical hallucinations. Existing approaches typically resort to costly, domain-specific fine-tuning, which severely limits cross-domain generalization and interpretability. To bridge this gap, we propose OMNIFLOW, a neuro-symbolic architecture designed to ground frozen multimodal LLMs in fundamental physical laws without requiring domain-specific parameter updates. OMNIFLOW introduces a novel Semantic-Symbolic Alignment mechanism that projects high-dimensional flow tensors into topological linguistic descriptors, enabling the model to perceive physical structures rather than raw pixel values. Furthermore, we construct a Physics-Guided Chain-of-Thought (PG-CoT) workflow that orchestrates reasoning through dynamic constraint injection (e.g., mass conservation) and iterative reflexive verification. We evaluate OMNIFLOW on a comprehensive benchmark spanning microscopic turbulence, theoretical Navier-Stokes equations, and macroscopic global weather forecasting. Empirical results demonstrate that OMNIFLOW significantly outperforms traditional deep learning baselines in zero-shot generalization and few-shot adaptation tasks. Crucially, it offers transparent, physically consistent reasoning reports, marking a paradigm shift from black-box fitting to interpretable scientific reasoning.
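The abstract describes Semantic-Symbolic Alignment and constraint injection only at a high level. The following Python sketch shows one way a 2-D flow field could be reduced to short textual descriptors. It computes vorticity and divergence with periodic central differences, names the dominant vortex, and reports whether the discrete mass-conservation constraint (zero divergence for incompressible flow) holds. The function names, the descriptor wording, and the tolerance div_tol are hypothetical and are not taken from the paper; OMNIFLOW's actual descriptors and constraint checks may differ.

```python
import numpy as np


def central_diff(f, axis, h=1.0):
    """Second-order central difference on a periodic grid (arrays indexed [y, x])."""
    return (np.roll(f, -1, axis=axis) - np.roll(f, 1, axis=axis)) / (2.0 * h)


def describe_flow(u, v, h=1.0, div_tol=1e-6):
    """Map a 2-D velocity field to short textual descriptors (hypothetical scheme)."""
    # Vertical vorticity: dv/dx - du/dy.
    omega = central_diff(v, axis=1, h=h) - central_diff(u, axis=0, h=h)
    # Divergence: du/dx + dv/dy, zero for an incompressible (mass-conserving) flow.
    div = central_diff(u, axis=1, h=h) + central_diff(v, axis=0, h=h)

    # Locate the strongest rotation and describe its sense.
    j, i = np.unravel_index(np.argmax(np.abs(omega)), omega.shape)
    sense = "counterclockwise" if omega[j, i] > 0 else "clockwise"

    # Constraint check: flag a violation if the divergence exceeds the tolerance.
    residual = float(np.max(np.abs(div)))
    mass = ("mass conservation satisfied" if residual < div_tol
            else f"mass conservation violated (max |div u| = {residual:.2e})")
    return [f"dominant {sense} vortex near grid cell (x={i}, y={j})", mass]


# Usage: a single Gaussian vortex built from a streamfunction psi,
# with u = d(psi)/dy and v = -d(psi)/dx.
n = 64
y, x = np.mgrid[0:n, 0:n].astype(float)
psi = np.exp(-((x - 32) ** 2 + (y - 20) ** 2) / (2 * 4.0 ** 2))
u = central_diff(psi, axis=0)
v = -central_diff(psi, axis=1)
for line in describe_flow(u, v):
    print(line)
```

Building u and v from a streamfunction with the same difference operator makes the field divergence-free up to round-off, so the mass-conservation check passes. A field that does not derive from a streamfunction would trigger the violation message instead.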

Paper Structure (24 sections, 7 equations, 5 figures, 4 tables)

Figures (5)

  • Figure 1: Comparison of reasoning paradigms. Traditional models (top) are non-interpretable black-boxes. General VLMs (middle) suffer from physical hallucinations. OmniFlow (bottom) integrates a Symbolic Lens and Consistency Loop to deliver grounded forecasts and expert reports, uniting numerical precision with logical reasoning.
  • Figure 2: Overview of the OmniFlow architecture. The system employs a neuro-symbolic dual-cycle framework: (A) The Physics Perception Loop (left) utilizes a neural simulator to evolve spatiotemporal dynamics and retrieve historical analogs; (B) The Agentic Core (center) acts as the controller, dynamically orchestrating physical and knowledge tools using a ReAct strategy to fuse hard physical facts with soft domain rules. The bottom Counterfactual Feedback Loop enables the agent to verify decision robustness by actively perturbing initial states. (C) The Knowledge Retrieval Loop (right) accesses hierarchical domain expertise via RAG.
  • Figure 3: Quantitative evaluation of scientific reasoning quality. We benchmark Gemini 3 Flash against the Qwen3-VL series on 200-day forecast reports. Mech F1 specifically measures the grounding accuracy of physical mechanisms, while the remaining metrics assess linguistic alignment. The results show a clear scaling trend in reasoning depth.
  • Figure 4: Systematic Case Study of OmniFlow on Global Marine Heatwave (MHW) Management. Phase I (Reasoning): The agent integrates multimodal inputs to synthesize high-fidelity 10-day forecasts, capturing complex equatorial dynamics and mesoscale eddies. Phase II (Intervention): By executing an active counterfactual probe ($\textit{do}(\text{Forcing}=0)$), OmniFlow quantifies the causal sensitivity ($\mathcal{S}=0.78$) of thermal anomalies to atmospheric drivers. Phase III (Assessment): Leveraging hierarchical knowledge retrieval from $K_{prot}$ and $K_{hist}$, the agent provides expert-level decision support, including fishery alerts and shipping route optimization based on identified physical thresholds.
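Figure 4 reports a causal sensitivity S = 0.78 from the probe do(Forcing = 0), but the caption does not define S. The following minimal sketch assumes that S is the fraction of the anomaly's norm that disappears when the forcing is set to zero. It uses a toy linear anomaly model in place of OMNIFLOW's neural simulator. The model, its decay and gain parameters, and the function names are all hypothetical.

```python
import numpy as np


def simulate(t0, forcing, decay=0.9, gain=0.5):
    """Toy linear anomaly model (hypothetical): T[k+1] = decay * T[k] + gain * forcing[k]."""
    temps = [t0]
    for f in forcing:
        temps.append(decay * temps[-1] + gain * f)
    return np.array(temps[1:])


def causal_sensitivity(t0, forcing, **model_kwargs):
    """Hypothetical S: fraction of the anomaly norm removed under do(Forcing = 0)."""
    factual = simulate(t0, forcing, **model_kwargs)
    # Intervention: same initial state, forcing clamped to zero at every step.
    counterfactual = simulate(t0, np.zeros_like(forcing), **model_kwargs)
    base = np.linalg.norm(factual)
    if base == 0:
        raise ValueError("factual anomaly is zero; sensitivity is undefined")
    return 1.0 - np.linalg.norm(counterfactual) / base


# Usage: an initial anomaly plus a steady atmospheric forcing.
print(round(causal_sensitivity(1.0, np.ones(10)), 3))
```

Under this definition, S = 1 means the anomaly is entirely forcing-driven and S = 0 means the forcing has no effect. S can become negative when the forcing opposes the anomaly.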
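Figure 3 describes Mech F1 only as a measure of how accurately reports ground physical mechanisms. One plausible reading, sketched here as an assumption, is a set-based F1 score between the mechanism labels extracted from a generated report and those in a reference report. This section does not specify how OMNIFLOW's evaluation actually extracts and matches mechanisms. The function name and the labels below are illustrative only.

```python
def mechanism_f1(predicted, reference):
    """Set-based F1 between mechanism labels (hypothetical reading of Mech F1)."""
    # Normalize labels so trivial case and whitespace differences still match.
    pred = {m.strip().lower() for m in predicted}
    ref = {m.strip().lower() for m in reference}
    if not pred and not ref:
        return 1.0  # both empty: treat as perfect agreement
    true_pos = len(pred & ref)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(pred)
    recall = true_pos / len(ref)
    return 2 * precision * recall / (precision + recall)


# Usage with illustrative (made-up) mechanism labels.
report = ["Mesoscale eddies", "atmospheric forcing", "equatorial waves"]
gold = ["mesoscale eddies", "atmospheric forcing"]
print(mechanism_f1(report, gold))
```

Two of the three predicted labels match both reference labels. Precision is therefore 2/3 and recall is 1, giving a score of 0.8 up to floating-point rounding.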