PhysiFlow: Physics-Aware Humanoid Whole-Body VLA via Multi-Brain Latent Flow Matching and Robust Tracking

Weikai Qin; Sichen Wu; Ci Chen; Mengfan Liu; Linxi Feng; Xinru Cui; Haoqi Han; Hesheng Wang

TL;DR

PhysiFlow is a semantic-motion-intent-guided, physics-aware multi-brain VLA framework for humanoid whole-body control. Experiments show that it enables reliable vision-language-guided full-body coordination for humanoid robots.

Abstract

In humanoid robot control, fusing Vision-Language-Action (VLA) models with whole-body control is essential for semantically guided execution of real-world tasks. However, existing methods suffer from either low VLA inference efficiency or a lack of effective semantic guidance for whole-body control, which leads to instability in dynamic limb-coordinated tasks. To bridge this gap, we present a semantic-motion-intent-guided, physics-aware multi-brain VLA framework for humanoid whole-body control. We conducted a series of experiments to evaluate the proposed framework. The results demonstrate that it enables reliable vision-language-guided full-body coordination for humanoid robots.

Paper Structure (17 sections, 5 equations, 6 figures, 2 tables)

INTRODUCTION
RELATED WORK
VLA for Robotics Learning
Humanoid Whole-Body Control
Method
Multi-brain Architecture
The design of the Neocortical Brain
The design of the Basal Ganglionic Brain
The design of the Cerebellar Brain
VLA Data Generation
EXPERIMENT
The ablation of the Neocortical Brain
The ablation of the Basal Ganglionic Brain
Simulation Experiments
Real-World Experiments
...and 2 more sections

Figures (6)

Figure 1: Introducing PhysiFlow, a multi-brain VLA humanoid system that operates on Unitree G1 robots and performs end-to-end VLA humanoid whole-body control in large spaces. The proposed system completes consecutive tasks autonomously, including (a-c) walking to the designated item, sitting on the proposed item, and raising an arm; (d-f) circling the designated item, standing up from the specific item, and turning right.
Figure 2: The overall pipeline of PhysiFlow. This bio-inspired architecture decouples semantic reasoning from physics-aware execution. (a) Neocortical Brain: A curriculum-based CVAE processes vision and language to synthesize a 10 latent vector $z_{vl}$, aligning task semantics with motion intent. (b) Basal Ganglionic Brain: Conditioned on $z_{vl}$ and robot states, a flow-matching model generates 50 motion sequence $m_t$ for continuity. (c) Cerebellar Brain: A robust motion tracker enforces physical constraints, translating these chunks into stable motor commands for closed-loop whole-body control.
Figure 3: Visualization of the VLA dataset. (a) Diverse visuals with various scenes and items. (b) Diverse camera angles with ego and exo views. (c) Diverse tasks, from turning around to standing up.
Figure 4: Performance benchmarking of the Basal Ganglionic Brain. The proposed flow-matching (FM) paradigm is evaluated against autoregressive (AR) and Denoising Diffusion Probabilistic Model (DDPM) baselines.
Figure 5: Real-world execution of semantically guided whole-body tasks by the Unitree G1 humanoid robot. Top: Complex VLA maneuvers requiring continuous spatial navigation and dynamic multi-limb coordination. Bottom: Basic VLA tasks demonstrating responsive semantic execution and robust postural stability. These results validate the system's capacity to maintain physical compliance and dynamic consistency during unconstrained deployment.
...and 1 more figures
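
The flow-matching step described in the Figure 2(b) caption can be illustrated with a minimal sketch. This is not the authors' implementation: the velocity field below is a hand-written stand-in for a learned network, and the names `toy_velocity`, `sample_motion`, `z_vl`, and `robot_state`, the three-element vectors, and the rule that the target is the element-wise sum of the two conditioning vectors are all assumptions made for illustration. The sketch only shows the general idea of flow-matching sampling: start from noise and integrate a conditional velocity field from t = 0 to t = 1.

```python
import random


def toy_velocity(x, t, z_vl, robot_state):
    """Hypothetical stand-in for a learned velocity field v(x, t | z_vl, state).

    Assumption: the conditioning defines a target as the element-wise sum of
    z_vl and robot_state. For the straight-line path x_t = (1 - t) x0 + t x1,
    the velocity that reaches a known endpoint x1 is (x1 - x_t) / (1 - t).
    """
    target = [z + s for z, s in zip(z_vl, robot_state)]
    return [(g - xi) / (1.0 - t) for xi, g in zip(x, target)]


def sample_motion(z_vl, robot_state, num_steps=10, seed=0):
    """Draw noise at t = 0 and Euler-integrate dx/dt = v(x, t) up to t = 1."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in z_vl]  # initial noise sample
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt  # t stays strictly below 1, so (1 - t) is never zero
        v = toy_velocity(x, t, z_vl, robot_state)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


if __name__ == "__main__":
    # Illustrative values only; they are not taken from the paper.
    print(sample_motion([0.5, -1.0, 2.0], [0.1, 0.2, 0.3]))
```

In a trained model the velocity field would be a neural network fitted to motion data, and the integration would produce a motion chunk rather than a three-element vector; the number of Euler steps trades sampling cost against accuracy.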
