SafeFlow: Real-Time Text-Driven Humanoid Whole-Body Control via Physics-Guided Rectified Flow and Selective Safety Gating

Hanbyel Cho, Sang-Hun Kim, Jeonguk Kang, Donghan Koo

Abstract

Recent advances in real-time interactive text-driven motion generation have enabled humanoids to perform diverse behaviors. However, kinematics-only generators often exhibit physical hallucinations, producing motion trajectories that are physically infeasible to track with a downstream motion tracking controller or unsafe for real-world deployment. These failures often arise from the lack of explicit physics-aware objectives for real-robot execution and become more severe under out-of-distribution (OOD) user inputs. Hence, we propose SafeFlow, a text-driven humanoid whole-body control framework that combines physics-guided motion generation with a 3-Stage Safety Gate driven by explicit risk indicators. SafeFlow adopts a two-level architecture. At the high level, we generate motion trajectories using Physics-Guided Rectified Flow Matching in a VAE latent space to improve real-robot executability, and further accelerate sampling via Reflow to reduce the number of function evaluations (NFE) for real-time control. The 3-Stage Safety Gate enables selective execution by detecting semantic OOD prompts using a Mahalanobis score in text-embedding space, filtering unstable generations via a directional sensitivity discrepancy metric, and enforcing final hard kinematic constraints such as joint and velocity limits before passing the generated trajectory to a low-level motion tracking controller. Extensive experiments on the Unitree G1 demonstrate that SafeFlow outperforms prior diffusion-based methods in success rate, physical compliance, and inference speed, while maintaining diverse expressiveness.
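As a rough illustration of the first and third gate stages named in the abstract, the sketch below scores a prompt embedding with a Mahalanobis distance and checks a generated joint trajectory against position and velocity limits. It is not the authors' implementation: the function names (`fit_gaussian`, `mahalanobis_score`, `violates_limits`), the single-Gaussian model of in-distribution embeddings, the regularizer `eps`, and the finite-difference velocity estimate are all assumptions made for this example.

```python
import numpy as np


def fit_gaussian(train_embeddings, eps=1e-6):
    """Fit a mean and precision matrix to in-distribution text embeddings.

    Assumption: one Gaussian over all training prompts; eps regularizes
    the covariance so that it stays invertible.
    """
    mean = train_embeddings.mean(axis=0)
    cov = np.cov(train_embeddings, rowvar=False)
    cov = cov + eps * np.eye(cov.shape[0])
    return mean, np.linalg.inv(cov)


def mahalanobis_score(embedding, mean, precision):
    """Mahalanobis distance of one embedding from the training distribution."""
    diff = embedding - mean
    squared = float(diff @ precision @ diff)
    return float(np.sqrt(max(squared, 0.0)))  # clamp tiny negative round-off


def violates_limits(q, q_min, q_max, dq_max, dt):
    """Hard kinematic check on a trajectory q of shape (frames, joints).

    Velocities are estimated by finite differences with time step dt
    (an assumption of this sketch).
    """
    dq = np.diff(q, axis=0) / dt
    position_bad = np.any((q < q_min) | (q > q_max))
    velocity_bad = np.any(np.abs(dq) > dq_max)
    return bool(position_bad or velocity_bad)
```

In such a sketch, a prompt whose score exceeds a threshold chosen on held-out data would be rejected before generation, and a trajectory for which `violates_limits` returns `True` would not be passed to the tracking controller.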

Paper Structure (18 sections, 8 equations, 6 figures, 3 tables)

Figures (6)

  • Figure 1: Failure Cases of a Baseline Text-Driven Reference Motion Generator. While a kinematics-only baseline [xie2026textop] produces physically feasible motions for simple prompts (a), it often generates infeasible references, including joint limit violations (b) and self-collisions (c), even under in-distribution commands. For out-of-distribution prompts, the generation process becomes unstable, leading to structural collapse and unsafe, implausible full-body configurations (d). These failure modes underscore the critical need for physics-guided generation and runtime safety gating.
  • Figure 2: Overview of SafeFlow. Top (Deployment, Online): A 3-Stage Safety Gate hierarchically filters OOD semantics, generation instability, and kinematic violations. A reflow-accelerated high-level motion generator provides physically feasible reference motions. If accepted, these are executed by the downstream motion tracking controller; otherwise, a safe fallback is triggered. Bottom (Training, Offline): The motion generator is trained via VAE latent learning and physics-guided flow matching with reflow distillation (NFE=1). The motion tracking controller is trained in simulation via RL.
  • Figure 3: Kinematic Feasibility and Tracking Stability. Despite generating dynamic motions (left), our full pipeline, SafeFlow (+ Guid. & Reflow), stabilizes kinematic references and improves tracking. (a) Generator-only: SafeFlow suppresses erratic spikes in CoM velocity and joint acceleration. (b) System-level: SafeFlow mitigates torque chattering and joint velocity spikes, enabling hardware-safe tracking. The x-axis represents time in frames, showing a representative active segment (frames 600 to 950).
  • Figure 4: Generation Instability Score $\mathcal{R}$ Detects Failure-Prone References and Motivates Stage 2. Mean tracking MPJPE of 10-frame windows grouped into absolute $\mathcal{R}$ quintiles for ID and OOD sequences. MPJPE increases monotonically with $\mathcal{R}$, indicating that high-$\mathcal{R}$ windows correspond to physically unstable references. Notably, even ID prompts produce high-instability windows (ID Q5, 87.4 $\mathrm{mm}$) with larger errors than low-instability OOD windows (OOD Q1, 56.6 $\mathrm{mm}$), showing that semantic OOD filtering (Stage 1) is insufficient and Stage 2 monitoring is necessary.
  • Figure 5: Instability Score-Triggered Safe Fallback. When the instability score $\mathcal{R}$ exceeds the fallback threshold due to unstable flow dynamics, Stage 2 temporarily overrides the current command, injects a standing prompt, and interpolates the tracker reference toward a predefined standing pose. Without Stage 2, the robot fails to track the unstable reference motion; with Stage 2 enabled, it remains stable and awaits the next prompt.
  • ...and 1 more figure
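
The Stage 2 fallback described in the Figure 5 caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the instability score is taken as an already computed number, and the names `blend_to_standing` and `stage2_gate`, the linear joint-space blend, and the default of 10 blend frames are hypothetical choices made for the example.

```python
def blend_to_standing(current_pose, standing_pose, num_steps):
    """Linearly interpolate joint angles from the current pose to a standing pose.

    Returns num_steps frames; the last frame equals standing_pose.
    Linear blending is an assumption of this sketch.
    """
    frames = []
    for k in range(1, num_steps + 1):
        alpha = k / num_steps
        frames.append(
            [(1 - alpha) * c + alpha * s for c, s in zip(current_pose, standing_pose)]
        )
    return frames


def stage2_gate(instability, threshold, reference, current_pose, standing_pose,
                num_steps=10):
    """Execute the generated reference or fall back to standing.

    instability: precomputed instability score for the current window.
    threshold: fallback threshold (hypothetical value chosen by the user).
    """
    if instability > threshold:
        # Override the command and steer the tracker reference to standing.
        return "fallback", blend_to_standing(current_pose, standing_pose, num_steps)
    return "execute", reference
```

Blending over several frames rather than switching to the standing pose in one step keeps the tracker reference continuous, which is the behavior the caption describes as interpolating toward a predefined standing pose.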