SuperSuit: An Isomorphic Bimodal Interface for Scalable Mobile Manipulation

Tongqing Chen; Hang Wu; Jiasen Wang; Xiaotao Li; Zhu Jin; Lu Fang

TL;DR

Results indicate that consistent kinematic representations across collection modalities enable scalable data acquisition for long-horizon mobile manipulation and, more broadly, for embodied AI.

Abstract

High-quality, long-horizon demonstrations are essential for embodied AI, yet acquiring such data for tightly coupled wheeled mobile manipulators remains a fundamental bottleneck. Unlike fixed-base systems, mobile manipulators require continuous coordination between $SE(2)$ locomotion and precise manipulation, exposing limitations in existing teleoperation and wearable interfaces. We present SuperSuit, a bimodal data acquisition framework that supports both robot-in-the-loop teleoperation and active demonstration under a shared kinematic interface. Both modalities produce structurally identical joint-space trajectories, enabling direct data mixing without modifying downstream policies. For locomotion, SuperSuit maps natural human stepping to continuous planar base velocities, eliminating discrete command switches. For manipulation, it employs a strictly isomorphic wearable arm in both modes, while policy training is formulated in a shift-invariant delta-joint representation to mitigate calibration offsets and structural compliance without inverse kinematics. Real-world experiments on long-horizon mobile manipulation tasks show 2.6$\times$ higher demonstration throughput in active mode compared to a teleoperation baseline, comparable policy performance when substituting teleoperation data with active demonstrations at fixed dataset size, and monotonic performance improvement as active data volume increases. These results indicate that consistent kinematic representations across collection modalities enable scalable data acquisition for long-horizon mobile manipulation.
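
The shift-invariance claimed for the delta-joint representation can be illustrated with a minimal sketch. It assumes that a delta action is the difference between consecutive joint-position samples; the abstract does not give the paper's exact formulation, and the function name, trajectory, and offset values below are hypothetical.

```python
from typing import List, Sequence


def to_delta_joints(trajectory: Sequence[Sequence[float]]) -> List[List[float]]:
    """Convert absolute joint positions q_0..q_T into deltas q_{t+1} - q_t.

    Assumption: deltas are taken between consecutive timesteps. The paper's
    actual action definition may differ (e.g. deltas relative to a chunk start).
    """
    return [
        [nxt - cur for cur, nxt in zip(q_t, q_next)]
        for q_t, q_next in zip(trajectory, trajectory[1:])
    ]


# Hypothetical 2-joint trajectory (radians), three timesteps.
traj = [[0.0, 1.0], [0.5, 1.5], [1.5, 1.0]]

# A constant per-joint calibration offset, as might arise from encoder zeroing.
offset = [0.25, -0.5]
shifted = [[q + o for q, o in zip(q_t, offset)] for q_t in traj]

deltas = to_delta_joints(traj)
deltas_shifted = to_delta_joints(shifted)

# The constant offset cancels in every difference, so both encodings agree.
print(deltas == deltas_shifted)
```

Because a constant offset cancels in each difference, demonstrations recorded with slightly different zero calibrations map to the same delta sequence under this assumption, which is one plausible reading of why no inverse kinematics or re-calibration step is needed when mixing the two modalities.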

Paper Structure (17 sections, 7 equations, 6 figures, 6 tables)

INTRODUCTION
RELATED WORK
Robot-in-the-Loop Teleoperation for Data Collection
Robot-Free and Embodied Demonstration Interfaces
METHODOLOGY
System Overview
Human-to-Torso-and-Base Kinematic Retargeting
Strict Isomorphism and Robust Action Formulation
LLM-Assisted Human-in-the-Loop Annotation Pipeline
EXPERIMENTS
Experimental Setup and Benchmark Tasks
Data Collection Efficiency
Policy Performance and Effective Throughput
Scalability Analysis
Ablation Study: Absolute vs. Delta-Joint Formulation
...and 2 more sections

Figures (6)

Figure 1: The SuperSuit Framework. Our untethered wearable interface translates human embodiment into whole-body robot control via strict isomorphic arm manipulation and zero-drift base locomotion. This synergistic architecture natively supports bimodal data acquisition (teleoperation and active collection), generating high-fidelity datasets that directly fuel imitation learning policies for autonomous mobile manipulation.
Figure 2: System Architecture of SuperSuit. Multimodal human intent is captured by a Dual Stream Control Engine, which decouples human motion into (1) a Mechanical Arm Stream for upper-body isomorphic mapping and (2) a Tracker Stream for torso and base control. Specifically, the tracker-based 6D pose is decomposed into articulated torso configurations and planar locomotion velocities, and a velocity-level deadband is applied to suppress involuntary micro-sway. These robust signals simultaneously drive the mobile manipulator and feed into an LLM-assisted human-in-the-loop (HIL) pipeline, which merges Qwen3 kinematic reasoning with Paraformer transcriptions to automatically generate high-fidelity, language-annotated datasets for VLA models.
Figure 3: Remote Teleoperation Mode. SuperSuit enables intuitive, zero-latency bimanual manipulation across diverse spatial tasks: (a) Pick and Place, (b) Blocks Collection, and (c) Crate Stacking.
Figure 4: Active Demonstration Mode. A continuous sequence of the Pick and Place benchmark performed directly by the operator.
Figure 5: Kinematic Alignment of the SuperSuit. The exoskeleton's mechanical axes structurally mirror the operator's anatomical degrees of freedom. (Note: The grippers are 3D printed in white for visual clarity and teleoperation, whereas actual data collection employs black grippers identical to the target robot's configuration.)
...and 1 more figure
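
The velocity-level deadband mentioned in the Figure 2 caption can be sketched as follows. This is an illustrative guess at the idea rather than the authors' implementation: the function name, the choice to threshold linear speed and yaw rate separately, and the threshold values are all assumptions.

```python
import math
from typing import Tuple


def apply_velocity_deadband(
    vx: float,
    vy: float,
    wz: float,
    lin_thresh: float = 0.05,  # m/s, hypothetical value
    ang_thresh: float = 0.1,   # rad/s, hypothetical value
) -> Tuple[float, float, float]:
    """Zero out small planar base velocities caused by involuntary sway.

    The linear components are gated together on their magnitude so that
    the direction of a valid command is preserved; yaw is gated separately.
    """
    if math.hypot(vx, vy) < lin_thresh:
        vx, vy = 0.0, 0.0
    if abs(wz) < ang_thresh:
        wz = 0.0
    return vx, vy, wz


# Micro-sway while standing still is suppressed entirely.
standing = apply_velocity_deadband(0.01, -0.02, 0.03)

# A deliberate forward step passes through; the small yaw jitter is removed.
stepping = apply_velocity_deadband(0.3, 0.0, 0.03)
```

Gating at the velocity level, rather than on tracker position, keeps the command continuous once the operator is actually stepping, which matches the caption's stated goal of suppressing micro-sway without introducing discrete command switches.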
