SkillVLA: Tackling Combinatorial Diversity in Dual-Arm Manipulation via Skill Reuse

Xuanran Zhai; Zekai Huang; Longyan Wu; Qianyou Zhao; Qiaojun Yu; Jieji Ren; Ce Hao; Harold Soh

TL;DR

The authors argue that effective bimanual vision-language-action (VLA) models should support skill reuse: the ability to recombine previously learned single-arm skills across novel left-right pairings, so that each possible combination does not have to be learned separately.

Abstract

Recent progress in vision-language-action (VLA) models has demonstrated strong potential for dual-arm manipulation, enabling complex behaviors and generalization to unseen environments. However, mainstream bimanual VLA formulations largely overlook the critical challenge of combinatorial diversity. Different pairings of single-arm behaviors can induce qualitatively distinct task behaviors, yet existing models do not explicitly account for this structure. We argue that effective bimanual VLAs should support skill reuse - the ability to recombine previously learned single-arm skills across novel left-right pairings - thereby avoiding the need to separately learn every possible combination. Current VLA designs entangle skills across arms, preventing such recomposition and limiting scalability. To address this limitation, we propose SkillVLA, a framework explicitly designed to enable skill reuse in dual-arm manipulation. Extensive experiments demonstrate that SkillVLA substantially improves skill composition, increasing overall success rate from 0% to 51%, and achieves strong performance on cooperative and long-horizon tasks.
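To make the combinatorial argument concrete, the following minimal sketch counts how many left-right pairings arise from a small set of single-arm skills. The skill names and the helper `pairing_counts` are hypothetical and chosen only for illustration; the three-skills-per-arm setup mirrors the skill recomposition experiment described in Figure 3, not any code from the paper.

```python
from itertools import product


def pairing_counts(left_skills, right_skills):
    """Compare the number of single-arm skills with the number of left-right pairings.

    Both arguments are lists of hypothetical skill names (illustration only).
    """
    pairings = list(product(left_skills, right_skills))
    counts = {
        # What a model needs to learn if single-arm skills can be reused.
        "single_arm_skills": len(left_skills) + len(right_skills),
        # What a model needs to see if every pairing is learned separately.
        "pairings": len(pairings),
    }
    return counts, pairings


# Hypothetical skill names; three skills per arm, as in the Figure 3 setup.
left = ["left_skill_a", "left_skill_b", "left_skill_c"]
right = ["right_skill_a", "right_skill_b", "right_skill_c"]

counts, pairings = pairing_counts(left, right)
print(counts)
```

With k skills per arm, the number of pairings grows as k squared while the number of single-arm skills grows as 2k, which is the gap that skill reuse is meant to close.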

Paper Structure (28 sections, 21 equations, 9 figures, 4 tables)

Introduction
Related Work
Problem Formulation
Skill Reuse in Dual-Arm Manipulation
Definitions
Skill Selector
Skill Entanglement in Current VLAs
Method: SkillVLA
Method Pipeline
Additional Cooperation-Level Learning
Experiments
Single-Arm Skill Reuse
Dual-Arm (Cooperative) Skill Reproduction
Applications in Long-Horizon Settings
Reuse of Learned Skills in Continual Learning
...and 13 more sections

Figures (9)

Figure 1: SkillVLA extracts single-arm skills from training data with hierarchical reasoning and skill-adaptive learning, and can recompose them into unseen combinations at test time.
Figure 2: SkillVLA framework. SkillVLA adopts a two-level reasoning pipeline, where the high-level VLM generates a separate subtask for each arm and low-level VLMs further process the prompts to instruct action generation. Inter-arm cross-attention enables cooperative behavior generation, controlled by a collaboration estimator that identifies the required operation mode.
Figure 3: Skill recomposition tasks. (A): The models are trained on demonstrations of three skills for each arm. (B): After the models have learned the skills, zero-shot tests are conducted for every possible combination of left-arm and right-arm skills.
Figure 4: Cooperative tasks. (a) Shake: Shake the cup with a cap without letting them come apart. (b) Ball: Lift the ball steadily. (c) Align: Align the blocks on the table.
Figure 5: Long-horizon task behaviors and results. Top: Behavior of $\pi_{0.5}$ in Tubes. Middle left: Behavior of SkillVLA in Tubes. Bottom left: Changes in $\alpha$ values over the course of task completion, for SkillVLA and for an ablated version without discretization of $\alpha$. Bottom right: Average progress score and completion time of each method on the long-horizon tasks.
...and 4 more figures
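The Figure 2 and Figure 5 captions mention inter-arm cross-attention controlled by a collaboration estimator and a discretized $\alpha$. The sketch below shows one way such a gate could behave; it is not the paper's implementation. The residual form, the 0.5 threshold, the absence of learned projections, and all function names are assumptions made purely for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(queries, keys_values):
    # Plain scaled dot-product attention with no learned projections (assumption).
    d = queries.shape[-1]
    scores = queries @ keys_values.T / np.sqrt(d)
    return softmax(scores, axis=-1) @ keys_values


def gated_inter_arm_update(left_feats, right_feats, alpha_score, threshold=0.5):
    """Mix information between arms only when the gate is open.

    alpha_score stands in for the output of a collaboration estimator;
    thresholding it to 0 or 1 is an assumed form of discretization.
    """
    alpha = 1.0 if alpha_score >= threshold else 0.0
    left_out = left_feats + alpha * cross_attention(left_feats, right_feats)
    right_out = right_feats + alpha * cross_attention(right_feats, left_feats)
    return left_out, right_out, alpha


rng = np.random.default_rng(0)
left_feats = rng.normal(size=(4, 8))   # 4 tokens, 8 feature dims (arbitrary sizes)
right_feats = rng.normal(size=(4, 8))

# Gate closed: each arm's features pass through unchanged (independent mode).
l_ind, r_ind, a_ind = gated_inter_arm_update(left_feats, right_feats, alpha_score=0.2)
# Gate open: each arm attends to the other (cooperative mode).
l_coop, r_coop, a_coop = gated_inter_arm_update(left_feats, right_feats, alpha_score=0.9)
```

Under these assumptions, a closed gate leaves each arm's features untouched by the other arm, which is the property that would let single-arm skills be recombined freely, while an open gate lets the arms exchange information for cooperative behaviors.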

Theorems & Definitions (3)

Definition 1: Single-Arm Skills
Definition 2: Dual-Arm Skills
Definition 3: Skill Reuse
