IG-RFT: An Interaction-Guided RL Framework for VLA Models in Long-Horizon Robotic Manipulation

Zhian Su, Weijie Kong, Haonan Dong, Huixu Dong

TL;DR

This work establishes and validates a novel reinforced fine-tuning system for VLA models in real-world robotic manipulation and introduces Interaction-Guided Advantage Weighted Regression (IG-AWR), an RL algorithm that dynamically modulates exploration intensity based on the robot's interaction status.

Abstract

Vision-Language-Action (VLA) models have demonstrated significant potential for generalist robotic policies; however, they struggle to generalize to long-horizon complex tasks in novel real-world domains due to distribution shifts and the scarcity of high-quality demonstrations. Although reinforcement learning (RL) offers a promising avenue for policy improvement, applying it to real-world VLA fine-tuning faces challenges regarding exploration efficiency, training stability, and sample cost. To address these issues, we propose IG-RFT, a novel Interaction-Guided Reinforced Fine-Tuning system designed for flow-based VLA models. Firstly, to facilitate effective policy optimization, we introduce Interaction-Guided Advantage Weighted Regression (IG-AWR), an RL algorithm that dynamically modulates exploration intensity based on the robot's interaction status. Furthermore, to address the limitations of sparse or task-specific rewards, we design a novel hybrid dense reward function that integrates the trajectory-level reward and the subtask-level reward. Finally, we construct a three-stage RL system comprising SFT, Offline RL, and Human-in-the-Loop RL for fine-tuning VLA models. Extensive real-world experiments on four challenging long-horizon tasks demonstrate that IG-RFT achieves an average success rate of 85.0%, significantly outperforming SFT (18.8%) and standard Offline RL baselines (40.0%). Ablation studies confirm the critical contributions of IG-AWR and hybrid reward shaping. In summary, our work establishes and validates a novel reinforced fine-tuning system for VLA models in real-world robotic manipulation.
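The abstract names Advantage Weighted Regression as the basis of IG-AWR but does not spell out the weighting. As a rough illustration, the sketch below shows the standard AWR idea only: each sample's imitation loss is scaled by an exponentiated advantage. It is not the paper's IG-AWR objective. The temperature `beta`, the clipping value `max_weight`, and both function names are assumptions made for this example; for a flow-based policy, `per_sample_loss` would presumably be a per-sample flow-matching loss.

```python
import numpy as np

def awr_weights(advantages, beta=1.0, max_weight=20.0):
    """Standard AWR sample weights: exp(A / beta), capped at max_weight.

    beta and max_weight are illustrative values, not taken from the paper.
    """
    advantages = np.asarray(advantages, dtype=np.float64)
    # Cap the exponent instead of the result to avoid overflow in exp().
    exponent = np.minimum(advantages / beta, np.log(max_weight))
    return np.exp(exponent)

def awr_loss(per_sample_loss, advantages, beta=1.0, max_weight=20.0):
    """Advantage-weighted mean of a per-sample imitation loss."""
    weights = awr_weights(advantages, beta, max_weight)
    losses = np.asarray(per_sample_loss, dtype=np.float64)
    return float(np.mean(weights * losses))
```

Samples with higher advantage contribute more to the regression loss, so the policy is pulled toward better-than-average actions while still training with a supervised-style objective.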

Paper Structure

This paper contains 38 sections, 6 equations, 7 figures, 5 tables, 1 algorithm.

Figures (7)

  • Figure 1: Overview of the Interaction-Guided Reinforced Fine-tuning System (IG-RFT). The system integrates a multi-head Q-Former-based critic model to provide dense value and interaction guidance. The supervision target of the dense value is our designed general hybrid value function. We use the interaction signal to modulate the flow matching initial noise to balance RL exploration and exploitation. The training process progresses through three stages: Supervised Fine-Tuning (SFT), data-efficient Offline RL using the proposed IG-AWR algorithm, and final Human-in-the-Loop refinement to master long-horizon real-world manipulation tasks.
  • Figure 2: Illustration of Dynamic Uncertainty Modulation in IG-AWR. This mechanism modulates the initial sampling noise $\epsilon$ of the flow ODE based on the extracted interaction signal $I_t$. During non-interaction phases (top), a higher variance $\sigma_{\text{high}}$ is applied to encourage diverse trajectory exploration. Conversely, during interaction phases (bottom), the variance is reduced to $\sigma_{\text{low}}$ to ensure precision and stability for contact-rich manipulation.
  • Figure 3: Visualization of the Reward Signal and Value Function. We illustrate the proposed reward shaping mechanism on the "Parcel Packing" task. (A) The reward signal $r_t$ combines a constant trajectory-level reward with weighted subtask-level rewards (peaks), providing dense feedback at key milestones. (B) The Value (Return-to-Go) function $V_t$ monotonically decreases from 1.0 to 0 as the agent progresses through subtasks, serving as a precise progress indicator for the Critic.
  • Figure 4: Execution Sequences of Long-Horizon Tasks. From top to bottom, we visualize the successful execution flows for four challenging tasks: Parcel Packing, Fruit Bagging, Block Stacking, and Drink Shelving. Arrows indicate the progression of subgoals. Note: The text labels above each stage are abbreviated for visualization clarity and differ from the actual natural language instructions used by the model. For the detailed sequence of subtask instructions for each task, see Appendix \ref{sec:tasks}.
  • Figure 5: Data Efficiency Analysis in Human-in-the-Loop Fine-tuning. We report the mean success rates averaged over the Parcel Packing and Drink Shelving tasks. The curves show that IG-RFT (Ours) achieves superior sample efficiency, converging to 77.5% success with only 40 interventions, significantly outperforming baselines. Shaded regions indicate the min-max range across tasks.
  • ...and 2 more figures
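
The mechanisms described in the captions of Figures 2 and 3 can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the interaction signal $I_t$ is treated as a boolean, the values of `sigma_low` and `sigma_high` are made up, and the reward is assumed to be a positive constant per step plus weighted bonuses at milestone steps, normalized so that the undiscounted return-to-go starts at 1.0 and ends at 0. All function and parameter names are hypothetical.

```python
import numpy as np

def modulated_initial_noise(interacting, action_shape,
                            sigma_low=0.3, sigma_high=1.0, rng=None):
    """Sample the initial flow noise with interaction-dependent scale.

    Assumption: `interacting` is a boolean stand-in for the signal I_t.
    The sigma values are illustrative, not taken from the paper.
    """
    rng = np.random.default_rng() if rng is None else rng
    sigma = sigma_low if interacting else sigma_high
    return sigma * rng.standard_normal(action_shape)

def hybrid_rewards(num_steps, milestone_steps, milestone_weights,
                   step_reward=1.0):
    """Constant per-step reward plus weighted bonuses at milestone steps.

    Rewards are normalized to sum to 1 (an assumption of this sketch).
    """
    rewards = np.full(num_steps, step_reward, dtype=np.float64)
    for step, weight in zip(milestone_steps, milestone_weights):
        rewards[step] += weight
    return rewards / rewards.sum()

def return_to_go(rewards):
    """Undiscounted return-to-go V_t, with a trailing 0 after the last step."""
    suffix_sums = np.cumsum(rewards[::-1])[::-1]
    return np.concatenate([suffix_sums, [0.0]])
```

With these assumptions, `return_to_go(hybrid_rewards(...))` starts at 1.0, drops more sharply at each milestone step, and reaches 0 after the final step, which matches the qualitative shape described for Figure 3(B).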