FACTR: Force-Attending Curriculum Training for Contact-Rich Policy Learning

Jason Jingzhou Liu, Yulong Li, Kenneth Shaw, Tony Tao, Ruslan Salakhutdinov, Deepak Pathak

TL;DR

FACTR couples a low-cost force-feedback teleoperation system with a Force-Attending Curriculum Training framework to address generalization gaps in contact-rich manipulation. By progressively reducing visual input corruption, FACTR steers policy learning to rely on force cues early and integrate vision later, yielding strong improvements on unseen objects. The approach is supported by NTK-inspired analysis, ablations, and a transparent cost report, demonstrating practical gains in both teleoperation efficacy and policy generalization across four tasks. The work also provides detailed control laws and architectural design choices to facilitate adoption and extension in force-aware robotic learning.

Abstract

Many contact-rich tasks humans perform, such as box pickup or rolling dough, rely on force feedback for reliable execution. However, this force information, which is readily available in most robot arms, is not commonly used in teleoperation and policy learning. Consequently, robot behavior is often limited to quasi-static kinematic tasks that do not require intricate force feedback. In this paper, we first present a low-cost, intuitive, bilateral teleoperation setup that relays external forces of the follower arm back to the teacher arm, facilitating data collection for complex, contact-rich tasks. We then introduce FACTR, a policy learning method that employs a curriculum which corrupts the visual input with decreasing intensity throughout training. The curriculum prevents our transformer-based policy from over-fitting to the visual input and guides the policy to properly attend to the force modality. We demonstrate that by fully utilizing the force information, our method significantly improves generalization to unseen objects by 43% compared to baseline approaches without a curriculum. Video results, codebases, and instructions are available at https://jasonjzliu.com/factr/
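
The curriculum described above, which corrupts the visual input with decreasing intensity over training, can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a pixel-space Gaussian blur as the corruption, a linear decay of the blur scale, and a starting scale `sigma_max = 4.0`. The function names `sigma_schedule`, `gaussian_kernel`, and `blur_image` are hypothetical and chosen only for this example.

```python
import numpy as np


def sigma_schedule(step: int, total_steps: int, sigma_max: float = 4.0) -> float:
    """Blur scale at a given training step.

    Assumption: linear decay from sigma_max to 0. The source only states
    that the scale starts large and is gradually decreased.
    """
    frac = min(max(step / total_steps, 0.0), 1.0)
    return sigma_max * (1.0 - frac)


def gaussian_kernel(sigma: float) -> np.ndarray:
    """Normalized 1D Gaussian kernel truncated at about 3 sigma."""
    radius = max(1, int(np.ceil(3.0 * sigma)))
    x = np.arange(-radius, radius + 1, dtype=np.float64)
    kernel = np.exp(-0.5 * (x / sigma) ** 2)
    return kernel / kernel.sum()


def blur_image(image: np.ndarray, sigma: float) -> np.ndarray:
    """Separable Gaussian blur of an H x W x C float image."""
    if sigma <= 1e-6:
        return image.copy()  # end of the curriculum: no corruption
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    # Edge padding keeps the output the same size as the input.
    padded = np.pad(image, ((radius, radius), (radius, radius), (0, 0)), mode="edge")

    def conv(v: np.ndarray) -> np.ndarray:
        return np.convolve(v, kernel, mode="valid")

    out = np.apply_along_axis(conv, 0, padded)  # blur along height
    out = np.apply_along_axis(conv, 1, out)     # blur along width
    return out


# Usage: corrupt an image more heavily early in training than late.
rng = np.random.default_rng(0)
image = rng.random((16, 16, 3))
early = blur_image(image, sigma_schedule(step=0, total_steps=100))
late = blur_image(image, sigma_schedule(step=100, total_steps=100))
```

In a training loop, each image batch would be passed through `blur_image` with the current step's scale before reaching the vision encoder, so that early updates cannot rely on fine visual detail and the force input carries more of the signal. The paper's Figure 3 also mentions applying the blur in latent space, which this sketch does not cover.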

Paper Structure

This paper contains 32 sections, 31 equations, 9 figures, 10 tables, 1 algorithm.

Figures (9)

  • Figure 1: Our low-cost bimanual teleoperation system with force feedback. The system features two actuated leader arms, two follower arms with external joint torque sensors (such as the Franka Panda and the KUKA LBR iiwa), a front camera, and two wrist cameras.
  • Figure 2: Customizable Joint Regularization. [Left] Without the flexibility to define the resting joint configuration $q_{rest}$, the arm’s reachability is restricted, leading to collisions in confined spaces. [Right] Our leader arm allows the user to define a custom resting joint configuration $q_{rest}$, helping the follower arm reach targets in confined spaces.
  • Figure 3: FACTR allows our policy to better integrate force information without overfitting to visual information, resulting in better generalization to objects with unseen visual appearances and geometries. Our policy takes as inputs RGB images $I$ and external joint torque $\tau$, which are tokenized by a vision encoder and a force encoder before being fed into an action transformer that regresses joint position targets $q_{t:t+k}$. FACTR applies a blurring operator of scale $\sigma_n$ in either pixel or latent space; the scale is initialized at a large value and then gradually decreased throughout training.
  • Figure 4: Tasks. We evaluate our leader-follower teleoperation system and autonomous policies trained with FACTR on four contact-rich tasks. These tasks are challenging as they require the robot to perceive and respond to the force feedback as it manipulates objects with unseen visual appearances and geometries.
  • Figure 5: FACTR leads to better object generalization.
  • ...and 4 more figures