CATCH-FORM-ACTer: Compliance-Aware Tactile Control and Hybrid Deformation Regulation-Based Action Transformer for Viscoelastic Object Manipulation

Hongjun Ma, Weichang Li, Jingwei Zhang, Shenlai He, Xiaoyan Deng

TL;DR

The paper tackles the problem of contact-rich manipulation of viscoelastic objects with rigid robots, where dynamic parameter mismatches and spatiotemporal force-deformation coupling hinder dexterity. It introduces CATCH-FORM-ACTer, a phase-aware framework that combines Learning from Demonstrations with an Action Chunking Transformer to plan long-horizon actions while dynamically adjusting stiffness, damping, and diffusion during different manipulation phases, via a CVAE-augmented policy that ingests multi-modal data and phase context. A physics-grounded CATCH-FORM-3D controller provides an interpretable, tunable, and stable inner-outer admittance loop based on a unified 3D Kelvin–Voigt–Maxwell viscoelastic model and a PDE-driven observer for real-time material-property estimation. Experimental validation on single-arm and bimanual tasks shows 10–20% higher success rates than prior ACT variants, leveraging spatial force and deformation fields to guide phase-dependent compliance changes and achieving sub-millimeter control accuracy. Overall, the work offers a practical route to human-like force-deformation modulation in complex viscoelastic interactions, enabling safer, more reliable manipulation in industrial, medical, and household settings.

Abstract

Automating contact-rich manipulation of viscoelastic objects with rigid robots faces challenges including dynamic parameter mismatches, unstable contact oscillations, and spatiotemporal force-deformation coupling. In our prior work, a Compliance-Aware Tactile Control and Hybrid Deformation Regulation (CATCH-FORM-3D) strategy achieved robust and effective manipulation of 3D viscoelastic objects by combining a contact-force-driven admittance outer loop with a PDE-stabilized inner loop, reaching sub-millimeter surface deformation accuracy. However, this strategy requires fine-tuning of object-specific parameters and task-specific calibration. To bridge this gap, CATCH-FORM-ACTer is proposed, which enhances CATCH-FORM-3D with an Action Chunking with Transformer (ACT) framework. An intuitive teleoperation system supports Learning from Demonstration (LfD) to build long-horizon sensing, decision-making, and execution sequences. Unlike conventional ACT methods, which focus solely on trajectory planning, our approach dynamically adjusts stiffness, damping, and diffusion parameters in real time during multi-phase manipulation, effectively imitating human-like force-deformation modulation. Experiments with single-arm and bimanual robots on three tasks show better force-field patterns and, as a result, 10%-20% higher success rates than conventional methods, enabling precise, safe interactions in industrial, medical, or household scenarios.
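
To make the idea of phase-dependent compliance concrete, the sketch below simulates a textbook one-degree-of-freedom admittance law, $M\ddot{x} + D\dot{x} + K(x - x_{ref}) = f_{ext}$, with gains that are looked up per manipulation phase. This is an illustrative simplification, not the CATCH-FORM-3D controller: the PDE-stabilized inner loop and the diffusion parameter are omitted, and the names `AdmittanceGains`, `PHASE_GAINS`, `admittance_step`, and `simulate`, the phase labels, and all numeric values are assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AdmittanceGains:
    mass: float       # virtual inertia M [kg]
    damping: float    # virtual damping D [N*s/m]
    stiffness: float  # virtual stiffness K [N/m]


# Hypothetical per-phase gains; placeholder values, not taken from the paper.
PHASE_GAINS = {
    "free_motion": AdmittanceGains(mass=1.0, damping=40.0, stiffness=400.0),
    "insertion": AdmittanceGains(mass=1.0, damping=20.0, stiffness=100.0),
}


def admittance_step(x, v, x_ref, f_ext, gains, dt):
    """Advance M*a + D*v + K*(x - x_ref) = f_ext by one semi-implicit Euler step."""
    a = (f_ext - gains.damping * v - gains.stiffness * (x - x_ref)) / gains.mass
    v_next = v + dt * a
    x_next = x + dt * v_next
    return x_next, v_next


def simulate(phase, f_ext, x_ref=0.0, dt=0.001, steps=5000):
    """Apply a constant external force and return the final displacement [m]."""
    gains = PHASE_GAINS[phase]
    x, v = x_ref, 0.0
    for _ in range(steps):
        x, v = admittance_step(x, v, x_ref, f_ext, gains, dt)
    return x


if __name__ == "__main__":
    for phase in PHASE_GAINS:
        print(phase, simulate(phase, f_ext=2.0))
```

Under a constant contact force the displacement settles at $f_{ext}/K$, so the phase with lower stiffness yields more for the same force. In the framework described above, a learned policy would supply such parameters over time instead of a fixed lookup table.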

Paper Structure

This paper contains 13 sections, 7 figures, 1 table.

Figures (7)

  • Figure 1: Our system integrates a teleoperation interface using 3D MoCap controllers, enhanced with visual-cue feedback and safety mechanisms (bottom left), and an LfD framework powered by CATCH-FORM-ACTer (bottom right). The proposed approach was validated on a dual-arm robotic setup, across complex, contact-rich manipulation tasks (top).
  • Figure 2: Teleoperation system for data collection using 3D MoCap controllers. The controller receives the contact force and surface deformation information from the robot’s finger sensors and uses it to provide visual-cue feedback to the operator. The same concept is applied to bimanual tasks, where a separate 3D MoCap controller is connected to each arm.
  • Figure 3: CATCH-FORM-ACTer network architecture. Left: The action sequence, consisting of $n$ robot states (stiffness/damping/diffusion parameters $R$, target EE Cartesian pose $x$, and dexterous hand joint angles $h$), is encoded alongside the current Cartesian pose $X_t$, contact force field $f(x,y,z)$, and surface deformation field $\phi(x,y,z)$ by the CVAE encoder. This encoder network is discarded at inference time. Right: The policy inputs are images from multiple viewpoints, the current Cartesian pose, and the measured force and deformation fields. The policy predicts a sequence of $n$ future actions.
  • Figure 4: Bimanual Insertion (Cylinder): This task involves inserting a 3D-printed peg into a matching hole with a 2 mm tolerance. Precise alignment of both arms in position and orientation is required for successful insertion. During motion without contact, both arms operate in a medium-compliance parameter mode. As insertion begins, the left arm switches to high-compliance parameters to maintain stability, while the right arm transitions to low-compliance parameters, allowing force guidance to assist the insertion process. A total of 20 demonstrations were performed for this task.
  • Figure 5: Single-Arm Picking & Insertion: The task involves picking up a toy wooden peg and inserting it into a corresponding hole in a wooden box. A stable grasp is necessary to align the peg properly with the hole. During demonstrations, the robot operates in a medium-compliance parameter mode for general manipulation and switches to a low-compliance parameter mode during the insertion phase, where physical contact occurs. The peg is placed at the edge of the whiteboard with a random rotation of approximately $\pm15^\circ$. The wooden box's position is not fixed, allowing the robot to move it during insertion. A total of 30 demonstrations were conducted for this task.
  • ...and 2 more figures