Contact-Grounded Policy: Dexterous Visuotactile Policy with Generative Contact Grounding

Zhengtong Xu; Yeping Wang; Ben Abbatematteo; Jom Preechayasomboon; Sonny Chan; Nick Colonnese; Amirhossein H. Memar

TL;DR

Contact-Grounded Policy (CGP) is a visuotactile policy that grounds multi-point contacts by predicting coupled trajectories of the actual robot state and tactile feedback, then using a learned contact-consistency mapping to convert these predictions into executable target robot states for a compliance controller.

Abstract

Contact-rich dexterous manipulation with multi-finger hands remains an open challenge in robotics because task success depends on multi-point contacts that continuously evolve and are highly sensitive to object geometry, frictional transitions, and slip. Recently, tactile-informed manipulation policies have shown promise. However, most use tactile signals as additional observations rather than modeling contact state or how their action outputs interact with low-level controller dynamics. We present Contact-Grounded Policy (CGP), a visuotactile policy that grounds multi-point contacts by predicting coupled trajectories of actual robot state and tactile feedback, and using a learned contact-consistency mapping to convert these predictions into executable target robot states for a compliance controller. CGP consists of two components: (i) a conditional diffusion model that forecasts future robot state and tactile feedback in a compressed latent space, and (ii) a learned contact-consistency mapping that converts the predicted robot state-tactile pair into executable targets for a compliance controller, enabling it to realize the intended contacts. We evaluate CGP using a physical four-finger Allegro V5 hand with Digit360 fingertip tactile sensors, and a simulated five-finger Tesollo DG-5F hand with dense whole-hand tactile arrays. Across a range of dexterous tasks including in-hand manipulation, delicate grasping, and tool use, CGP outperforms visuomotor and visuotactile diffusion-policy baselines.
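The gap between actual and target robot state that the abstract refers to can be illustrated with an idealized joint-space PD law. The sketch below is not CGP's learned contact-consistency mapping; it is only the analytic quasi-static relationship for an ideal PD joint, assuming zero target velocity and ignoring gravity and friction. The function names and the gain values are hypothetical.

```python
import numpy as np


def pd_torque(q_target, q, q_dot, kp, kd):
    """Torque from a PD law: virtual spring (kp) on the tracking error
    plus virtual damper (kd) on joint velocity (zero target velocity assumed)."""
    q_target, q, q_dot = (np.asarray(x, dtype=float) for x in (q_target, q, q_dot))
    return kp * (q_target - q) - kd * q_dot


def target_for_desired_torque(q_actual, tau_desired, kp):
    """Quasi-static inverse of the PD law: the target that produces
    tau_desired while the joint rests (q_dot = 0) at q_actual."""
    return np.asarray(q_actual, dtype=float) + np.asarray(tau_desired, dtype=float) / kp


# Hypothetical 3-DoF finger resting in contact.
KP, KD = 2.0, 0.1                       # illustrative gains, not from the paper
q_actual = np.array([0.2, 0.5, 0.3])    # joint angles held by the contact [rad]
tau_desired = np.array([0.1, 0.05, 0.0])  # torques needed to hold the contact [N*m]

q_target = target_for_desired_torque(q_actual, tau_desired, KP)
tau_check = pd_torque(q_target, q_actual, np.zeros(3), KP, KD)
```

In this toy setting the target must be placed beyond the actual joint angles to press on the object, which is why commanding the predicted actual state directly would apply no contact force. CGP learns the corresponding mapping from predicted state and tactile feedback rather than computing it in closed form.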

Paper Structure (29 sections, 4 equations, 11 figures, 6 tables)

Introduction
Related Work
Method
Problem Setup
Contact Grounding
Policy Pipeline Overview
Architecture
Residual Mapping
Inference and Execution
Latent Tactile Generation
Tactile Compression with VAE
Coupled Diffusion over State and Tactile Latent
Implementation Details
Tactile Encoders and Decoders
Visual Encoder and Diffusion
...and 14 more sections

Figures (11)

Figure 1: Schematic of contact grounding using a 3-DoF revolute finger, illustrating the actual robot state, target robot state, and the resulting contact patches. We assume that each joint is controlled by a low-level proportional-derivative (PD) controller, which can be viewed as a virtual spring-damper that maps the tracking error between target and actual joint angles to motor torques, enabling compliant motion.
Figure 2: Overview of Contact-Grounded Policy (CGP). CGP grounds multi-point contacts by predicting coupled trajectories of actual robot state and tactile feedback, and using a learned contact-consistency mapping to convert these predictions into executable target robot states for a compliance controller.
Figure 3: Teleoperation pipeline. We use a Meta Quest 3 headset for VR-based hand tracking in simulation and an OptiTrack system for mocap-based hand tracking on the real robot. Despite different tracking front-ends, both settings share the same retargeting and controller-stack architecture.
Figure 4: Snapshots of Contact-Grounded Policy (CGP) rollouts on three simulated tasks, showing time-aligned predicted and observed tactile feedback. At each inference step, the diffusion model predicts the next 16 steps of tactile feedback and actual states, which are mapped to target states and executed for 8 steps before the next inference. The predicted tactile feedback is time-aligned with the observations recorded after execution; the close match indicates that CGP executes contact-grounded targets and realizes the predicted contact evolution. Full rollout videos are provided in the supplementary material.
Figure 5: Hand configuration predictions by the contact-consistency mapping for unseen grasps (Section \ref{sec:hand_pred}). This figure provides high-level evidence that contact can be consistently represented through this mapping in a way that generalizes across diverse contact configurations.
...and 6 more figures
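The execution scheme described in the Figure 4 caption (predict 16 steps, map them to targets, execute 8, then re-infer) can be sketched as a receding-horizon loop. This is a minimal illustration under stated assumptions, not the authors' implementation: `predict_fn`, `mapping_fn`, and `step_fn` are hypothetical stand-ins for the diffusion model, the contact-consistency mapping, and the robot with its compliance controller, and the toy versions below exist only to make the loop runnable.

```python
import numpy as np

PRED_HORIZON = 16  # steps predicted per inference (Figure 4 caption)
EXEC_HORIZON = 8   # steps executed before the next inference (Figure 4 caption)


def rollout(obs, predict_fn, mapping_fn, step_fn, num_inferences):
    """Receding-horizon execution: predict, map to targets, execute a prefix."""
    executed = []
    for _ in range(num_inferences):
        states, tactile = predict_fn(obs)        # each has PRED_HORIZON rows
        assert len(states) == PRED_HORIZON and len(tactile) == PRED_HORIZON
        targets = mapping_fn(states, tactile)    # one target per predicted step
        for target in targets[:EXEC_HORIZON]:    # discard the unexecuted tail
            obs = step_fn(target)
            executed.append(target)
    return executed, obs


# --- Toy stand-ins (hypothetical, for illustration only) ---
def toy_predict(obs):
    steps = 0.01 * np.arange(1, PRED_HORIZON + 1)[:, None]
    states = obs + steps                          # (16, 3) predicted actual states
    tactile = np.zeros((PRED_HORIZON, 4))         # (16, 4) predicted tactile latent
    return states, tactile


def toy_mapping(states, tactile):
    # Placeholder: offset targets by mean tactile activation (zero here).
    return states + 0.05 * tactile.mean(axis=1, keepdims=True)


def toy_step(target):
    return target                                 # perfect tracking, no dynamics


executed, final_obs = rollout(np.zeros(3), toy_predict, toy_mapping, toy_step, 2)
```

Executing only the first half of each predicted window lets each new inference condition on fresh observations while keeping consecutive plans overlapping.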
