On the Importance of Tactile Sensing for Imitation Learning: A Case Study on Robotic Match Lighting

Niklas Funk, Changqi Chen, Tim Schneider, Georgia Chalvatzaki, Roberto Calandra, Jan Peters

TL;DR

This paper addresses how tactile sensing enhances imitation learning for dynamic, contact-rich manipulation, using match lighting as a challenging case study. It introduces a multimodal visuotactile framework that combines a modular transformer with a conditional SE(3) flow-matching policy, trained from only 20 demonstrations. Experiments show that incorporating tactile feedback substantially improves policy success and reduces contact-related failures, and that a masked training strategy lets vision-only policies benefit from tactile information during training. The results demonstrate robust generalization to novel poses, objects, and lighting, underscoring the practical value of tactile sensing for fast, dexterous manipulation.

Abstract

The field of robotic manipulation has advanced significantly in recent years. At the sensing level, several novel tactile sensors have been developed, capable of providing accurate contact information. On a methodological level, learning from demonstrations has proven an efficient paradigm to obtain performant robotic manipulation policies. The combination of both holds the promise to extract crucial contact-related information from the demonstration data and actively exploit it during policy rollouts. However, this integration has so far been underexplored, most notably in dynamic, contact-rich manipulation tasks where precision and reactivity are essential. This work therefore proposes a multimodal, visuotactile imitation learning framework that integrates a modular transformer architecture with a flow-based generative model, enabling efficient learning of fast and dexterous manipulation policies. We evaluate our framework on the dynamic, contact-rich task of robotic match lighting, a task in which tactile feedback influences human manipulation performance. The experimental results highlight the effectiveness of our approach and show that adding tactile information improves policy performance, thereby underlining the combined potential of tactile sensing and imitation learning for learning dynamic manipulation from few demonstrations. Project website: https://sites.google.com/view/tactile-il

Paper Structure

This paper contains 13 sections, 9 figures, 3 tables.

Figures (9)

  • Figure 1: Autonomous rollout of a policy that is conditioned on visual and tactile observations illustrated on the left. The policy controls the robot and, thereby, the contact configuration between the match and striker paper. As can be seen, the policy ensures sufficient force and velocity, resulting in successfully igniting the match. This work highlights the importance of tactile sensing for reliably solving the dynamic and delicate task of lighting up matches.
  • Figure 2: Method Overview. The current observations are first encoded individually by the observation encoder and brought into a common shape, i.e., each modality contributes a latent vector of a fixed shape. These latent vectors, together with the current action sequence and time index, then serve as the input to the transformer architecture, which outputs velocities that iteratively refine the action sequence through flow matching. The final desired end-effector trajectory is then sent to the robot and tracked by a Cartesian impedance controller. Note that only the first action is applied, to maintain reactivity.
  • Figure 3: Visualizing the versatility of the initial configurations during the experiments. Left: Fixed grasp pose strategy. Middle & Right: Two examples of the variable grasp initialization. Note how the initializations yield different configurations w.r.t. distance and angle between match and striker paper that the policies have to handle for solving the task.
  • Figure 4: Comparing the demonstrated trajectories with trajectories from rolling out different policies, considering the y-coordinate of the end effector. The y-coordinate is the direction along the striker paper in which the robot needs to accelerate to light up the matches. Qualitatively, the vision+touch policies generate rollouts that better match the demonstrations compared to the vision-only policies, indicating that the tactile observations contain important information for explaining and matching the demonstrations.
  • Figure 5: Comparing the success rates (mean and standard deviation) of different policies on the variable grasp pose task. Across different observation encoding strategies, the vision+touch policies consistently outperform the vision-only policies by at least 50%, thereby highlighting the importance of tactile sensing for obtaining reliable match lighting policies. The vision+touch policies also outperform a touch-only baseline, underlining that touch alone is insufficient for high success rates.
  • ...and 4 more figures
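
The refinement loop described in the Figure 2 caption can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes plain Euclidean action vectors instead of SE(3) poses, forward-Euler integration, and a hypothetical `toy_velocity_fn` standing in for the transformer. The names `refine_actions`, `toy_velocity_fn`, `horizon`, `action_dim`, and `num_steps` are illustrative only and do not come from the paper.

```python
import numpy as np


def refine_actions(velocity_fn, obs_latents, horizon=8, action_dim=3,
                   num_steps=10, rng=None):
    """Integrate a velocity field from noise (t=0) to an action sequence (t=1).

    Assumption: forward-Euler steps over a Euclidean action space; the paper
    works on SE(3), which would need a manifold-aware update instead.
    """
    rng = np.random.default_rng() if rng is None else rng
    actions = rng.standard_normal((horizon, action_dim))  # initial noise sample
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt  # time index passed to the model alongside the actions
        actions = actions + dt * velocity_fn(actions, t, obs_latents)
    return actions


def toy_velocity_fn(actions, t, obs_latents):
    """Hypothetical stand-in for the learned transformer.

    For the straight-line path x_t = (1 - t) * x_0 + t * x_1, the velocity
    pointing at a known target x_1 is (x_1 - x_t) / (1 - t).
    """
    target = obs_latents["target"]
    return (target - actions) / (1.0 - t)


# Toy "observation": the latent simply carries the desired action sequence.
target = np.tile(np.array([0.1, 0.2, 0.3]), (8, 1))
plan = refine_actions(toy_velocity_fn, {"target": target},
                      rng=np.random.default_rng(0))

# Receding-horizon execution: only the first action is sent to the controller,
# and the whole sequence is re-planned at the next control step.
first_action = plan[0]
```

In a real system the velocity function would be the trained network conditioned on the visual and tactile latents, and `first_action` would be handed to the impedance controller before re-planning from fresh observations.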