Visuo-Tactile based Predictive Cross Modal Perception for Object Exploration in Robotics

Anirvan Dutta, Etienne Burdet, Mohsen Kaboli

TL;DR

This work tackles autonomous inference of object properties, notably mass, during visuo-tactile exploration in unstructured environments. It introduces a predictive cross-modal framework that uses an initial visual shape to form a prior over mass via a Cross-Modal Gaussian Process (CM-GP), and refines this estimate through interactive non-prehensile pushing with a dual filtering estimator. A surprise-driven training loop selectively adds informative shape–mass data to the CM-GP, achieving data-efficient lifelong learning. Real-robot experiments show improved mass estimation accuracy and faster convergence with reduced training data compared to non-cross-modal baselines. The approach enables robust, autonomous object exploration by coupling shape perception, cross-modal priors, and interactive feedback, facilitating scalable cross-modal learning in robotics.

Abstract

Autonomously exploring the unknown physical properties of novel objects such as stiffness, mass, center of mass, friction coefficient, and shape is crucial for autonomous robotic systems operating continuously in unstructured environments. We introduce a novel visuo-tactile based predictive cross-modal perception framework where initial visual observations (shape) aid in obtaining an initial prior over the object properties (mass). The initial prior improves the efficiency of the object property estimation, which is autonomously inferred via interactive non-prehensile pushing and a dual filtering approach. The inferred properties are then used to efficiently enhance the predictive capability of the cross-modal function through a human-inspired 'surprise' formulation. We evaluated the proposed framework in a real-robot scenario, where it demonstrated superior performance.
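
The surprise-driven update described above can be illustrated with a small sketch. This is not the authors' implementation: the summary does not give the surprise formula, so the sketch assumes surprise is the KL divergence between the mass estimate obtained from interaction and the Gaussian-process prior predicted from shape. The class `ShapeToMassGP`, the function `explore`, the RBF kernel, the shape feature vector, and the threshold value are all hypothetical choices made for illustration.

```python
import numpy as np

def rbf_kernel(A, B, length_scale=1.0, signal_var=1.0):
    """Squared-exponential kernel between row-wise feature sets A and B."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return signal_var * np.exp(-0.5 * np.maximum(sq, 0.0) / length_scale**2)

class ShapeToMassGP:
    """Hypothetical stand-in for a cross-modal GP mapping shape features to mass."""

    def __init__(self, prior_mean=1.0, prior_var=1.0, noise_var=0.01):
        self.prior_mean, self.prior_var, self.noise_var = prior_mean, prior_var, noise_var
        self.X, self.y = None, np.empty(0)

    def predict(self, x):
        """Return predictive mean and variance (including noise) of the mass."""
        x = np.atleast_2d(x)
        if self.y.size == 0:
            return self.prior_mean, self.prior_var + self.noise_var
        K = rbf_kernel(self.X, self.X, signal_var=self.prior_var)
        K += self.noise_var * np.eye(self.y.size)
        k_star = rbf_kernel(self.X, x, signal_var=self.prior_var)
        mean = self.prior_mean + (k_star.T @ np.linalg.solve(K, self.y - self.prior_mean)).item()
        var = self.prior_var - (k_star.T @ np.linalg.solve(K, k_star)).item()
        return mean, max(var, 1e-9) + self.noise_var

    def add(self, x, mass):
        """Append one (shape feature, mass) pair to the training set."""
        x = np.atleast_2d(x)
        self.X = x if self.y.size == 0 else np.vstack([self.X, x])
        self.y = np.append(self.y, mass)

def gaussian_kl(mu_q, var_q, mu_p, var_p):
    """KL(q || p) for two univariate Gaussians."""
    return 0.5 * (np.log(var_p / var_q) + (var_q + (mu_q - mu_p) ** 2) / var_p - 1.0)

def explore(gp, shape_feature, est_mass, est_var, threshold=0.5):
    """Add the pair to the GP only if the interaction result is 'surprising'.

    est_mass and est_var stand for the output of the interactive estimator
    (pushing plus filtering), which is outside the scope of this sketch.
    The threshold is an assumed value, not one taken from the paper.
    """
    prior_mean, prior_var = gp.predict(shape_feature)
    surprise = float(gaussian_kl(est_mass, est_var, prior_mean, prior_var))
    added = surprise > threshold
    if added:
        gp.add(shape_feature, est_mass)
    return surprise, added

gp = ShapeToMassGP()
first = explore(gp, [0.1, 0.2, 0.3], est_mass=0.5, est_var=0.01)   # unseen shape
second = explore(gp, [0.1, 0.2, 0.3], est_mass=0.5, est_var=0.01)  # same shape again
```

In this toy run the first object is stored because its estimated mass differs strongly from the uninformed prior, while the repeated object is skipped because the model already predicts its mass well. This is the sense in which a surprise criterion can reduce the amount of training data.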

Paper Structure (12 sections, 22 equations, 7 figures)

Figures (7)

  • Figure 1: Problem setup for visuo-tactile based predictive cross-modal perception for object exploration
  • Figure 2: (a) Our proposed framework for visuo-tactile based predictive cross-modal perception of object properties. (b) Detailed block of the dual filtering approach.
  • Figure 3: Experimental Object List. Object name followed by measured GT mass in kg
  • Figure 4: Qualitative shape estimation results on the distinct shapes
  • Figure 5: (a) Mean squared error of the predicted prior mass values with respect to the GT mass. The proposed CM-GP (with surprise) uses fewer data points while performing similarly and consistently compared with the model without surprise, in which all estimated shape and mass values are used for training. (b) Estimated 'surprise' value during each object-robot interaction. (c) Efficiency of the proposed CM-GP model, expressed as the percentage less data used at each iteration. A higher value signifies a more efficient Gaussian process regression model.
  • ...and 2 more figures