MARS: Multimodal Active Robotic Sensing for Articulated Characterization

Hongliang Zeng; Ping Zhang; Chengjiong Wu; Jiahua Wang; Tingyu Ye; Fang Li

TL;DR

MARS targets robust joint parameter estimation for articulated objects by combining RGB and point-cloud data with a reinforcement-learning-based active sensing strategy. It has two components, Multimodal Feature Fusion Perception (MFFP) and Active Sensing (AS); MFFP uses the MLDM module for adaptive aggregation of multi-scale RGB features and a transformer-based fusion block to combine them with point-cloud features. The system predicts joint parameters for revolute and prismatic joints and outputs a perception score that gauges viewpoint quality. Active sensing improves viewpoint selection and estimation accuracy, and real-world experiments show practical applicability. Overall, MARS achieves state-of-the-art joint parameter estimation on PartNet-Mobility, is more robust under suboptimal viewpoints, and enables command-based manipulation in real-world scenarios.

Abstract

Precise perception of articulated objects is vital for empowering service robots. Recent studies mainly focus on point clouds, a single-modal approach that often neglects vital texture and lighting details and assumes ideal conditions such as optimal viewpoints, which are unrepresentative of real-world scenarios. To address these limitations, we introduce MARS, a novel framework for articulated object characterization. It features a multi-modal fusion module that uses multi-scale RGB features to enhance point cloud features, coupled with reinforcement-learning-based active sensing for autonomous optimization of observation viewpoints. In experiments on various articulated object instances from the PartNet-Mobility dataset, our method outperformed current state-of-the-art methods in joint parameter estimation accuracy. Through active sensing, MARS further reduces errors, demonstrating enhanced efficiency in handling suboptimal viewpoints. Furthermore, our method generalizes effectively to real-world articulated objects, enhancing robot interactions. Code is available at https://github.com/robhlzeng/MARS.
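The active sensing idea can be illustrated with a small sketch over the sixteen discrete viewpoint positions mentioned in Figure 4(a). Everything else here is an assumption made for illustration: the ring layout in `ring_viewpoints`, the radius and height values, and the greedy `select_viewpoint` rule, which is only a stand-in for the learned reinforcement learning policy and is not the authors' method.

```python
import math

NUM_VIEWPOINTS = 16  # size of the discrete action space, per Figure 4(a)


def ring_viewpoints(radius=1.0, height=0.5, n=NUM_VIEWPOINTS):
    """Hypothetical layout: n camera positions evenly spaced on a circle.

    The real arrangement of the sixteen positions is not specified here.
    """
    return [
        (
            radius * math.cos(2 * math.pi * k / n),
            radius * math.sin(2 * math.pi * k / n),
            height,
        )
        for k in range(n)
    ]


def select_viewpoint(scores):
    """Greedy stand-in for a learned policy.

    Returns the index of the candidate with the highest perception score.
    """
    return max(range(len(scores)), key=lambda k: scores[k])


if __name__ == "__main__":
    views = ring_viewpoints()
    # Made-up perception scores, one per candidate viewpoint.
    fake_scores = [0.1 * (k % 5) for k in range(len(views))]
    best = select_viewpoint(fake_scores)
    print("chosen action:", best, "camera position:", views[best])
```

A learned policy would replace the greedy rule, choosing actions from an observed state rather than from scores that are already known for every candidate.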

Paper Structure (32 sections, 13 equations, 7 figures, 2 tables)

Introduction
Related Works
Articulated Object Characterization.
Multimodal Feature Fusion.
Active Sensing.
Method
MLDM
Feature Fusion Block
Articulation Decoders
Joint Parameters.
Perception Score.
Loss Functions.
Training Steps.
The RL Policy For Active Sensing
State Space.
...and 17 more sections

Figures (7)

Figure 1: MARS uses active sensing to find optimal viewpoints for observing articulated objects, predicting precise joint parameters from RGB and point cloud for command-based robot planning.
Figure 2: MARS Framework with MFFP and AS components. MFFP integrates RGB and point cloud data, utilizing MLDM for adaptive RGB feature scaling (see Fig. 3) and a Fusion Block for combining features, aiding joint parameter prediction. AS adjusts viewpoints under suboptimal conditions, enhancing perception accuracy in real-world scenarios.
Figure 3: MLDM Architecture for RGB Image Feature Aggregation. (a) Image feature maps $f^i_{r}$ are extracted at multiple scales from ResNet blocks. (b) These feature maps are first combined with point cloud feature $f^j_p$, upon which adaptive weights $w_i$ are computed to form the final weighted image feature representation $f_r$.
Figure 4: (a) Sixteen discrete positions constitute the discrete action space for viewpoint selection. (b) Comparative analysis of point cloud quality obtained from different viewpoints.
Figure 5: Comparison of point cloud-level manipulation visualizations, where blue dots represent selected parts and $\mathcal{C}$ denotes the current manipulation command.
...and 2 more figures
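
The adaptive weighting described in the Figure 3 caption can be sketched as a weighted sum $f_r = \sum_i w_i f^i_r$. The sketch below is a minimal illustration, not the paper's implementation: it assumes the multi-scale image features have already been projected to the same dimension as the point cloud feature, and it assumes the weights come from a softmax over dot-product scores. The function names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_image_features(image_feats, point_feat):
    """Combine multi-scale image features f_r^i into one feature f_r.

    Assumption: each scale is scored by its dot product with the point
    cloud feature, and the weights w_i are a softmax over those scores.
    """
    scores = [sum(a * b for a, b in zip(f, point_feat)) for f in image_feats]
    weights = softmax(scores)
    dim = len(point_feat)
    fused = [
        sum(w * f[d] for w, f in zip(weights, image_feats))
        for d in range(dim)
    ]
    return weights, fused


if __name__ == "__main__":
    # Three toy "scales", each already projected to dimension 2.
    feats = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
    point = [1.0, 0.0]
    w, f_r = aggregate_image_features(feats, point)
    print("weights:", w)
    print("fused feature:", f_r)
```

Because the weights sum to one, the fused feature stays in the convex hull of the per-scale features, so no single scale can dominate unless its score is much higher than the others.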
