MARS: Multimodal Active Robotic Sensing for Articulated Characterization

Hongliang Zeng, Ping Zhang, Chengjiong Wu, Jiahua Wang, Tingyu Ye, Fang Li

TL;DR

MARS targets robust estimation of joint parameters for articulated objects by combining RGB and point-cloud data with a reinforcement-learning-based active sensing strategy. It has two components, Multimodal Feature Fusion Perception (MFFP) and Active Sensing (AS); within MFFP, the MLDM module adaptively aggregates multi-scale RGB features and a transformer-based fusion block combines them with point-cloud features. The system predicts joint parameters for revolute and prismatic joints and outputs a perception score that gauges viewpoint quality. Active sensing uses this to select better viewpoints and reduce estimation error, and real-world experiments show practical applicability. Overall, MARS achieves state-of-the-art joint parameter estimation on PartNet-Mobility, is more robust under suboptimal viewpoints, and supports command-based manipulation in real-world scenarios.

Abstract

Precise perception of articulated objects is vital for empowering service robots. Recent studies mainly focus on point clouds, a single-modal approach that often neglects vital texture and lighting details and assumes ideal conditions such as optimal viewpoints, which are unrepresentative of real-world scenarios. To address these limitations, we introduce MARS, a novel framework for articulated object characterization. It features a multi-modal fusion module that uses multi-scale RGB features to enhance point cloud features, coupled with reinforcement learning-based active sensing for autonomous optimization of observation viewpoints. In experiments conducted with various articulated object instances from the PartNet-Mobility dataset, our method outperformed current state-of-the-art methods in joint parameter estimation accuracy. Additionally, through active sensing, MARS further reduces errors, demonstrating enhanced efficiency in handling suboptimal viewpoints. Furthermore, our method effectively generalizes to real-world articulated objects, enhancing robot interactions. Code is available at https://github.com/robhlzeng/MARS.

Paper Structure (32 sections, 13 equations, 7 figures, 2 tables)

Figures (7)

  • Figure 1: MARS uses active sensing to find optimal viewpoints for observing articulated objects, predicting precise joint parameters from RGB and point cloud for command-based robot planning.
  • Figure 2: MARS Framework with MFFP and AS components. MFFP integrates RGB and point cloud data, utilizing MLDM for adaptive RGB feature scaling (see Fig. 3) and a Fusion Block for combining features, aiding joint parameter prediction. AS adjusts viewpoints under suboptimal conditions, enhancing perception accuracy in real-world scenarios.
  • Figure 3: MLDM Architecture for RGB Image Feature Aggregation. (a) Image feature maps $f^i_{r}$ are extracted at multiple scales from ResNet blocks. (b) These feature maps are first combined with point cloud feature $f^j_p$, upon which adaptive weights $w_i$ are computed to form the final weighted image feature representation $f_r$.
  • Figure 4: (a) Sixteen discrete positions constitute the discrete action space for viewpoint selection. (b) Comparative analysis of point cloud quality obtained from different viewpoints.
  • Figure 5: Comparison of point cloud-level manipulation visualizations, where blue dots represent selected parts and $\mathcal{C}$ denotes the current manipulation command.
  • ...and 2 more figures
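
The weighted aggregation described in the Figure 3 caption can be illustrated with a small sketch. It is not the authors' implementation: it assumes each multi-scale image feature $f^i_r$ has already been pooled and projected to a common dimension, that "combining" with the point cloud feature means concatenation, and that the weights $w_i$ come from a softmax over a linear score. The function names and the scoring vector `score_vec` are hypothetical stand-ins for learned parameters.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_rgb_features(scale_feats, point_feat, score_vec):
    """Fuse per-scale image features into one vector f_r.

    scale_feats: list of S vectors of length d (assumed common dimension).
    point_feat:  one vector of length d (point cloud feature).
    score_vec:   vector of length 2*d; hypothetical learned scoring weights.
    """
    scores = []
    for feat in scale_feats:
        # Assumption: "combined with" is modeled as concatenation.
        combined = list(feat) + list(point_feat)
        scores.append(sum(c * v for c, v in zip(combined, score_vec)))

    # Adaptive weights w_i, one per scale, summing to 1.
    weights = softmax(scores)

    # f_r = sum_i w_i * f_r^i
    dim = len(scale_feats[0])
    fused = [
        sum(w * feat[k] for w, feat in zip(weights, scale_feats))
        for k in range(dim)
    ]
    return fused, weights


# Toy usage with random features: 4 scales, dimension 8.
random.seed(0)
scale_feats = [[random.gauss(0, 1) for _ in range(8)] for _ in range(4)]
point_feat = [random.gauss(0, 1) for _ in range(8)]
score_vec = [random.gauss(0, 1) for _ in range(16)]

f_r, w = aggregate_rgb_features(scale_feats, point_feat, score_vec)
print(len(f_r), len(w))
```

Because the weights depend on the point cloud feature, the same image can contribute differently at each scale depending on the observed geometry, which is the sense in which the aggregation is adaptive in this sketch.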