IAAO: Interactive Affordance Learning for Articulated Objects in 3D Environments
Can Zhang, Gim Hee Lee
TL;DR
IAAO tackles interactive affordance learning for articulated objects in 3D environments by building an explicit 3D Gaussian Splatting representation augmented with hierarchical semantic features from foundation models. It combines semantic scene reconstruction, language-guided affordance localization, and global/local motion estimation from robust 2D-3D correspondences, then fuses the two observed articulation states into a single scene. The method achieves state-of-the-art performance on PARIS and multi-part benchmarks, generalizes to unseen objects and complex indoor scenes, and supports manipulation through affordance-aware queries. It does not rely on category-specific priors or perfectly aligned camera poses, which makes it applicable to interactive perception for robots and AR/VR agents in real-world environments.
Abstract
This work presents IAAO, a novel framework that builds an explicit 3D model through which intelligent agents can understand articulated objects in their environment by interacting with them. Unlike prior methods that rely on task-specific networks and assumptions about movable parts, IAAO leverages large foundation models to estimate interactive affordances and part articulations in three stages. We first build hierarchical features and label fields for each object state using 3D Gaussian Splatting (3DGS) by distilling mask features and view-consistent labels from multi-view images. We then perform object- and part-level queries on the 3D Gaussian primitives to identify static and articulated elements, estimating global transformations and local articulation parameters along with affordances. Finally, scenes from different states are merged and refined based on the estimated transformations, enabling robust affordance-based interaction and manipulation of objects. Experimental results demonstrate the effectiveness of our method.
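To make the phrase "local articulation parameters" concrete, the sketch below shows one common way to turn a rigid transform between two object states into the parameters of a revolute joint: a unit axis direction, a rotation angle, and a point on the axis. It is an illustration only, not the authors' implementation. The abstract does not specify a parameterization, so the revolute-joint assumption, the function name `revolute_from_rigid`, and the convention x' = R x + t are all assumptions made here.

```python
import math


def revolute_from_rigid(R, t, eps=1e-6):
    """Recover (axis, angle, pivot) of a revolute joint from x' = R x + t.

    Hypothetical helper for illustration. R is a 3x3 rotation matrix given
    as nested lists, t is a length-3 translation. The pivot returned is the
    point on the rotation axis closest to the origin.
    """
    # Rotation angle from the trace: trace(R) = 1 + 2 cos(theta).
    trace = R[0][0] + R[1][1] + R[2][2]
    cos_theta = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    theta = math.acos(cos_theta)
    sin_theta = math.sin(theta)
    if sin_theta < eps:
        # Angles near 0 or pi need a different axis extraction; out of scope here.
        raise ValueError("rotation angle too close to 0 or pi for this sketch")

    # Axis direction from the skew-symmetric part of R.
    axis = [
        (R[2][1] - R[1][2]) / (2.0 * sin_theta),
        (R[0][2] - R[2][0]) / (2.0 * sin_theta),
        (R[1][0] - R[0][1]) / (2.0 * sin_theta),
    ]

    # Assumption: pure revolute motion, so any translation along the axis
    # is treated as noise and projected out.
    along = sum(a * b for a, b in zip(axis, t))
    t_perp = [ti - along * ai for ti, ai in zip(t, axis)]

    # Solve (I - R) p = t_perp for p perpendicular to the axis:
    # p = 0.5 * (t_perp + cot(theta / 2) * (axis x t_perp)).
    cross = [
        axis[1] * t_perp[2] - axis[2] * t_perp[1],
        axis[2] * t_perp[0] - axis[0] * t_perp[2],
        axis[0] * t_perp[1] - axis[1] * t_perp[0],
    ]
    cot_half = sin_theta / (1.0 - cos_theta)  # equals cot(theta / 2)
    pivot = [0.5 * (tp + cot_half * c) for tp, c in zip(t_perp, cross)]
    return axis, theta, pivot
```

As a check, a 90-degree rotation about the z-axis through the point (1, 2, 0) corresponds to R = [[0, -1, 0], [1, 0, 0], [0, 0, 1]] and t = (3, 1, 0); passing these to the function recovers the axis (0, 0, 1), the angle pi/2, and the pivot (1, 2, 0) up to floating-point error. A prismatic joint would instead be described by a translation direction and distance, which this sketch does not cover.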
