GEAL: Generalizable 3D Affordance Learning with Cross-Modal Consistency
Dongyue Lu, Lingdong Kong, Tianxin Huang, Gim Hee Lee
TL;DR
GEAL addresses the limited generalization and robustness of 3D affordance learning by using Gaussian splatting to render 2D images from 3D point clouds, which gives the model a 2D branch alongside its 3D branch. A granularity-adaptive fusion module and a 2D-3D consistency alignment module enable cross-modal knowledge transfer, allowing the 3D branch to inherit robust semantics from large-scale pre-trained 2D models. The paper introduces two corruption-based benchmarks, PIAD-C and LASO-C, to evaluate robustness holistically, and reports that GEAL outperforms existing methods on seen and novel object categories and on corrupted data. These results suggest a practical pathway to more reliable, cross-modal 3D affordance reasoning for robotics and human-machine interaction.
Abstract
Identifying affordance regions on 3D objects from semantic cues is essential for robotics and human-machine interaction. However, existing 3D affordance learning methods struggle with generalization and robustness due to limited annotated data and a reliance on 3D backbones focused on geometric encoding, which often lack resilience to real-world noise and data corruption. We propose GEAL, a novel framework designed to enhance the generalization and robustness of 3D affordance learning by leveraging large-scale pre-trained 2D models. We employ a dual-branch architecture with Gaussian splatting to establish consistent mappings between 3D point clouds and 2D representations, enabling realistic 2D renderings from sparse point clouds. A granularity-adaptive fusion module and a 2D-3D consistency alignment module further strengthen cross-modal alignment and knowledge transfer, allowing the 3D branch to benefit from the rich semantics and generalization capacity of 2D models. To assess robustness holistically, we introduce two new corruption-based benchmarks: PIAD-C and LASO-C. Extensive experiments on public datasets and our benchmarks show that GEAL consistently outperforms existing methods across seen and novel object categories, as well as on corrupted data, demonstrating robust and adaptable affordance prediction under diverse conditions. Code and corruption datasets have been made publicly available.
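
To make the idea of a 2D-3D consistency objective concrete, the following is a minimal sketch, not the authors' implementation. It assumes a simple orthographic projection in place of Gaussian splatting, and a mean squared error between per-point 3D features and the 2D features sampled at each point's projected pixel. The function names `project_points` and `consistency_loss`, the image size, and the feature shapes are all hypothetical choices made for illustration.

```python
import numpy as np


def project_points(points, image_size=32):
    """Orthographically project points in [-1, 1]^3 onto the xy-plane.

    Assumption: a stand-in for a real renderer; it only yields the
    point-to-pixel mapping that a consistency term needs.
    Returns (rows, cols) integer pixel indices, one pair per point.
    """
    xy = (points[:, :2] + 1.0) / 2.0  # map x, y from [-1, 1] to [0, 1]
    pix = np.round(xy * (image_size - 1)).astype(int)
    pix = np.clip(pix, 0, image_size - 1)
    return pix[:, 1], pix[:, 0]  # y indexes rows, x indexes columns


def consistency_loss(feat_3d, feat_2d, rows, cols):
    """Mean squared error between 3D and 2D features at matching locations.

    feat_3d: (N, C) per-point features from a hypothetical 3D branch.
    feat_2d: (H, W, C) feature map from a hypothetical 2D branch.
    """
    sampled = feat_2d[rows, cols]  # gather one 2D feature per point, (N, C)
    return float(np.mean((feat_3d - sampled) ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    points = rng.uniform(-1.0, 1.0, size=(100, 3))
    feat_2d = rng.normal(size=(32, 32, 8))
    feat_3d = rng.normal(size=(100, 8))
    rows, cols = project_points(points)
    print("consistency loss:", consistency_loss(feat_3d, feat_2d, rows, cols))
```

Minimizing a term of this kind would pull the 3D features toward the 2D features at corresponding locations, which is one plausible way to read "knowledge transfer" between the two branches; the actual loss and mapping used by GEAL may differ.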
