PanoAffordanceNet: Towards Holistic Affordance Grounding in 360° Indoor Environments

Guoliang Zhu, Wanjun Jia, Caoyang Shao, Yuheng Zhang, Zhiyong Li, Kailun Yang

TL;DR

PanoAffordanceNet is an end-to-end framework that pairs a Distortion-Aware Spectral Modulator (DASM) for latitude-dependent calibration with an Omni-Spherical Densification Head (OSDH) that restores topological continuity from sparse activations. Trained with multi-level constraints, it suppresses semantic drift under low supervision.

Abstract

Global perception is essential for embodied agents in 360° spaces, yet current affordance grounding remains largely object-centric and restricted to perspective views. To bridge this gap, we introduce a novel task: Holistic Affordance Grounding in 360° Indoor Environments. This task faces unique challenges, including severe geometric distortions from Equirectangular Projection (ERP), semantic dispersion, and cross-scale alignment difficulties. We propose PanoAffordanceNet, an end-to-end framework featuring a Distortion-Aware Spectral Modulator (DASM) for latitude-dependent calibration and an Omni-Spherical Densification Head (OSDH) to restore topological continuity from sparse activations. By integrating multi-level constraints comprising pixel-wise, distributional, and region-text contrastive objectives, our framework effectively suppresses semantic drift under low supervision. Furthermore, we construct 360-AGD, the first high-quality panoramic affordance grounding dataset. Extensive experiments demonstrate that PanoAffordanceNet significantly outperforms existing methods, establishing a solid baseline for scene-level perception in embodied intelligence. The source code and benchmark dataset will be made publicly available at https://github.com/GL-ZHU925/PanoAffordanceNet.
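The latitude dependence of ERP distortion mentioned in the abstract can be illustrated with a short sketch. It is not taken from the paper: it only encodes the standard geometry of equirectangular images, where a row at latitude φ is stretched horizontally by 1/cos φ and covers a share of the sphere proportional to cos φ. The function names are hypothetical and chosen for illustration.

```python
import math


def erp_row_latitudes(height):
    """Latitude in radians at the centre of each ERP row, from top (+pi/2) to bottom (-pi/2)."""
    return [math.pi / 2 - (row + 0.5) * math.pi / height for row in range(height)]


def erp_horizontal_stretch(height):
    """Horizontal over-sampling factor 1/cos(latitude) for each row."""
    return [1.0 / math.cos(lat) for lat in erp_row_latitudes(height)]


def erp_area_weights(height):
    """Per-row weights proportional to cos(latitude), normalised to sum to 1."""
    weights = [math.cos(lat) for lat in erp_row_latitudes(height)]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    # Rows near the poles are stretched far more than rows near the equator.
    for row, factor in enumerate(erp_horizontal_stretch(8)):
        print(f"row {row}: stretch {factor:.2f}")
```

Weights of this kind are one common way to make a per-pixel statistic respect spherical area; whether DASM uses anything similar is not stated in this summary.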

Paper Structure

This paper contains 20 sections, 11 equations, 6 figures, 5 tables.

Figures (6)

  • Figure 1: Comparison of affordance grounding paradigms. Traditional object-centric methods (top) are restricted by a limited Field Of View (FOV). Our proposed holistic scene-level affordance grounding (bottom) with PanoAffordanceNet enables omnidirectional functional perception in 360° indoor environments.
  • Figure 2: Overview of PanoAffordanceNet. (a) Parameter-efficient dual-encoder framework with distortion-aware modulation and spherical densification. (b) Distortion-Aware Spectral Modulator (DASM) for latitude-adaptive frequency decomposition. (c) HFEM and (d) LFSM for interaction boundary sharpening and structural stabilization, respectively.
  • Figure 3: Architecture of the Omni-Spherical Densification Head (OSDH). Visual tokens undergo spherical projection to construct a cosine affinity matrix. Sparse initial activations are then densified via top-$k$ seed selection, confidence-guided noise suppression, and max propagation with a learnable residual scalar $\alpha$.
  • Figure 4: Properties of the 360-AGD dataset. (a) Representative examples from the dataset. (b) Word cloud of object categories. (c) Word cloud of affordance categories. (d) Statistical distribution of affordances across the Easy and Hard splits.
  • Figure 5: Qualitative comparison between the proposed PanoAffordanceNet and state-of-the-art one-shot affordance grounding methods, including OOAL [li2024one] and OS-AGDO [jia2025one], on the established 360-AGD dataset.
  • ...and 1 more figure
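The densification steps named in the Figure 3 caption (cosine affinity, top-$k$ seed selection, noise suppression, max propagation with a residual scalar $\alpha$) can be sketched roughly as follows. This is a toy illustration, not the authors' implementation. Several details are assumptions: "spherical projection" is read here as L2-normalising token features so that dot products equal cosine similarity, noise suppression is approximated by a fixed affinity threshold `tau`, and `alpha` is a fixed number rather than a learned parameter. The function names and default values are hypothetical.

```python
import math


def l2_normalize(vector):
    """Scale a feature vector to unit length (assumed reading of 'spherical projection')."""
    norm = math.sqrt(sum(x * x for x in vector)) or 1.0
    return [x / norm for x in vector]


def densify(features, activations, k=2, tau=0.5, alpha=0.5):
    """Spread sparse activations to similar tokens; k, tau and alpha are illustrative values."""
    unit = [l2_normalize(f) for f in features]
    n = len(unit)
    # Cosine affinity between every pair of tokens.
    affinity = [
        [sum(a * b for a, b in zip(unit[i], unit[j])) for j in range(n)]
        for i in range(n)
    ]
    # Top-k seed selection: the k tokens with the strongest initial activation.
    seeds = sorted(range(n), key=lambda i: activations[i], reverse=True)[:k]
    dense = []
    for i in range(n):
        # Assumed noise suppression: ignore seeds whose affinity to token i is below tau.
        candidates = [
            affinity[i][s] * activations[s] for s in seeds if affinity[i][s] >= tau
        ]
        # Max propagation: keep only the strongest surviving seed contribution.
        propagated = max(candidates, default=0.0)
        # Residual update scaled by alpha.
        dense.append(activations[i] + alpha * propagated)
    return dense


if __name__ == "__main__":
    features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
    activations = [1.0, 0.0, 0.0]
    print(densify(features, activations, k=1))
```

In this toy input, the second token is nearly parallel to the seed token and so receives propagated activation, while the orthogonal third token stays at zero.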