Beyond Viewpoint: Robust 3D Object Recognition under Arbitrary Views through Joint Multi-Part Representation
Linlong Fan, Ye Huang, Yanqi Ge, Wen Li, Lixin Duan
TL;DR
The paper tackles 3D object recognition under arbitrary views where object poses and the number of viewpoints vary and inputs are unaligned. It proposes PANet, a part-based representation that localizes discriminative parts in each view via weakly supervised cues, refines cross-view part information with an Adaptive Part Refinement transformer, and combines multiple global parts into a robust object descriptor. The approach introduces a cross-view association mechanism and part-aware loss to ensure diverse, informative part features, achieving state-of-the-art performance on ScanObjectNN, ModelNet, and RGBD, especially in arbitrary-view settings. These results demonstrate the practical impact of part-level, view-robust representations for real-world 3D recognition tasks and offer improved interpretability through part-level reasoning.
Abstract
Existing view-based methods excel at recognizing 3D objects from predefined viewpoints, but recognition under arbitrary views has received limited attention. This setting is both challenging and realistic: the positions and number of viewpoints differ from object to object, and object poses are not aligned. Most view-based methods aggregate multiple view features into a global feature representation and therefore struggle in this setting, because unaligned inputs from arbitrary views are difficult to aggregate robustly, which leads to performance degradation. In this paper, we introduce a novel Part-aware Network (PANet), built on a part-based representation, to address these issues. This representation aims to localize and understand different parts of 3D objects, such as airplane wings and tails. It has properties such as viewpoint invariance and rotation robustness, which give it an advantage for 3D object recognition under arbitrary views. Our results on benchmark datasets demonstrate that the proposed method outperforms existing view-based aggregation baselines for 3D object recognition under arbitrary views, even surpassing most fixed-viewpoint methods.
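As a rough illustration of why a part-level descriptor can tolerate a varying number and order of views, the sketch below pools per-view part features into a fixed set of global parts with a single attention step. It is not PANet's implementation: the function name `aggregate_parts`, the shapes, the mean-based query initialization, and the single untrained attention step are all assumptions made for this example, and the sketch assumes part slot k refers to the same kind of part in every view.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def aggregate_parts(view_parts):
    """Pool per-view part features into one object descriptor.

    view_parts: array of shape (V, K, D) holding K part features of
    dimension D for each of V views (V may differ between objects).
    Returns a descriptor of shape (K * D,).
    Hypothetical helper, not the paper's Adaptive Part Refinement module.
    """
    V, K, D = view_parts.shape
    # Assumption: seed K global part queries by averaging each slot over views.
    queries = view_parts.mean(axis=0)                # (K, D)
    # Treat all parts from all views as one unordered set of tokens.
    tokens = view_parts.reshape(V * K, D)            # (V*K, D)
    # Scaled dot-product attention from global queries to all part tokens.
    scores = queries @ tokens.T / np.sqrt(D)         # (K, V*K)
    weights = softmax(scores, axis=-1)
    global_parts = weights @ tokens                  # (K, D)
    # Concatenate the K global parts into a single descriptor.
    return global_parts.reshape(-1)

rng = np.random.default_rng(0)
parts = rng.normal(size=(5, 4, 8))                   # 5 views, 4 parts, 8 dims
descriptor = aggregate_parts(parts)
shuffled = aggregate_parts(parts[rng.permutation(5)])
print(descriptor.shape, np.allclose(descriptor, shuffled))
```

Because both the mean over views and the attention-weighted sum treat the views as a set, the descriptor keeps the same length for any number of views and does not change when the views are reordered. A learned model would replace the fixed queries and the untrained attention with trained components.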
