Explicitly Guided Information Interaction Network for Cross-modal Point Cloud Completion
Hang Xu, Chen Long, Wenxiao Zhang, Yuan Liu, Zhen Cao, Zhen Dong, Bisheng Yang
TL;DR
EGIInet tackles cross-modal point cloud completion by explicitly guiding information interaction between a partial point cloud and a single-view image. It introduces a unified encoder to align modalities and a separable interaction pathway (SFTnet) supervised by a Gram-matrix-based FT-Loss, followed by a simple cross-attention fusion and an XMFnet-like decoder to produce the completed shape. The FT-Loss comprises an Informational Loss and a Structural Loss, with Gram matrices $G(\mathbf{F}) = \mathbf{F}^T \bullet \mathbf{F}$ enabling explicit transfer of structural cues; this yields state-of-the-art results on ShapeNet-ViPC with fewer parameters (9.03M vs 9.57M) and a $16\%$ reduction in $l_2$-CD over the previous best. The work demonstrates that explicit guidance in multi-modal fusion improves the reliability and accuracy of view-guided completion and suggests broader potential for explicit interaction in multi-modal tasks.
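To make the Gram-matrix idea concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes features are stored as a (tokens, channels) array and reads $\mathbf{F}^T \bullet \mathbf{F}$ as an ordinary matrix product. The function `gram_mse`, the token counts, and the channel width are hypothetical choices for illustration; the actual form and normalization of the Structural Loss may differ.

```python
import numpy as np


def gram(features: np.ndarray) -> np.ndarray:
    """Gram matrix G(F) = F^T F for features of shape (num_tokens, channels)."""
    return features.T @ features


def gram_mse(source: np.ndarray, target: np.ndarray) -> float:
    """Hypothetical structural term: mean squared difference of Gram matrices.

    This is an illustrative assumption, not the FT-Loss defined in the paper.
    """
    diff = gram(source) - gram(target)
    return float(np.mean(diff ** 2))


rng = np.random.default_rng(0)
# Hypothetical shapes: the two modalities have different token counts
# but share the same channel width after a unified encoder.
image_tokens = rng.standard_normal((196, 8))
point_tokens = rng.standard_normal((128, 8))

g_img = gram(image_tokens)  # shape (8, 8)
g_pts = gram(point_tokens)  # shape (8, 8)
loss = gram_mse(image_tokens, point_tokens)
```

Because the Gram matrix is taken over the channel dimension, both modalities yield a channels-by-channels matrix regardless of how many tokens each has, which is what makes the two directly comparable in this sketch.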
Abstract
In this paper, we explore EGIInet (Explicitly Guided Information Interaction Network), a novel framework for the View-guided Point cloud Completion (ViPC) task, which aims to restore a complete point cloud from a partial one and a single-view image. In comparison with previous methods that relied on the global semantics of input images, EGIInet efficiently combines the information from the two modalities by leveraging the geometric nature of the completion task. Specifically, we propose an explicitly guided information interaction strategy supported by modal alignment for point cloud completion. First, in contrast to previous methods, which simply use separate 2D and 3D backbones to encode features, we unify the encoding process to promote modal alignment. Second, we propose a novel explicitly guided information interaction strategy that helps the network identify critical information within images, thus achieving better guidance for completion. Extensive experiments demonstrate the effectiveness of our framework, and we achieve a new state of the art (a 16% improvement in CD over XMFnet) on benchmark datasets despite using fewer parameters than previous methods. The pre-trained model and code are available at https://github.com/WHU-USI3DV/EGIInet.
