
DuInNet: Dual-Modality Feature Interaction for Point Cloud Completion

Xinpu Liu, Baolin Hou, Hanyun Wang, Ke Xu, Jianwei Wan, Yulan Guo

TL;DR

DuInNet introduces a dual-modality feature interaction framework for point cloud completion, using iterative cross-attention between partial point clouds and their corresponding images to jointly learn geometric and texture priors. The architecture comprises a separate encoder for each modality, a Dual Feature Interactor, and an Adaptive Point Generator that outputs complete point clouds in modality-weighted blocks; the network is trained with Chamfer-based losses. To advance multimodal evaluation, the paper presents ModelNet-MPC, a large-scale benchmark with nearly 400k pairs of point clouds and rendered images across 40 categories, plus denoising and zero-shot completion tasks. On ShapeNet-ViPC and ModelNet-MPC, DuInNet outperforms state-of-the-art methods, showing strong robustness to noise, better generalization across categories, and notable transfer to unseen categories, which points to its practical value for real-world multimodal 3D reconstruction. The work also contributes a concrete dataset and ablation studies that validate the dual-path interaction and the adaptive block-wise generation.
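
The TL;DR only says training uses "Chamfer-based losses". As a rough illustration of what such a loss measures, here is a minimal NumPy sketch of the symmetric Chamfer distance between a predicted and a ground-truth point set. The function name `chamfer_distance` and its `squared` flag are illustrative assumptions; papers differ on whether they sum or average the two directions and whether distances are squared, and the exact variant DuInNet uses is not stated here.

```python
import numpy as np


def chamfer_distance(p: np.ndarray, q: np.ndarray, squared: bool = True) -> float:
    """Symmetric Chamfer distance between point sets p (N, 3) and q (M, 3).

    Sums the mean nearest-neighbour distance from p to q and from q to p.
    squared=True uses squared Euclidean distances; squared=False uses plain
    Euclidean distances. The normalization is an assumption, not the paper's.
    """
    # Pairwise squared distances, shape (N, M), computed via broadcasting.
    diff = p[:, None, :] - q[None, :, :]
    d2 = np.sum(diff ** 2, axis=-1)
    d = d2 if squared else np.sqrt(d2)
    # For each point in p, distance to its nearest neighbour in q (and vice versa).
    p_to_q = d.min(axis=1).mean()
    q_to_p = d.min(axis=0).mean()
    return float(p_to_q + q_to_p)


rng = np.random.default_rng(0)
gt = rng.random((2048, 3))
print(chamfer_distance(gt, gt))  # 0.0
```

The dense (N, M) distance matrix keeps the sketch short but costs O(NM) memory; practical training code usually relies on batched GPU nearest-neighbour kernels instead.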

Abstract

To further promote the development of multimodal point cloud completion, we contribute a large-scale multimodal point cloud completion benchmark, ModelNet-MPC, with richer shape categories and more diverse test data; it contains nearly 400,000 pairs of high-quality point clouds and rendered images across 40 categories. Besides the fully supervised point cloud completion task, ModelNet-MPC introduces two additional tasks, denoising completion and zero-shot learning completion, which simulate real-world scenarios and test how robust current methods are to noise and how well they transfer across categories. Meanwhile, since existing multimodal completion pipelines usually adopt a unidirectional fusion mechanism and ignore the shape prior contained in the image modality, we propose a Dual-Modality Feature Interaction Network (DuInNet). Its dual feature interactor iteratively exchanges features between point clouds and images to learn both the geometric and the texture characteristics of shapes. To adapt to specific tasks such as fully supervised, denoising, and zero-shot learning point cloud completion, an adaptive point generator produces complete point clouds in blocks, assigning different weights to the two modalities. Extensive experiments on the ShapeNet-ViPC and ModelNet-MPC benchmarks demonstrate DuInNet's superiority, robustness, and transfer ability over state-of-the-art methods across all completion tasks. The code and dataset will be available soon.
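
To make the two core ideas of the abstract more concrete, here is a toy NumPy sketch, not the authors' implementation. The first idea is bidirectional feature interaction, where each modality queries the other through scaled dot-product cross-attention (Figure 1 refers to query, key and value tensors). The second is block-wise point generation, where each block mixes the point-cloud and image descriptors with its own weight. All names (`cross_attention`, `dual_interact`, `adaptive_blocks`, `alphas`, `proj`) are hypothetical. The weights here are fixed rather than learned, and details such as multi-head attention, normalization, and the real layer sizes are assumptions left out for brevity.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(query_feats, context_feats, wq, wk, wv):
    """Single-head scaled dot-product attention: query_feats attend to context_feats."""
    q = query_feats @ wq    # (Nq, d) queries from one modality
    k = context_feats @ wk  # (Nc, d) keys from the other modality
    v = context_feats @ wv  # (Nc, d) values from the other modality
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)  # (Nq, Nc)
    return attn @ v          # (Nq, d)


def dual_interact(f_pc, f_img, params, num_iters=2):
    """Hypothetical bidirectional interaction with residual updates, repeated num_iters times."""
    for _ in range(num_iters):
        pc_update = cross_attention(f_pc, f_img, *params["pc"])    # points query image
        img_update = cross_attention(f_img, f_pc, *params["img"])  # image queries points
        # Both paths are updated from the previous iteration's features.
        f_pc, f_img = f_pc + pc_update, f_img + img_update
    return f_pc, f_img


def adaptive_blocks(g_pc, g_img, alphas, proj, points_per_block):
    """Hypothetical generator: block i decodes a mix weighted by alphas[i] (point cloud) vs 1 - alphas[i] (image)."""
    blocks = []
    for a in alphas:
        mixed = a * g_pc + (1.0 - a) * g_img                     # (d,)
        blocks.append((mixed @ proj).reshape(points_per_block, 3))
    return np.concatenate(blocks, axis=0)  # (len(alphas) * points_per_block, 3)


rng = np.random.default_rng(0)
d = 32
f_pc = rng.standard_normal((128, d))  # per-point features from a point-cloud encoder
f_img = rng.standard_normal((49, d))  # per-patch features from an image encoder (7x7 grid)
params = {m: [rng.standard_normal((d, d)) * 0.1 for _ in range(3)] for m in ("pc", "img")}
f_pc, f_img = dual_interact(f_pc, f_img, params)
g_pc, g_img = f_pc.max(axis=0), f_img.max(axis=0)  # global max-pooled descriptors
proj = rng.standard_normal((d, 256 * 3)) * 0.1
points = adaptive_blocks(g_pc, g_img, alphas=[0.8, 0.5, 0.2], proj=proj, points_per_block=256)
print(points.shape)  # (768, 3)
```

Updating both paths from the previous iteration's features keeps the interaction symmetric: neither modality sees the other's already-updated features within the same step. How DuInNet actually orders or stacks its interaction layers is not specified in this summary.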

Paper Structure

This paper contains 41 sections, 13 equations, 7 figures, and 7 tables.

Figures (7)

  • Figure 1: Schematic comparison of completion strategies. (a) Restoring complete shapes through autoencoders of partial point clouds. (b) Image-assisted point cloud completion by unidirectional fusion. (c) Our dual interaction strategy between images and point clouds for shape completion. Here, $F$ represents a feature matrix, and $Q$, $K$ and $V$ represent the query, key and value tensors, respectively.
  • Figure 2: The overall architecture of DuInNet. DuInNet adopts a dual-path autoencoder structure that takes partial point clouds and their corresponding 2D images as inputs and generates complete point clouds. (b) and (c) use the point cloud path as an example to illustrate the architectures. Here, FPS, BN and Trans denote farthest point sampling, batch normalization, and transpose, respectively.
  • Figure 3: Qualitative comparison of different point cloud sampling strategies. (a) A CAD model from the person category. (b), (c) and (d) are the corresponding point clouds generated by random sampling, uniform sampling, and Poisson disk sampling, respectively.
  • Figure 4: Our ModelNet-based Multimodal Point cloud Completion dataset (ModelNet-MPC). (a) shows 32 uniformly distributed viewpoints on a unit sphere. (b) and (c) show, respectively, the 32 view-aligned rendered partial point clouds and images of an airplane CAD model, captured from the uniformly distributed viewpoints in (a).
  • Figure 5: Qualitative comparisons to the state-of-the-art methods on all eight categories of the ShapeNet-ViPC dataset.
  • ...and 2 more figures
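
Two geometric utilities appear in the figure captions. One is the set of 32 uniformly distributed viewpoints on a unit sphere (Figure 4). The other is farthest point sampling, FPS (Figure 2). The sketch below is a minimal NumPy illustration under stated assumptions. The paper does not say how its 32 viewpoints were placed, so the Fibonacci (golden-angle) lattice used here is a swapped-in technique that gives near-uniform coverage. The FPS function is the standard greedy algorithm, not necessarily DuInNet's exact implementation. Both function names are hypothetical.

```python
import numpy as np


def fibonacci_viewpoints(n: int = 32) -> np.ndarray:
    """n near-uniform unit vectors from a Fibonacci lattice (assumed placement scheme)."""
    i = np.arange(n) + 0.5
    z = 1.0 - 2.0 * i / n                       # evenly spaced heights in (-1, 1)
    r = np.sqrt(1.0 - z ** 2)                   # radius of the horizontal circle at height z
    theta = np.pi * (3.0 - np.sqrt(5.0)) * i    # successive golden-angle rotations
    return np.stack([r * np.cos(theta), r * np.sin(theta), z], axis=1)


def farthest_point_sampling(points: np.ndarray, k: int, start: int = 0) -> np.ndarray:
    """Greedy FPS: indices of k points, each chosen farthest from those already selected."""
    n = points.shape[0]
    if not 0 < k <= n:
        raise ValueError("k must be in [1, number of points]")
    chosen = np.empty(k, dtype=np.int64)
    chosen[0] = start  # first point is arbitrary; implementations often pick it at random
    # Squared distance from every point to its nearest already-chosen point.
    min_d2 = np.sum((points - points[start]) ** 2, axis=1)
    for j in range(1, k):
        chosen[j] = int(np.argmax(min_d2))
        d2 = np.sum((points - points[chosen[j]]) ** 2, axis=1)
        min_d2 = np.minimum(min_d2, d2)
    return chosen


views = fibonacci_viewpoints(32)
print(views.shape)  # (32, 3)
cloud = np.random.default_rng(0).random((4096, 3))
idx = farthest_point_sampling(cloud, 512)
print(cloud[idx].shape)  # (512, 3)
```

Greedy FPS costs O(nk) distance evaluations, which is why point-cloud networks typically run it on the GPU. Compared with random subsampling, it spreads the retained points more evenly over the shape.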