
Less is More: Efficient Point Cloud Reconstruction via Multi-Head Decoders

Pedro Alonso, Tianrui Li, Chongshou Li

TL;DR

This paper challenges the assumption that deeper decoders invariably improve point cloud reconstruction, showing that excessive depth can cause overfitting and poor generalization. It introduces a multi-head decoder that reconstructs complete shapes with multiple independent heads operating on disjoint point subsets, whose outputs are concatenated to form the final shape. Across ModelNet40 and ShapeNetPart, and with three different backbones, the multi-head design consistently outperforms single-head decoders on Chamfer Distance (CD), Earth Mover's Distance (EMD), Hausdorff Distance (HD), and F1-score, with particularly notable EMD gains. The work demonstrates that output diversity and architectural decomposition can matter more than depth-driven capacity, offering a practical, adaptable improvement for 3D reconstruction tasks.
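The decomposition described above can be sketched in plain Python. This is a minimal, hypothetical illustration, not the authors' implementation: it assumes that every head reads the same latent vector, that each head is a small MLP with ReLU activations and a configurable number of hidden layers (`depth`), and that the N output points are split evenly across the M heads. The names `DecoderHead`, `MultiHeadDecoder`, `make_linear`, and the random initialization are illustrative choices; the backbones evaluated in the paper would use a deep-learning framework and learned weights.

```python
import random

# Hypothetical helpers: a dense layer is a (weights, bias) pair of plain lists.
def make_linear(in_dim, out_dim, rng):
    weights = [[rng.gauss(0.0, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]
    bias = [0.0] * out_dim
    return weights, bias

def apply_linear(layer, x):
    weights, bias = layer
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]

def relu(x):
    return [max(0.0, v) for v in x]

class DecoderHead:
    """Hypothetical MLP head: latent -> `depth` hidden layers -> points_per_head x 3 coords."""

    def __init__(self, latent_dim, hidden_dim, depth, points_per_head, rng):
        dims = [latent_dim] + [hidden_dim] * depth
        self.hidden = [make_linear(i, o, rng) for i, o in zip(dims, dims[1:])]
        self.out = make_linear(dims[-1], points_per_head * 3, rng)
        self.points_per_head = points_per_head

    def __call__(self, z):
        h = z
        for layer in self.hidden:
            h = relu(apply_linear(layer, h))
        flat = apply_linear(self.out, h)
        # Reshape the flat output vector into (x, y, z) tuples.
        return [tuple(flat[3 * i:3 * i + 3]) for i in range(self.points_per_head)]

class MultiHeadDecoder:
    """M independent heads, each emitting num_points / M points; outputs are concatenated."""

    def __init__(self, latent_dim, hidden_dim, depth, num_points, num_heads, seed=0):
        if num_points % num_heads != 0:
            raise ValueError("num_points must be divisible by num_heads")
        rng = random.Random(seed)
        per_head = num_points // num_heads
        self.heads = [DecoderHead(latent_dim, hidden_dim, depth, per_head, rng)
                      for _ in range(num_heads)]

    def __call__(self, z):
        points = []
        for head in self.heads:
            points.extend(head(z))  # each head contributes its own disjoint block of points
        return points

# Usage: a deep single-head decoder vs. a shallow 4-head decoder with the same output size.
z = [0.1 * k for k in range(16)]
single_cloud = MultiHeadDecoder(16, 32, depth=4, num_points=64, num_heads=1)(z)
multi_cloud = MultiHeadDecoder(16, 32, depth=1, num_points=64, num_heads=4)(z)
print(len(single_cloud), len(multi_cloud))  # 64 64
```

In this sketch each head predicts only N/M points, so its output layer is M times narrower than a single-head output layer; adding heads buys parallel, independent predictors rather than extra depth.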

Abstract

We challenge the common assumption that deeper decoder architectures always yield better performance in point cloud reconstruction. Our analysis reveals that, beyond a certain depth, increasing decoder complexity leads to overfitting and degraded generalization. We further propose a novel multi-head decoder architecture that exploits the inherent redundancy in point clouds by reconstructing complete shapes from multiple independent heads, each operating on a distinct subset of points. The final output is obtained by concatenating the predictions from all heads, enhancing both diversity and fidelity. Extensive experiments on ModelNet40 and ShapeNetPart demonstrate that our approach achieves consistent improvements across key metrics, including Chamfer Distance (CD), Hausdorff Distance (HD), Earth Mover's Distance (EMD), and F1-score, outperforming standard single-head baselines. Our findings highlight that output diversity and architectural design can be more critical than depth alone for effective and efficient point cloud reconstruction.
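The four metrics named in the abstract can be sketched with brute-force nearest-neighbour search in plain Python. Conventions vary between papers (squared vs. unsquared distances, summed vs. averaged Chamfer terms, the F1 distance threshold), and the abstract does not say which the authors use. This sketch assumes unsquared Euclidean distances, a summed two-direction Chamfer Distance, a user-chosen threshold `tau` for F1, and an exact EMD computed by enumerating one-to-one matchings, which is only feasible for a handful of points (practical pipelines use approximate solvers). All function names are hypothetical.

```python
import itertools
import math

def nearest_dist(p, cloud):
    """Euclidean distance from point p to its nearest neighbour in cloud."""
    return min(math.dist(p, q) for q in cloud)

def chamfer_distance(a, b):
    """Sum of the two directional mean nearest-neighbour distances (unsquared)."""
    return (sum(nearest_dist(p, b) for p in a) / len(a)
            + sum(nearest_dist(q, a) for q in b) / len(b))

def hausdorff_distance(a, b):
    """Worst-case nearest-neighbour distance in either direction."""
    return max(max(nearest_dist(p, b) for p in a),
               max(nearest_dist(q, a) for q in b))

def f1_score(pred, gt, tau):
    """Harmonic mean of precision and recall at distance threshold tau."""
    precision = sum(nearest_dist(p, gt) < tau for p in pred) / len(pred)
    recall = sum(nearest_dist(q, pred) < tau for q in gt) / len(gt)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def earth_movers_distance(a, b):
    """Exact EMD for equal-size clouds: mean distance under the best one-to-one matching.
    Enumerates all permutations, so it is only usable for a handful of points."""
    if len(a) != len(b):
        raise ValueError("this sketch requires point sets of equal size")
    return min(sum(math.dist(p, b[j]) for p, j in zip(a, perm)) / len(a)
               for perm in itertools.permutations(range(len(b))))

gt = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
pred = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.5)]  # one point displaced by 0.5
print(chamfer_distance(pred, gt))       # ~0.333
print(hausdorff_distance(pred, gt))     # 0.5
print(f1_score(pred, gt, tau=0.1))      # ~0.667
print(earth_movers_distance(pred, gt))  # ~0.167
```

The toy example shows why the metrics can disagree: a single displaced point moves CD and EMD only slightly (the error is averaged), sets HD to the full displacement, and changes F1 only if the displacement exceeds `tau`.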

Paper Structure

This paper contains 13 sections, 5 equations, 2 figures, 3 tables.

Figures (2)

  • Figure 1: Comparison of single-head and multi-head decoder architectures. The top row shows the standard encoder–decoder pipeline, where a single-head decoder reconstructs the entire point cloud from the latent representation. The bottom row illustrates our multi-head design, in which M parallel decoders each generate a subset of points, which are then concatenated to form the final reconstructed point cloud.
  • Figure 2: Reconstruction metrics as a function of decoder size for three backbone models: Light-AE (top row), Deep-AE (middle row), and PTv3 (bottom row). Within each panel, the x-axis shows decoder depth, comparing single-head and multi-head configurations on two datasets, ModelNet40 and ShapeNetPart.