Less is More: Efficient Point Cloud Reconstruction via Multi-Head Decoders
Pedro Alonso, Tianrui Li, Chongshou Li
TL;DR
This paper challenges the assumption that deeper decoders invariably improve point cloud reconstruction by showing that excessive depth can cause overfitting and poor generalization. It introduces a multi-head decoder that reconstructs complete shapes via multiple independent heads operating on disjoint point subsets, whose outputs are concatenated to form the final shape. Across ModelNet40 and ShapeNetPart, and using three backbones, the multi-head design consistently outperforms single-head decoders on CD, EMD, HD, and F1, with EMD gains being particularly notable. The work demonstrates that output diversity and architectural decomposition can surpass depth-driven capacity, offering a practical, adaptable improvement for 3D reconstruction tasks.
Abstract
We challenge the common assumption that deeper decoder architectures always yield better performance in point cloud reconstruction. Our analysis reveals that, beyond a certain depth, increasing decoder complexity leads to overfitting and degraded generalization. We further propose a novel multi-head decoder architecture that exploits the inherent redundancy in point clouds by reconstructing complete shapes with multiple independent heads, each operating on a distinct subset of points. The final output is obtained by concatenating the predictions from all heads, enhancing both diversity and fidelity. Extensive experiments on ModelNet40 and ShapeNetPart demonstrate that our approach consistently outperforms standard single-head baselines on key metrics, including Chamfer Distance (CD), Hausdorff Distance (HD), Earth Mover's Distance (EMD), and F1-score. Our findings highlight that output diversity and architectural design can be more critical than depth alone for effective and efficient point cloud reconstruction.
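The concatenation step described above can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation. The helper names (`make_head`, `multi_head_decode`, `chamfer_distance`), the single linear layer per head, the latent size, and the head and point counts are all assumptions made for illustration. The Chamfer Distance shown is one common variant (mean squared nearest-neighbour distance, summed over both directions), and the exact definition used in the paper may differ.

```python
import random


def make_head(latent_dim, points_per_head, rng):
    """Build one hypothetical head: a random linear map from a latent
    vector to `points_per_head` xyz points (stand-in for a trained MLP)."""
    out_dim = points_per_head * 3
    weights = [[rng.uniform(-1, 1) for _ in range(latent_dim)]
               for _ in range(out_dim)]
    bias = [rng.uniform(-1, 1) for _ in range(out_dim)]

    def head(latent):
        flat = [sum(w * z for w, z in zip(row, latent)) + b
                for row, b in zip(weights, bias)]
        # Group the flat output into (x, y, z) tuples.
        return [tuple(flat[i:i + 3]) for i in range(0, out_dim, 3)]

    return head


def multi_head_decode(latent, heads):
    """Run every head on the same latent code and concatenate the
    per-head point subsets into a single point cloud."""
    cloud = []
    for head in heads:
        cloud.extend(head(latent))
    return cloud


def chamfer_distance(a, b):
    """Symmetric Chamfer Distance (assumed variant): mean squared distance
    to the nearest neighbour, summed over both directions."""
    def one_way(src, dst):
        return sum(
            min(sum((p - q) ** 2 for p, q in zip(s, d)) for d in dst)
            for s in src
        ) / len(src)

    return one_way(a, b) + one_way(b, a)


# Illustrative sizes only: 4 heads, 16 points each, 8-dimensional latent code.
rng = random.Random(0)
latent = [rng.gauss(0, 1) for _ in range(8)]
heads = [make_head(8, 16, rng) for _ in range(4)]
cloud = multi_head_decode(latent, heads)  # 4 * 16 = 64 points
```

In this sketch each head shares the latent code but has its own parameters, so the heads are independent of one another and the output size is the number of heads times the points per head. How target points are assigned to heads during training is not specified in the abstract and is left out here.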
