Hestia: Voxel-Face-Aware Hierarchical Next-Best-View Acquisition for Efficient 3D Reconstruction
Cheng-You Lu, Zhuoli Zhuang, Nguyen Thanh Trung Le, Da Xiao, Yu-Cheng Chang, Thomas Do, Srinath Sridhar, Chin-teng Lin
TL;DR
Hestia presents a voxel-face-aware, hierarchical NBV planner that treats voxels as cubes to better capture geometry during 5-DoF viewpoint prediction. By using a two-stage network (look-at point then gaze location) and a close-greedy reinforcement learning objective, it achieves real-time inference (≈25 FPS) with substantial gains in coverage and reconstruction accuracy across diverse object categories. The approach is trained on a large, diverse Objaverse-derived dataset and validated on OmniObject3D, Objaverse, and Houses3K, showing robust performance under translation and limited-view budgets, and it is demonstrated in real-world drone experiments. These results indicate strong practical potential for efficient, automated 3D reconstruction in object-centric scenes, with future work pointing toward multi-agent extensions and outdoor deployments.
Abstract
Advances in 3D reconstruction and novel view synthesis have enabled efficient and photorealistic rendering. However, image acquisition for reconstruction is still either largely manual or constrained by simple preplanned trajectories. To address this issue, recent works propose generalizable next-best-view planners that do not require online learning. Nevertheless, their robustness and performance remain limited across various shapes. Hence, this study introduces Voxel-Face-Aware Hierarchical Next-Best-View Acquisition for Efficient 3D Reconstruction (Hestia), which addresses the shortcomings of reinforcement learning-based generalizable approaches for five-degree-of-freedom viewpoint prediction. Hestia systematically improves on these planners through four components: a more diverse dataset to promote robustness, a hierarchical structure to manage the high-dimensional continuous action search space, a close-greedy strategy to mitigate spurious correlations, and a face-aware design to avoid overlooking geometry. Experimental results show that Hestia achieves non-marginal improvements, with at least a 4% gain in coverage ratio, while reducing Chamfer Distance by 50% and maintaining real-time inference. In addition, Hestia outperforms prior methods by at least 12% in coverage ratio with a 5-image budget and remains robust to object placement variations. Finally, we demonstrate that Hestia, as a next-best-view planner, is feasible for real-world applications. Our project page is https://johnnylu305.github.io/hestia web.
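For readers unfamiliar with the two metrics quoted above, the following is a minimal sketch of how a coverage ratio and a Chamfer Distance can be computed on point sets. It is an illustration only, not the evaluation code used for Hestia: the function names, the symmetric mean-of-nearest-distances form of the Chamfer Distance, the point-based definition of coverage, and the tolerance parameter `tol` are all assumptions made here, and the paper's exact definitions may differ.

```python
import math


def nearest_distances(src, dst):
    """For each point in src, the Euclidean distance to its nearest point in dst."""
    return [min(math.dist(p, q) for q in dst) for p in src]


def chamfer_distance(a, b):
    """Symmetric Chamfer Distance (assumed form): mean nearest distance a->b plus b->a."""
    a_to_b = nearest_distances(a, b)
    b_to_a = nearest_distances(b, a)
    return sum(a_to_b) / len(a_to_b) + sum(b_to_a) / len(b_to_a)


def coverage_ratio(ground_truth, observed, tol):
    """Fraction of ground-truth points lying within tol of some observed point (assumed definition)."""
    hits = sum(1 for d in nearest_distances(ground_truth, observed) if d <= tol)
    return hits / len(ground_truth)


# Toy example: four ground-truth points on a line, of which only two were observed.
ground_truth = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
observed = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]

print(coverage_ratio(ground_truth, observed, tol=0.1))
print(chamfer_distance(ground_truth, observed))
```

Under these assumed definitions, a higher coverage ratio means more of the object surface has been seen, while a lower Chamfer Distance means the reconstructed geometry lies closer to the ground truth. The brute-force nearest-neighbour search is quadratic in the number of points, so a spatial index would be needed for realistically sized point clouds.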
