AREA3D: Active Reconstruction Agent with Unified Feed-Forward 3D Perception and Vision-Language Guidance

Tianling Xu, Shengzhe Gan, Leslie Gu, Yuelei Li, Fangneng Zhan, Hanspeter Pfister

TL;DR

AREA3D tackles active 3D reconstruction under tight view budgets by integrating a feed-forward 3D perception model with a vision-language-guided semantic module. It decouples geometric uncertainty estimation from the reconstructor and fuses the geometric and semantic signals into a unified uncertainty field that guides view planning without online optimization. The method achieves state-of-the-art reconstruction accuracy on object-level and scene-level benchmarks, especially in sparse-view regimes, and ablations confirm that the two signals contribute complementarily. Code release is planned.
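
The idea of fusing a geometric signal and a semantic signal into one score that drives view planning can be illustrated with a small, generic sketch. It is not AREA3D's implementation: the min-max normalization, the convex weight `alpha`, the greedy one-view-at-a-time loop, and the names `plan_views`, `fuse_scores`, `normalize`, and the `score_fn` callback are all hypothetical assumptions. In the actual system the geometric signal would come from the feed-forward reconstructor and the semantic signal from the vision-language model; here both come from a toy function.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical callback: given the indices of views chosen so far, return
# (geometric_uncertainty, semantic_score) for every candidate view.
ScoreFn = Callable[[List[int]], Tuple[Sequence[float], Sequence[float]]]


def normalize(scores: Sequence[float]) -> List[float]:
    """Min-max normalize to [0, 1]; a constant input carries no ranking signal, so map it to zeros."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def fuse_scores(geometric: Sequence[float], semantic: Sequence[float], alpha: float = 0.5) -> List[float]:
    """Assumed fusion rule: convex combination of the two normalized signals."""
    if len(geometric) != len(semantic):
        raise ValueError("geometric and semantic scores must have the same length")
    g, s = normalize(geometric), normalize(semantic)
    return [alpha * gi + (1.0 - alpha) * si for gi, si in zip(g, s)]


def plan_views(score_fn: ScoreFn, num_candidates: int, budget: int, alpha: float = 0.5) -> List[int]:
    """Greedy next-best-view loop under a fixed view budget."""
    selected: List[int] = []
    for _ in range(min(budget, num_candidates)):
        # Re-score all candidates after each new view, since the reconstruction changes.
        geometric, semantic = score_fn(selected)
        fused = fuse_scores(geometric, semantic, alpha)
        remaining = [i for i in range(num_candidates) if i not in selected]
        selected.append(max(remaining, key=lambda i: fused[i]))
    return selected


if __name__ == "__main__":
    # Toy setup: 6 candidate views on a ring with fixed, made-up semantic interest.
    interest = [0.9, 0.1, 0.2, 0.8, 0.3, 0.4]

    def toy_scores(selected: List[int]):
        n = len(interest)
        # Stand-in geometric uncertainty: ring distance to the nearest chosen view.
        geo = [min((min(abs(i - j), n - abs(i - j)) for j in selected), default=n)
               for i in range(n)]
        return geo, interest

    print(plan_views(toy_scores, num_candidates=6, budget=3))  # [0, 3, 5]
```

In this toy run the first pick is driven by semantic interest alone, because the geometric scores are uniform before any view is taken; later picks trade semantic interest against coverage of the ring. AREA3D's actual uncertainty estimators and weighting may differ.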

Abstract

Active 3D reconstruction enables an agent to autonomously select viewpoints that efficiently yield accurate and complete scene geometry, rather than passively reconstructing scenes from pre-collected images. However, existing active reconstruction methods often rely on hand-crafted geometric heuristics, which can lead to redundant observations that do not substantially improve reconstruction quality. To address this limitation, we propose AREA3D, an active reconstruction agent that leverages feed-forward 3D reconstruction models and vision-language guidance. Our framework decouples view-uncertainty modeling from the underlying feed-forward reconstructor, enabling precise uncertainty estimation without expensive online optimization. In addition, an integrated vision-language model provides high-level semantic guidance, encouraging informative and diverse viewpoints beyond purely geometric cues. Extensive experiments on both scene-level and object-level benchmarks demonstrate that AREA3D achieves state-of-the-art reconstruction accuracy, particularly in the sparse-view regime. Code will be made available at https://github.com/TianlingXu/AREA3D.

Paper Structure

This paper contains 22 sections, 11 equations, 8 figures, 7 tables, 1 algorithm.

Figures (8)

  • Figure 1: Overview of our approach. We propose AREA3D, an active reconstruction agent that unifies two complementary signals, feed-forward 3D perception and vision-language guidance, to decide the next best views under tight view budgets. AREA3D efficiently reconstructs high-fidelity geometry from sparse observations by actively choosing the most informative viewpoints.
  • Figure 2: Overview of the AREA3D pipeline. The framework integrates feed-forward 3D perception and vision-language guidance to actively select informative viewpoints and to reconstruct high-fidelity geometry via Gaussian Splatting, even under sparse observations.
  • Figure 3: PSNR as the number of input frames increases under different view-selection policies in the scene-level setting.
  • Figure 4: PSNR as the number of input frames increases under different view-selection policies in the object-level setting.
  • Figure 5: PSNR comparison as the number of frames increases in the scene-level setting.
  • ...and 3 more figures