GoLF-NRT: Integrating Global Context and Local Geometry for Few-Shot View Synthesis

You Wang, Li Fang, Hao Zhu, Fei Hu, Long Ye, Zhan Ma

TL;DR

GoLF-NRT tackles few-shot view synthesis by fusing global scene context with local geometric cues through a near-linear 3D transformer and an adaptive, kernel-regression-based sampling strategy. The method first builds a coarse global representation $\boldsymbol{Z}_g$ and uses it to produce a ray-specific global feature $\boldsymbol{F}_g$. This global cue then guides the aggregation of local features along epipolar lines to form $\boldsymbol{F}_{g-l}$, which an MLP finally decodes to color. On the LLFF, Blender, and Shiny datasets, GoLF-NRT achieves state-of-the-art performance with 1–3 input views and remains competitive in the 10-view setting, with notable robustness in reflective and occluded regions. Combining global-context guidance with adaptive local sampling reduces depth ambiguities and rendering artifacts, enabling high-fidelity, view-consistent renderings suitable for real-world deployment.
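
The global-to-local fusion step described above can be illustrated with a toy NumPy sketch. This is not the authors' implementation: it assumes that the ray-specific global feature $\boldsymbol{F}_g$ serves as the attention query over features gathered along the epipolar lines, that a residual connection adds the global cue back to form $\boldsymbol{F}_{g-l}$, and that a two-layer MLP with a sigmoid output decodes color. The function `fuse_and_decode`, the `params` dictionary, and all dimensions are hypothetical, and the weights are random rather than learned.

```python
import numpy as np


def softmax(x: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - x.max())
    return e / e.sum()


def fuse_and_decode(f_g, local_feats, params):
    """Hypothetical global-guided local aggregation for a single ray.

    f_g:         (d,) ray-specific global feature (plays the role of F_g)
    local_feats: (V, S, d) features sampled along the epipolar lines of
                 S ray points in V source views
    params:      dict of illustrative weight matrices (random here, not learned)
    Returns (rgb, attn): rgb in (0, 1)^3 and attention weights over V*S points.
    """
    d = f_g.shape[0]
    pts = local_feats.reshape(-1, d)                 # flatten views x samples
    q = f_g @ params["w_q"]                          # query comes from the global cue
    k = pts @ params["w_k"]                          # keys from local features
    v = pts @ params["w_v"]                          # values from local features
    attn = softmax(k @ q / np.sqrt(d))               # weight each local feature
    f_gl = attn @ v + f_g                            # fused feature (role of F_{g-l})
    h = np.maximum(f_gl @ params["w1"] + params["b1"], 0.0)          # MLP hidden layer
    rgb = 1.0 / (1.0 + np.exp(-(h @ params["w2"] + params["b2"])))  # sigmoid -> color
    return rgb, attn


rng = np.random.default_rng(0)
d, V, S, hidden = 16, 3, 32, 32
params = {
    "w_q": rng.normal(size=(d, d)) / np.sqrt(d),
    "w_k": rng.normal(size=(d, d)) / np.sqrt(d),
    "w_v": rng.normal(size=(d, d)) / np.sqrt(d),
    "w1": rng.normal(size=(d, hidden)) / np.sqrt(d),
    "b1": np.zeros(hidden),
    "w2": rng.normal(size=(hidden, 3)) / np.sqrt(hidden),
    "b2": np.zeros(3),
}
rgb, attn = fuse_and_decode(rng.normal(size=d), rng.normal(size=(V, S, d)), params)
```

Taking the query from the global feature is what lets scene-level context decide which local epipolar evidence matters for the ray; in a trained model these projections would be learned end to end.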

Abstract

Neural Radiance Fields (NeRF) have transformed novel view synthesis by modeling scene-specific volumetric representations directly from images. Generalizable NeRF models can render novel views of unseen scenes by learning latent ray representations, but their performance depends heavily on a large number of multi-view observations, and their rendering quality degrades significantly when only a few input views are available. To address this limitation, we propose GoLF-NRT: a Global and Local feature Fusion-based Neural Rendering Transformer. GoLF-NRT enhances generalizable neural rendering from few input views by leveraging a 3D transformer with efficient sparse attention to capture global scene context. In parallel, it integrates local geometric features extracted along epipolar lines, enabling high-quality scene reconstruction from as few as 1 to 3 input views. Furthermore, we introduce an adaptive sampling strategy based on attention weights and kernel regression, which improves the accuracy of transformer-based neural rendering. Extensive experiments on public datasets show that GoLF-NRT achieves state-of-the-art performance across varying numbers of input views, highlighting the effectiveness and superiority of our approach. Code is available at https://github.com/KLMAV-CUC/GoLF-NRT.
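
The abstract does not spell out the adaptive sampling rule, so the following is only a sketch under stated assumptions. It assumes that Nadaraya-Watson kernel regression with a Gaussian kernel smooths the per-sample attention weights from a coarse pass into a density along the ray, and that new depths are drawn from that density by inverse-CDF sampling, in the style of NeRF's hierarchical sampling. The functions `kernel_regress` and `adaptive_samples`, the bandwidth, the grid size, and the synthetic attention profile are all hypothetical choices.

```python
import numpy as np


def kernel_regress(t_query, t_coarse, w_coarse, bandwidth):
    """Nadaraya-Watson estimate of the attention weight at each query depth (Gaussian kernel)."""
    diff = (t_query[:, None] - t_coarse[None, :]) / bandwidth
    k = np.exp(-0.5 * diff ** 2)                     # (Q, N) kernel values
    return (k @ w_coarse) / (k.sum(axis=1) + 1e-12)


def adaptive_samples(t_coarse, w_coarse, n_fine, bandwidth=0.05, grid_size=256):
    """Draw n_fine depths concentrated where the smoothed attention is high (hypothetical)."""
    grid = np.linspace(t_coarse.min(), t_coarse.max(), grid_size)
    smoothed = kernel_regress(grid, t_coarse, w_coarse, bandwidth)
    density = np.clip(smoothed, 0.0, None) + 1e-6   # strictly positive density
    cdf = np.cumsum(density)
    cdf = (cdf - cdf[0]) / (cdf[-1] - cdf[0])       # normalise to [0, 1]
    u = (np.arange(n_fine) + 0.5) / n_fine          # stratified quantiles
    return np.interp(u, cdf, grid)                  # inverse-CDF sampling


# Synthetic coarse pass: attention peaks around depth 4.0 (illustrative only).
t_coarse = np.linspace(2.0, 6.0, 32)
w_coarse = np.exp(-0.5 * ((t_coarse - 4.0) / 0.2) ** 2)
w_coarse /= w_coarse.sum()
t_fine = adaptive_samples(t_coarse, w_coarse, n_fine=64, bandwidth=0.1)
```

Because the smoothed density is kept strictly positive, the CDF is strictly increasing and `np.interp` can invert it; the bandwidth controls how tightly the new samples cluster around high-attention depths.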
