PixelGaussian: Generalizable 3D Gaussian Reconstruction from Arbitrary Views

Xin Fei, Wenzhao Zheng, Yueqi Duan, Wei Zhan, Masayoshi Tomizuka, Kurt Keutzer, Jiwen Lu

TL;DR

PixelGaussian introduces a Cascade Gaussian Adapter (CGA) that adjusts the Gaussian distribution according to local geometric complexity, as identified by a keypoint scorer, and a transformer-based Iterative Gaussian Refiner (IGR) that refines Gaussian representations through direct image-Gaussian interactions.

Abstract

We propose PixelGaussian, an efficient feed-forward framework for learning generalizable 3D Gaussian reconstruction from arbitrary views. Most existing methods rely on uniform pixel-wise Gaussian representations, which learn a fixed number of 3D Gaussians for each view and cannot generalize well to more input views. In contrast, PixelGaussian dynamically adapts both the Gaussian distribution and quantity based on geometric complexity, leading to more efficient representations and significant improvements in reconstruction quality. Specifically, we introduce a Cascade Gaussian Adapter (CGA) to adjust the Gaussian distribution according to local geometric complexity identified by a keypoint scorer. CGA leverages deformable attention in context-aware hypernetworks to guide Gaussian pruning and splitting, ensuring accurate representation in complex regions while reducing redundancy. Furthermore, we design a transformer-based Iterative Gaussian Refiner (IGR) module that refines Gaussian representations through direct image-Gaussian interactions. PixelGaussian can effectively reduce Gaussian redundancy as the number of input views increases. We conduct extensive experiments on the large-scale ACID and RealEstate10K datasets, where our method achieves state-of-the-art performance with good generalization to various numbers of views. Code: https://github.com/Barrybarry-Smith/PixelGaussian.
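To make the abstract's point about pixel-wise representations concrete, the following minimal sketch counts the Gaussians such a scheme would produce as views are added. The function name, the 256×256 resolution, and the one-Gaussian-per-pixel default are illustrative assumptions, not values taken from the paper.

```python
def pixelwise_gaussian_count(num_views, height, width, per_pixel=1):
    """Count Gaussians when every pixel of every view emits `per_pixel` Gaussians.

    Hypothetical helper for illustration only; it models the "fixed number of
    Gaussians per view" behaviour described in the abstract, nothing more.
    """
    return num_views * height * width * per_pixel


if __name__ == "__main__":
    # Assumed resolution of 256x256; the count grows linearly with the views.
    for views in (2, 4, 8):
        print(views, pixelwise_gaussian_count(views, 256, 256))
```

Under these assumptions the total doubles each time the number of views doubles, regardless of how much the views overlap, which is the redundancy that adaptive pruning is meant to reduce.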

Paper Structure

This paper contains 17 sections, 14 equations, 8 figures, 6 tables.

Figures (8)

  • Figure 1: Most existing generalizable 3D Gaussian splatting methods (e.g., pixelSplat, MVSplat) assign a fixed number of Gaussians to each pixel, leading to inefficiency in capturing local geometry and overlap across views. In contrast, PixelGaussian dynamically adjusts the Gaussian distributions based on geometric complexity in a feed-forward framework. With comparable efficiency, PixelGaussian (trained using 2 views) successfully generalizes to various numbers of input views with adaptive Gaussian densities.
  • Figure 2: Overview of PixelGaussian. Given multi-view input images, we initialize 3D Gaussians using a lightweight image encoder and cost volume. The Cascade Gaussian Adapter (CGA) then dynamically adapts both the distribution and quantity of Gaussians. By leveraging local image features, the Iterative Gaussian Refiner (IGR) further refines Gaussian representations via deformable attention. Finally, novel views are rendered from the refined 3D Gaussians using rasterization-based rendering.
  • Figure 3: Illustration of the proposed CGA and IGR Blocks. (a) CGA comprises a keypoint scorer followed by a series of hypernetworks that produce context-aware thresholds to guide the splitting and pruning of Gaussians. (b) IGR further facilitates direct image-Gaussian interactions, enabling Gaussian representations to capture and extract local geometric features more effectively.
  • Figure 4: Visualization results on the ACID and RealEstate10K benchmarks. Pixel-wise methods suffer from Gaussian overlap due to suboptimal Gaussian distributions, whereas PixelGaussian enables dynamic Gaussian adaptation and improved local geometry refinement.
  • Figure 5: Visualization of score maps and Gaussian distributions on the RealEstate10K dataset. The Cascade Gaussian Adapter dynamically adjusts Gaussian distribution and quantity based on score maps. More Gaussians are allocated to detail-rich regions for more precise representations, while pruning minimizes Gaussian redundancy and overlap across views.
  • ...and 3 more figures
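
The score-guided splitting and pruning described in the Figure 3 and Figure 5 captions can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the authors' implementation: the paper's thresholds come from context-aware hypernetworks, whereas here they are fixed constants, and the `Gaussian` class, the `adapt_gaussians` function, and the split rule (two children with half the scale, shifted along x) are all hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Gaussian:
    """Simplified Gaussian: isotropic scale, no rotation or colour (assumption)."""
    center: tuple  # (x, y, z)
    scale: float
    opacity: float


def adapt_gaussians(gaussians, scores, split_threshold, prune_threshold, offset=0.5):
    """Prune low-score Gaussians and split high-score ones into two children.

    Thresholds are plain floats here; in the paper they are predicted by
    hypernetworks. The split rule below is an illustrative choice.
    """
    if len(gaussians) != len(scores):
        raise ValueError("each Gaussian needs exactly one score")
    if prune_threshold > split_threshold:
        raise ValueError("prune_threshold must not exceed split_threshold")

    adapted = []
    for gaussian, score in zip(gaussians, scores):
        if score < prune_threshold:
            continue  # low geometric complexity: drop as redundant
        if score > split_threshold:
            # High complexity: replace with two smaller Gaussians.
            x, y, z = gaussian.center
            shift = offset * gaussian.scale
            for sign in (-1.0, 1.0):
                child_center = (x + sign * shift, y, z)
                adapted.append(
                    replace(gaussian, center=child_center, scale=gaussian.scale / 2)
                )
        else:
            adapted.append(gaussian)  # keep unchanged
    return adapted


if __name__ == "__main__":
    cloud = [
        Gaussian((0.0, 0.0, 0.0), 1.0, 0.9),
        Gaussian((1.0, 0.0, 0.0), 1.0, 0.9),
        Gaussian((2.0, 0.0, 0.0), 1.0, 0.9),
    ]
    result = adapt_gaussians(cloud, [0.1, 0.5, 0.9], split_threshold=0.8, prune_threshold=0.2)
    print(len(result))
```

In this toy run the first Gaussian is pruned, the second is kept, and the third is split, so detail-rich regions end up with more, smaller Gaussians while low-score regions lose theirs.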