Gaussian Graph Network: Learning Efficient and Generalizable Gaussian Representations from Multi-view Images

Shengjun Zhang, Xin Fei, Fangfu Liu, Haixu Song, Yueqi Duan

TL;DR

The paper tackles the inefficiency and artifacts that arise when pixel-aligned Gaussians from multiple views are simply combined for novel view synthesis. It introduces the Gaussian Graph Network (GGN), which builds Gaussian Graphs to model inter-view relations and applies message passing and pooling at the Gaussian level, enabling cross-view interaction and compact representations. On RealEstate10K and ACID, GGN achieves better rendering quality with significantly fewer Gaussians and higher rendering speed than state-of-the-art methods, indicating improved efficiency and generalization. The work provides a principled mechanism for fusing cross-view Gaussian information, with practical relevance to real-time rendering and scalable scene reconstruction. A noted limitation is sensitivity to input resolution; future work could extend the framework with geometry-focused or generative components.

Abstract

3D Gaussian Splatting (3DGS) has demonstrated impressive novel view synthesis performance. While conventional methods require per-scene optimization, more recently several feed-forward methods have been proposed to generate pixel-aligned Gaussian representations with a learnable network, which are generalizable to different scenes. However, these methods simply combine pixel-aligned Gaussians from multiple views as scene representations, thereby leading to artifacts and extra memory cost without fully capturing the relations of Gaussians from different images. In this paper, we propose Gaussian Graph Network (GGN) to generate efficient and generalizable Gaussian representations. Specifically, we construct Gaussian Graphs to model the relations of Gaussian groups from different views. To support message passing at Gaussian level, we reformulate the basic graph operations over Gaussian representations, enabling each Gaussian to benefit from its connected Gaussian groups with Gaussian feature fusion. Furthermore, we design a Gaussian pooling layer to aggregate various Gaussian groups for efficient representations. We conduct experiments on the large-scale RealEstate10K and ACID datasets to demonstrate the efficiency and generalization of our method. Compared to the state-of-the-art methods, our model uses fewer Gaussians and achieves better image quality with higher rendering speed.
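
As a rough illustration of the message-passing idea described in the abstract, the sketch below treats each input view as a graph node holding a group of Gaussian features and mixes features across connected views. It is not the authors' implementation. The function name `message_passing`, the array shapes, the row-normalized adjacency with self-loops, and the single linear layer with ReLU are all assumptions made for this example, and it further assumes that Gaussians at the same index in different views already correspond to each other, which a real method would have to establish. It does reproduce one property stated in the paper: with no edges, each view is processed independently, as in purely pixel-aligned methods.

```python
import numpy as np


def message_passing(features, adjacency, weight):
    """One round of linear message passing over view-level nodes.

    Hypothetical sketch, not the paper's method.
    features:  (V, G, C) array, V views, G Gaussians per view, C channels.
               Assumes index g refers to corresponding Gaussians in every view.
    adjacency: (V, V) array, adjacency[i, j] > 0 if view i receives from view j.
    weight:    (C, C) array, a shared linear map applied after aggregation.
    """
    num_views = features.shape[0]
    # Add self-loops so every node keeps its own features, then
    # row-normalize so each node takes a weighted average of its neighbors.
    a = adjacency + np.eye(num_views)
    a = a / a.sum(axis=1, keepdims=True)
    # Aggregate: mixed[i] = sum_j a[i, j] * features[j]
    mixed = np.einsum("ij,jgc->igc", a, features)
    # Shared linear map followed by ReLU.
    return np.maximum(mixed @ weight, 0.0)
```

Calling `message_passing` with an all-zero `adjacency` reduces to applying the linear map to each view separately, which corresponds to the edge-free case. A fully connected `adjacency` gives every view the same averaged features.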

Paper Structure

This paper contains 15 sections, 14 equations, 5 figures, 5 tables, 1 algorithm.

Figures (5)

  • Figure 1: Comparison of previous methods and ours. (a) We visualize the rendering results of various methods and report the number of Gaussians in parentheses. (b) Previous pixel-wise methods can be considered as a degraded case of Gaussian Graphs without edges. (c) We report PSNR as well as the number of Gaussians for pixelSplat [pixelSplat2023arXiv], MVSplat [MVSplat2024arXiv] and GGN under different input settings.
  • Figure 2: Overview of Gaussian Graph Network. Given multiple input images, we extract image features and predict the means and features of pixel-aligned Gaussians. Then, we construct a Gaussian Graph to model the relations between different Gaussian nodes. We introduce the Gaussian Graph Network to process our Gaussian Graph. The parameter predictor generates Gaussian parameters from the output Gaussian features.
  • Figure 3: Visualization results on RealEstate10K [RealEstate10K2018] and ACID [ACID2021ICCV] benchmarks. We evaluate all models with 4, 8, 16 views as input and subsequently test on three target novel views.
  • Figure 4: Efficiency analysis. We report the number of Gaussians (M), rendering frames per second (FPS) and reconstruction PSNR of pixelSplat [pixelSplat2023arXiv], MVSplat [MVSplat2024arXiv] and our GGN.
  • Figure 5: Visualization of model performance for cross-dataset generalization on RealEstate10K [RealEstate10K2018] and ACID [ACID2021ICCV] benchmarks.