Gaussian Graph Network: Learning Efficient and Generalizable Gaussian Representations from Multi-view Images
Shengjun Zhang, Xin Fei, Fangfu Liu, Haixu Song, Yueqi Duan
TL;DR
The paper tackles the inefficiency and artifacts that arise when pixel-aligned Gaussians from multiple views are simply combined for novel view synthesis. It introduces the Gaussian Graph Network (GGN), which builds Gaussian Graphs to model inter-view relations and applies Gaussian-level message passing and pooling, enabling cross-view interaction and compact representations. On RealEstate10K and ACID, GGN achieves better rendering quality with significantly fewer Gaussians and faster rendering than state-of-the-art methods, indicating improved efficiency and generalization. The work advances multi-view 3D Gaussian representations by providing a principled mechanism for fusing Gaussian information across views, with practical impact on real-time rendering and scalable scene reconstruction. One limitation is sensitivity to input resolution; future work could extend the framework with geometry-focused or generative components.
Abstract
3D Gaussian Splatting (3DGS) has demonstrated impressive novel view synthesis performance. While conventional methods require per-scene optimization, more recently several feed-forward methods have been proposed to generate pixel-aligned Gaussian representations with a learnable network, which are generalizable to different scenes. However, these methods simply combine pixel-aligned Gaussians from multiple views as scene representations, thereby leading to artifacts and extra memory cost without fully capturing the relations of Gaussians from different images. In this paper, we propose Gaussian Graph Network (GGN) to generate efficient and generalizable Gaussian representations. Specifically, we construct Gaussian Graphs to model the relations of Gaussian groups from different views. To support message passing at Gaussian level, we reformulate the basic graph operations over Gaussian representations, enabling each Gaussian to benefit from its connected Gaussian groups with Gaussian feature fusion. Furthermore, we design a Gaussian pooling layer to aggregate various Gaussian groups for efficient representations. We conduct experiments on the large-scale RealEstate10K and ACID datasets to demonstrate the efficiency and generalization of our method. Compared to the state-of-the-art methods, our model uses fewer Gaussians and achieves better image quality with higher rendering speed.
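The two graph operations named in the abstract, message passing between Gaussian groups and pooling of redundant Gaussians, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function names, the array shapes, the assumption that Gaussians with the same index correspond across views, and the voxel-grid merge used for pooling are all hypothetical stand-ins chosen to keep the example short.

```python
import numpy as np


def message_passing(features, adjacency):
    """Fuse per-view Gaussian features along graph edges (illustrative only).

    features:  (V, N, D) array, one group of N Gaussian feature vectors per view.
               Assumption: index n refers to corresponding Gaussians in every view.
    adjacency: (V, V) array of non-negative edge weights between views.
    """
    # Add self-loops so each group keeps part of its own features,
    # then row-normalize so every update is a weighted average.
    weights = adjacency + np.eye(adjacency.shape[0])
    weights = weights / weights.sum(axis=1, keepdims=True)
    # out[u] = sum_v weights[u, v] * features[v]
    return np.einsum("uv,vnd->und", weights, features)


def pool_gaussians(centers, opacities, voxel=0.1):
    """Merge Gaussians that fall into the same voxel (a swapped-in heuristic).

    centers:   (M, 3) Gaussian centers gathered from all views.
    opacities: (M,) strictly positive opacities, used as merge weights.
    Returns the merged centers and the merged opacities clipped to [0, 1].
    """
    keys = np.floor(centers / voxel).astype(np.int64)
    _, inverse = np.unique(keys, axis=0, return_inverse=True)
    inverse = inverse.reshape(-1)  # keep a flat index across NumPy versions
    count = inverse.max() + 1

    weight_sum = np.zeros(count)
    np.add.at(weight_sum, inverse, opacities)

    center_sum = np.zeros((count, 3))
    np.add.at(center_sum, inverse, centers * opacities[:, None])

    merged_centers = center_sum / weight_sum[:, None]
    merged_opacities = np.clip(weight_sum, 0.0, 1.0)
    return merged_centers, merged_opacities
```

With two connected views, `message_passing` replaces each group's features with an average of its own and its neighbor's. `pool_gaussians` reduces the Gaussian count whenever several centers share a voxel, which mirrors in spirit the paper's goal of a more compact scene representation.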
