SGI: Structured 2D Gaussians for Efficient and Compact Large Image Representation

Zixuan Pan, Kaiyuan Tang, Jun Xia, Yifan Qin, Lin Gu, Chaoli Wang, Jianxu Chen, Yiyu Shi

TL;DR

This work proposes Structured Gaussian Image (SGI), a compact and efficient framework for representing high-resolution images. SGI generates structured 2D Gaussians from a set of seeds and refines the seed representation in a coarse-to-fine manner, substantially accelerating convergence.

Abstract

2D Gaussian Splatting has emerged as a novel image representation technique that can support efficient rendering on low-end devices. However, scaling to high-resolution images requires optimizing and storing millions of unstructured Gaussian primitives independently, leading to slow convergence and redundant parameters. To address this, we propose Structured Gaussian Image (SGI), a compact and efficient framework for representing high-resolution images. SGI decomposes a complex image into multi-scale local spaces defined by a set of seeds. Each seed corresponds to a spatially coherent region and, together with lightweight multi-layer perceptrons (MLPs), generates structured implicit 2D neural Gaussians. This seed-based formulation imposes structural regularity on otherwise unstructured Gaussian primitives, which facilitates entropy-based compression at the seed level to reduce the total storage. However, optimizing seed parameters directly on high-resolution images is challenging. We therefore design a multi-scale fitting strategy that refines the seed representation in a coarse-to-fine manner, substantially accelerating convergence. Quantitative and qualitative evaluations demonstrate that SGI achieves up to 7.5x compression over prior non-quantized 2D Gaussian methods and 1.6x over quantized ones, while also delivering 1.6x and 6.5x faster optimization, respectively, without degrading, and often improving, image fidelity. Code is available at https://github.com/zx-pan/SGI.
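
To make the seed-based formulation more concrete, the following is a minimal sketch of how one seed plus shared MLPs could be decoded into a small group of explicit 2D Gaussians. It is not taken from the SGI code base. The names `make_mlp`, `run_mlp`, and `decode_seed`, the constants `K`, `FEAT_DIM`, and `HIDDEN`, the per-seed `offsets`, and the Cholesky parameterization of the covariance are all assumptions chosen for illustration, and the MLP weights are random rather than trained.

```python
import math
import random

K = 4         # Gaussians generated per seed (assumed value)
FEAT_DIM = 8  # length of each seed's feature vector (assumed value)
HIDDEN = 16   # hidden width of each MLP (assumed value)

def make_mlp(n_in, n_hidden, n_out, rng):
    """Build a two-layer MLP as a list of (weights, biases) pairs."""
    def layer(a, b):
        return ([[rng.uniform(-0.5, 0.5) for _ in range(a)] for _ in range(b)],
                [0.0] * b)
    return [layer(n_in, n_hidden), layer(n_hidden, n_out)]

def run_mlp(mlp, x):
    """Forward pass with a ReLU between the two layers."""
    (w1, b1), (w2, b2) = mlp
    h = [max(0.0, sum(w * v for w, v in zip(row, x)) + b) for row, b in zip(w1, b1)]
    return [sum(w * v for w, v in zip(row, h)) + b for row, b in zip(w2, b2)]

def decode_seed(seed, color_mlp, cov_mlp):
    """Turn one seed into K explicit 2D Gaussians (mean, covariance, color)."""
    colors = run_mlp(color_mlp, seed["feature"])  # K * 3 values
    chol = run_mlp(cov_mlp, seed["feature"])      # K * 3 values
    gaussians = []
    for k in range(K):
        # Hypothetical per-seed offsets place each Gaussian near its seed.
        dx, dy = seed["offsets"][k]
        mean = (seed["xy"][0] + dx, seed["xy"][1] + dy)
        # Cholesky factor [[a, 0], [b, c]] with a, c > 0 keeps the
        # covariance positive definite (an assumed parameterization).
        a = math.exp(chol[3 * k])
        b = chol[3 * k + 1]
        c = math.exp(chol[3 * k + 2])
        cov = ((a * a, a * b), (a * b, b * b + c * c))
        # Sigmoid maps raw MLP outputs to RGB values in (0, 1).
        rgb = tuple(1.0 / (1.0 + math.exp(-v)) for v in colors[3 * k:3 * k + 3])
        gaussians.append({"mean": mean, "cov": cov, "rgb": rgb})
    return gaussians

rng = random.Random(0)
color_mlp = make_mlp(FEAT_DIM, HIDDEN, K * 3, rng)
cov_mlp = make_mlp(FEAT_DIM, HIDDEN, K * 3, rng)
seed = {
    "xy": (10.0, 20.0),
    "feature": [rng.uniform(-1, 1) for _ in range(FEAT_DIM)],
    "offsets": [(rng.uniform(-2, 2), rng.uniform(-2, 2)) for _ in range(K)],
}
gaussians = decode_seed(seed, color_mlp, cov_mlp)
```

In this sketch only the seed attributes (`xy`, `feature`, `offsets`) are stored per region, while the two MLPs are shared by every seed. That sharing is the kind of structural regularity the abstract refers to: the explicit per-Gaussian parameters are regenerated on demand instead of being stored individually.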

Paper Structure (19 sections, 14 equations, 6 figures, 9 tables, 1 algorithm)

Figures (6)

  • Figure 1: Image representation results on the FGF2 [Li-IAC15] and ICB [image-dataset] datasets. The x-axis (log scale) denotes optimization time in minutes, and the y-axis shows PSNR (dB). Each point represents a specific model configuration, with the area of the circle indicating its storage size. For GaussianImage and our SGI, we plot performance curves obtained by varying the number of Gaussian primitives. Our SGI consistently achieves a favorable trade-off between fidelity, compactness, and optimization time.
  • Figure 2: The overall pipeline of SGI. We first introduce (a) seed-based 2D neural Gaussians, where each seed predicts a group of 2D Gaussian primitives via two shared MLPs for decoding color and covariance. To accelerate optimization, we adopt a (b) multi-scale fitting strategy that progressively refines the representation from coarse to fine using a Gaussian pyramid. Finally, we leverage (c) neural entropy coding to further compress the explicit seed attributes for compact representation.
  • Figure 3: Visual comparisons on the FGF2 and ICB datasets (with zoom-in cases and error maps). Qualitative results on representative examples from FGF2 (top) and ICB (bottom), comparing SGI with I-NGP [Muller-TOG22], Scaffold-GS [Lu-CVPR24], LIG [Zhu-AAAI25], and GaussianImage [Zhang-ECCV24]. Zoom-in regions highlight perceptual differences. In the third row of each block, we visualize the per-pixel reconstruction error using heatmaps, where warmer colors (e.g., yellow) indicate higher deviation from the ground truth. PSNR (dB) and storage size (MB) for each method are shown below the visualizations.
  • Figure 4: Visual comparison with the traditional image codec JPEG on the ICB dataset at low bpp. We report the bits per pixel (bpp) and PSNR (dB) for each method below the visualizations.
  • Figure 5: Rate-distortion curves (PSNR) of our approach and different compression baselines on the ICB dataset.
  • ...and 1 more figures
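
The multi-scale fitting strategy described in the Figure 2 caption relies on a Gaussian pyramid. As a rough illustration of that ingredient only, the sketch below builds a pyramid for a grayscale image stored as nested lists and returns it ordered from coarse to fine. The function names, the 5-tap binomial kernel, the border clamping, and the factor-of-two subsampling are assumptions for illustration; the source does not specify the kernel or the number of levels SGI uses.

```python
# 5-tap binomial kernel, a common approximation of a Gaussian (assumed choice).
KERNEL = (1 / 16, 4 / 16, 6 / 16, 4 / 16, 1 / 16)

def _blur_rows(img):
    """Blur each row with KERNEL, clamping indices at the image border."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            acc = 0.0
            for k, weight in enumerate(KERNEL):
                xx = min(max(x + k - 2, 0), w - 1)
                acc += weight * img[y][xx]
            row.append(acc)
        out.append(row)
    return out

def _transpose(img):
    """Swap rows and columns so the row blur can be reused for columns."""
    return [list(col) for col in zip(*img)]

def downsample(img):
    """Separable blur (rows, then columns) followed by 2x subsampling."""
    blurred = _transpose(_blur_rows(_transpose(_blur_rows(img))))
    return [row[::2] for row in blurred[::2]]

def gaussian_pyramid(img, levels):
    """Return [coarsest, ..., finest]; the finest level is the input image."""
    pyramid = [img]
    for _ in range(levels - 1):
        pyramid.append(downsample(pyramid[-1]))
    return pyramid[::-1]

image = [[0.5 for _ in range(8)] for _ in range(8)]
pyramid = gaussian_pyramid(image, levels=3)
sizes = [(len(level), len(level[0])) for level in pyramid]
```

A coarse-to-fine fit would then iterate over `pyramid` in order, optimizing the representation against each level and carrying the result forward as the initialization for the next, finer level. How SGI schedules iterations across levels is not stated in this section.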