
ContextGS: Compact 3D Gaussian Splatting with Anchor Level Context Model

Yufei Wang, Zhihao Li, Lanqing Guo, Wenhan Yang, Alex C. Kot, Bihan Wen

TL;DR

This work pioneers the context model in the anchor level for 3DGS representation, yielding an impressive size reduction of over 100 times compared to vanilla 3DGS and 15 times compared to the most recent state-of-the-art work Scaffold-GS, while achieving comparable or even higher rendering quality.

Abstract

Recently, 3D Gaussian Splatting (3DGS) has become a promising framework for novel view synthesis, offering fast rendering speeds and high fidelity. However, the large number of Gaussians and their associated attributes require effective compression techniques. Existing methods primarily compress neural Gaussians individually and independently, i.e., coding all the neural Gaussians at the same time, with little design for their interactions and spatial dependence. Inspired by the effectiveness of the context model in image compression, we propose the first autoregressive model at the anchor level for 3DGS compression in this work. We divide anchors into different levels and the anchors that are not coded yet can be predicted based on the already coded ones in all the coarser levels, leading to more accurate modeling and higher coding efficiency. To further improve the efficiency of entropy coding, e.g., to code the coarsest level with no already coded anchors, we propose to introduce a low-dimensional quantized feature as the hyperprior for each anchor, which can be effectively compressed. Our work pioneers the context model in the anchor level for 3DGS representation, yielding an impressive size reduction of over 100 times compared to vanilla 3DGS and 15 times compared to the most recent state-of-the-art work Scaffold-GS, while achieving comparable or even higher rendering quality.
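To illustrate why a better prediction of a not-yet-coded anchor lowers the bit cost, the sketch below estimates the bits needed to entropy-code one quantized feature value under a Gaussian model, as is common in learned compression. It is a minimal sketch under assumptions, not the authors' implementation: the function names, the Gaussian form, the quantization step, and all numeric values are hypothetical and chosen only for illustration.

```python
import math


def gaussian_cdf(x, mean, scale):
    """CDF of a normal distribution N(mean, scale^2) evaluated at x."""
    return 0.5 * (1.0 + math.erf((x - mean) / (scale * math.sqrt(2.0))))


def estimated_bits(value, mean, scale, step=1.0):
    """Estimated bits to code a quantized value whose bin has width `step`.

    The probability of the bin is the Gaussian mass on
    [value - step/2, value + step/2]; the cost is -log2 of that mass.
    """
    upper = gaussian_cdf(value + step / 2.0, mean, scale)
    lower = gaussian_cdf(value - step / 2.0, mean, scale)
    prob = max(upper - lower, 1e-12)  # guard against log(0)
    return -math.log2(prob)


# Hypothetical quantized feature value of an anchor.
value = 3.0

# Without context: a broad prior centered far from the true value (assumed numbers).
bits_no_context = estimated_bits(value, mean=0.0, scale=4.0)

# With context: already coded coarser anchors give a closer mean and a
# smaller scale (assumed numbers), so the bin receives more probability mass.
bits_with_context = estimated_bits(value, mean=2.8, scale=0.5)

print(f"bits without context: {bits_no_context:.2f}")
print(f"bits with context:    {bits_with_context:.2f}")
```

In this toy setting the context-informed distribution concentrates probability near the true value, so the estimated cost drops; this is the effect the anchor-level context model and the hyperprior aim for.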

Paper Structure (18 sections, 11 equations, 8 figures, 9 tables)

This paper contains 18 sections, 11 equations, 8 figures, 9 tables.

Figures (8)

  • Figure 1: An illustration of the necessity of using an autoregressive model at the anchor level. While Scaffold-GS greatly reduces the spatial redundancy among adjacent 3D neural Gaussians by grouping them and introducing a new data structure, the anchor, to capture their common features, spatial redundancy still exists among anchors. Our method, ContextGS, is the first to reduce the spatial redundancy among anchors using an autoregressive model. We divide anchors into levels as shown in Fig. (b), and the anchors from coarser levels are used to predict anchors in finer levels, i.e., $\bullet$ predicts $\bullet$ then $\bullet$$\bullet$ together predict $\bullet$. Fig. (c) verifies the spatial redundancy by calculating the cosine similarity between anchors in level $0$ and their context anchors in levels $1$ and $2$. Fig. (d) displays the bit savings using the proposed anchor-level context model, evaluated on our strong entropy-coding-based baseline built on Scaffold-GS. Compared with Scaffold-GS, we achieve better rendering quality, faster rendering speed, and a size reduction of up to $15$ times averaged over all datasets we used.
  • Figure 2: (a): An illustration of the data structure we use, following Scaffold-GS, where anchor points are used to extract common features of their associated neural Gaussians. (b): The proposed multi-level division of anchor points. The decoded anchors from higher (coarser) levels are directly forwarded to the lower (finer) level to avoid duplicate storage. In addition, taking decompression as an example, the already decoded anchors are used to predict anchors that are not yet decompressed, which greatly reduces the spatial redundancy among adjacent anchors. (Best viewed zoomed in.)
  • Figure 3: The overall framework of the proposed method, which uses three levels, i.e., $K=3$, to encode the anchors. The decoded anchors from a coarser level $i+1$ are used to encode the anchors in level $i$. In addition, hyperprior features are used to predict the properties of anchors at all levels. For training, after all levels have been coded, the anchor features after adaptive quantization are used to predict the properties of neural Gaussians. The rendering loss is calculated and optimized together with the entropy coding loss $\mathcal{L}_{entropy}$. For testing, after we decode anchor features from the bit stream, the rendering is exactly the same as in Scaffold-GS and introduces no overhead.
  • Figure 3: The ablation study of our method w/ and w/o reusing anchors from coarser levels, i.e., the anchor forward technique, measured on the BungeeNeRF dataset.
  • Figure 4: The Rate-Distortion (RD) curves for quantitative comparison between our method and the most recent SOTA competitors. Note that the x-axis is in $\log$ scale for better visualization.
  • ...and 3 more figures
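
The multi-level anchor division described in the captions of Figures 2 and 3 can be sketched as follows. This is a hedged illustration only: the captions do not state the partitioning rule, so the nested voxel-grid rule used here (the first anchor seen in each coarser voxel is promoted to the next level), the function names `assign_levels` and `coding_order`, and the parameters `base_voxel` and `factor` are all assumptions. The sketch does reflect two points from the captions: each anchor is stored once, and coarser levels are coded before finer ones so that they can serve as context.

```python
import math


def assign_levels(positions, base_voxel=1.0, num_levels=3, factor=2.0):
    """Assign each anchor a level in [0, num_levels - 1]; higher means coarser.

    Assumed rule: for each coarser grid, the first candidate anchor found in a
    voxel is promoted to that level. Only anchors promoted at level k can be
    promoted at level k + 1, so the levels are nested.
    """
    levels = [0] * len(positions)
    candidates = list(range(len(positions)))
    for level in range(1, num_levels):
        voxel = base_voxel * factor ** level
        seen = set()
        promoted = []
        for idx in candidates:
            key = tuple(math.floor(c / voxel) for c in positions[idx])
            if key not in seen:
                seen.add(key)
                levels[idx] = level  # anchor keeps its coarsest level
                promoted.append(idx)
        candidates = promoted
    return levels


def coding_order(levels, num_levels=3):
    """Group anchor indices from the coarsest level down to level 0."""
    return [[i for i, lv in enumerate(levels) if lv == level]
            for level in reversed(range(num_levels))]


# Hypothetical anchor positions (x, y, z).
positions = [
    (0.1, 0.1, 0.1),
    (1.2, 0.1, 0.1),
    (2.3, 0.1, 0.1),
    (3.4, 0.1, 0.1),
    (0.2, 1.5, 0.1),
]

levels = assign_levels(positions)
order = coding_order(levels)
print(levels)  # [2, 0, 1, 0, 0]
print(order)   # [[0], [2], [1, 3, 4]]
```

With $K=3$, the single level-2 anchor would be coded first (using only the hyperprior, since no context exists yet), then the level-1 anchor with level 2 as context, and finally the level-0 anchors with both coarser levels as context.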