GaussianSR: High Fidelity 2D Gaussian Splatting for Arbitrary-Scale Image Super-Resolution

Jintong Hu, Bin Xia, Bin Chen, Wenming Yang, Lei Zhang

TL;DR

This work targets the fidelity limitations of implicit neural representations for arbitrary-scale image super-resolution by introducing GaussianSR, which models each pixel as a continuous Gaussian field and learns a per-pixel Gaussian kernel via a classifier. The approach leverages 2D Gaussian Splatting, LR feature initialization, and a dual-stream upsampling architecture to enable end-to-end training with fewer parameters while capturing long-range dependencies. Empirical results across DIV2K, General100, BSD100, Urban100, and Manga109 show competitive or superior performance, especially on noninteger scales and high-resolution content, with ablations supporting the effectiveness of the Gaussian bank size and channel decoupling. Overall, GaussianSR establishes a new paradigm for ASSR by providing a continuous, interpretable, and efficient representation that flexibly adapts to input characteristics through learned Gaussian kernels.

Abstract

Implicit neural representations (INRs) have significantly advanced the field of arbitrary-scale super-resolution (ASSR) of images. Most existing INR-based ASSR networks first extract features from the given low-resolution image using an encoder, and then render the super-resolved result via a multi-layer perceptron decoder. Although these approaches have shown promising results, their performance is constrained by the limited representation ability of discrete latent codes in the encoded features. In this paper, we propose a novel ASSR method named GaussianSR that overcomes this limitation through 2D Gaussian Splatting (2DGS). Unlike traditional methods that treat pixels as discrete points, GaussianSR represents each pixel as a continuous Gaussian field. The encoded features are simultaneously refined and upsampled by rendering the mutually stacked Gaussian fields. As a result, long-range dependencies are established to enhance representation ability. In addition, a classifier is developed to dynamically assign Gaussian kernels to all pixels to further improve flexibility. All components of GaussianSR (i.e., encoder, classifier, Gaussian kernels, and decoder) are jointly learned end-to-end. Experiments demonstrate that GaussianSR achieves superior ASSR performance with fewer parameters than existing methods while enjoying interpretable and content-aware feature aggregations.
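To make the idea of "rendering stacked Gaussian fields" concrete, here is a minimal sketch of how a feature at a continuous query position could be blended from per-pixel Gaussians. It is not the authors' implementation: the isotropic kernel, the weight normalization, and all names (`gaussian_weight`, `splat_feature`, the toy grid values) are assumptions made for illustration.

```python
import math


def gaussian_weight(query, center, sigma):
    """Unnormalized isotropic 2D Gaussian centered at `center`, evaluated at `query`."""
    dx = query[0] - center[0]
    dy = query[1] - center[1]
    return math.exp(-(dx * dx + dy * dy) / (2.0 * sigma * sigma))


def splat_feature(query, centers, features, sigmas, opacities):
    """Blend per-pixel features at a continuous query position.

    Pixel i contributes opacity_i * G_i(query) * feature_i.
    Normalizing the weights to sum to 1 is an assumption of this sketch.
    """
    weights = [
        a * gaussian_weight(query, c, s)
        for c, s, a in zip(centers, sigmas, opacities)
    ]
    total = sum(weights)
    dim = len(features[0])
    return [
        sum(w * f[d] for w, f in zip(weights, features)) / total
        for d in range(dim)
    ]


# Hypothetical 2x2 low-resolution grid with one feature channel per pixel.
centers = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
features = [[0.0], [1.0], [1.0], [2.0]]
sigmas = [0.5] * 4
opacities = [1.0] * 4

# The grid center is equidistant from all four pixels, so the result is their mean.
center_value = splat_feature((0.5, 0.5), centers, features, sigmas, opacities)
# A query on top of the first pixel is dominated by that pixel's feature.
corner_value = splat_feature((0.0, 0.0), centers, features, sigmas, opacities)
```

Because every pixel contributes to every query position with a weight that decays with distance, the query point can be placed anywhere in the plane, which is what allows upsampling at non-integer scales in this sketch.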

Paper Structure (20 sections, 5 equations, 12 figures, 6 tables, 1 algorithm)

Figures (12)

  • Figure 1: Comparison of Feature Storage between INR-based ASSR and our GaussianSR. INR methods treat pixels as discrete points. Instead, our GaussianSR method models each pixel as a continuous Gaussian field. By representing pixels as continuous fields instead of discrete points, GaussianSR can explicitly represent the field values at any position (e.g., $x_{q}$). GaussianSR achieves arbitrary-scale upsampling in a more elegant and natural way.
  • Figure 2: The main pipeline of GaussianSR. GaussianSR begins with an encoder that extracts feature representations from the input image, followed by Selective Gaussian Splatting, which assigns a learnable Gaussian kernel to each pixel, converting discrete feature points into Gaussian fields. Features at any arbitrary query point $x_{q}$ in the plane are computed using the overlapping Gaussian functions that modulate their influence based on the spatial location. Finally, these continuous-domain features are rendered into a high-resolution space and refined through the decoder to reconstruct the desired RGB output at the specified query coordinates.
  • Figure 3: Training and Inference Process of GaussianSR. GaussianSR employs the Selective Gaussian Splatting (SGS) module, which adaptively assigns Gaussian kernels to pixels based on their distinctive features. During training, SGS leverages the Gumbel Softmax to generate soft labels, enabling gradient backpropagation and parameter optimization for both logits and the Gaussian bank (which stores standard deviations and opacities). In the inference phase, SGS switches to hard labels, selecting the most likely Gaussian kernel for each pixel based on the optimized parameters.
  • Figure 4: The architecture of Dual-Stream Feature Decoupling. The encoded features are decoupled along the channel dimension into two tensors, with one tensor undergoing feature unfolding, Gaussian splatting, and feature folding to preserve representational details, while the other tensor is bicubically upsampled for efficiency. The upsampled outputs from these two parallel streams are then fused back to the original channel dimension.
  • Figure 5: Qualitative comparison of ×4 SR with other ASSR methods on the Urban100 dataset. EDSR-baseline is used as the encoder for all methods. In all examples shown, our method significantly outperforms the other methods, particularly on images rich in repeated textures and structures.
  • ...and 7 more figures
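
The training and inference behaviour described in the Figure 3 caption (soft labels via Gumbel Softmax during training, hard labels at inference) can be sketched as follows. This is an illustrative sketch only: the bank values, the temperature `tau`, the function names, and the choice to mix bank parameters directly by the soft labels are all assumptions, not details confirmed by the paper.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def gumbel_softmax(logits, tau, rng):
    """Soft one-hot sample: softmax((logits + Gumbel noise) / tau)."""
    noisy = []
    for logit in logits:
        # Clamp the uniform sample away from 0 and 1 so both logs stay finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        gumbel = -math.log(-math.log(u))
        noisy.append((logit + gumbel) / tau)
    return softmax(noisy)


def select_kernel(logits, bank, training, tau=1.0, rng=None):
    """Return (sigma, opacity) for one pixel from a bank of candidate kernels.

    Training: soft labels weight every bank entry, so the mixture is
    differentiable with respect to both the logits and the bank.
    Inference: the single most likely entry is picked (hard label).
    """
    if training:
        probs = gumbel_softmax(logits, tau, rng or random.Random())
        sigma = sum(p * k[0] for p, k in zip(probs, bank))
        opacity = sum(p * k[1] for p, k in zip(probs, bank))
        return sigma, opacity
    best = max(range(len(logits)), key=lambda i: logits[i])
    return bank[best]


# Hypothetical bank of (standard deviation, opacity) pairs and classifier logits.
bank = [(0.3, 1.0), (0.6, 0.8), (1.2, 0.5)]
logits = [0.1, 2.0, -1.0]

soft_kernel = select_kernel(logits, bank, training=True, rng=random.Random(0))
hard_kernel = select_kernel(logits, bank, training=False)
```

In a real training setup these operations would run inside an automatic-differentiation framework so that gradients reach the logits and the bank; the plain-Python version above only shows the control flow.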