Predicting the Encoding Error of SIRENs

Jeremy Vonderfecht, Feng Liu

TL;DR

We present a method that predicts the encoding error a popular INR network (SIREN) will reach, given its network hyperparameters and the signal to encode. Users get this estimate in milliseconds, instead of the minutes or longer needed to train the SIREN.

Abstract

Implicit Neural Representations (INRs), which encode signals such as images, videos, and 3D shapes in the weights of neural networks, are becoming increasingly popular. Among their many applications is signal compression, for which there is great interest in achieving the highest possible fidelity to the original signal subject to constraints such as neural network size, training (encoding) time, and inference (decoding) time. But training INRs can be computationally expensive, making it challenging to determine the best possible tradeoff under such constraints. Towards this goal, we present a method which predicts the encoding error that a popular INR network (SIREN) will reach, given its network hyperparameters and the signal to encode. This method is trained on a unique dataset of 300,000 SIRENs, trained across a variety of images and hyperparameters. (Dataset available here: https://huggingface.co/datasets/predict-SIREN-PSNR/COIN-collection.) Our predictive method demonstrates the feasibility of this regression problem, and allows users to anticipate the encoding error that a SIREN will reach in milliseconds instead of minutes or longer. We also provide insights into the behavior of SIRENs, such as why narrow SIRENs can have very high random variation in encoding error, and how the performance of SIRENs relates to JPEG compression.

Paper Structure

This paper contains 36 sections, 8 equations, 15 figures, 3 tables.

Figures (15)

  • Figure 1: Overview of our many-architecture encoding error predictor. First, we train SIRENs on many images with a random sampling of hyperparameters. Then, our many-architecture predictor takes the same inputs as the SIREN training script, and predicts the SIRENs' PSNR. The training image is passed through a convolutional-network-based feature extractor (CNN), while the SIREN hyperparameters are fed through a positional encoding (PE). Then both are concatenated and passed through a fully-connected network (MLP) which predicts what PSNR the training script will reach.
  • Figure 2: Scaling curves for SIRENs trained on the first six images in the Kodak dataset. Each line represents the scaling curve for a different image.
  • Figure 3: SIREN vs. JPEG representations. In (a) and (b), each data point represents an image, and the PSNR to which it can be compressed by JPEG and COIN respectively. In (c) we show which JPEG compression ratio reaches the same PSNR as COIN for a given image size and COIN compression ratio.
  • Figure 4: A typical example of a SIREN learning curve (thick blue line) vs. NTK-based extrapolations of that learning curve.
  • Figure 5: Actual vs. predicted PSNR for GP regression models trained on several different input features. From left to right: "full image features", 512-dimensional features extracted from each image using the CNN component of our PSNR prediction network, concatenated with the SIREN hyperparameters; "MLP acts", three different 256-dimensional features from the activations after each layer of the 3-layer MLP regression head; and "JPEG proxy", JPEG2000 proxy PSNRs derived by compressing the image to encode using compression ratios of 1:7, 1:25, and 1:100, concatenated with the SIREN hyperparameters.
  • ...and 10 more figures
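
The data flow described in the Figure 1 caption (image features from a CNN, a positional encoding of the SIREN hyperparameters, concatenation, then an MLP that outputs a PSNR estimate) can be sketched roughly as below. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: the single 3x3 convolution stands in for their CNN feature extractor, the sinusoidal encoding, layer sizes, and the two normalized hyperparameters are hypothetical, and the weights are random, so the output is not a meaningful PSNR.

```python
import numpy as np

def positional_encoding(x, num_freqs=4):
    """Encode each scalar as sin/cos at frequencies 2^k * pi (assumed form)."""
    x = np.asarray(x, dtype=np.float64)
    freqs = (2.0 ** np.arange(num_freqs)) * np.pi      # (num_freqs,)
    angles = x[:, None] * freqs[None, :]               # (n, num_freqs)
    return np.concatenate([np.sin(angles), np.cos(angles)], axis=1).ravel()

def conv_features(image, filters):
    """Stand-in for the CNN: one valid 3x3 conv, ReLU, global average pool."""
    windows = np.lib.stride_tricks.sliding_window_view(image, (3, 3))
    maps = np.einsum("hwij,fij->fhw", windows, filters)  # (F, H-2, W-2)
    return np.maximum(maps, 0.0).mean(axis=(1, 2))       # (F,)

def mlp(x, weights):
    """Fully connected head with ReLU between layers; returns one scalar."""
    for W, b in weights[:-1]:
        x = np.maximum(W @ x + b, 0.0)
    W, b = weights[-1]
    return float((W @ x + b)[0])

def predict_psnr(image, hyperparams, filters, weights, num_freqs=4):
    """Concatenate image features with encoded hyperparameters, then regress."""
    feats = conv_features(image, filters)
    encoded = positional_encoding(hyperparams, num_freqs)
    return mlp(np.concatenate([feats, encoded]), weights)

rng = np.random.default_rng(0)
image = rng.random((32, 32))      # placeholder grayscale image
hyperparams = [0.5, 0.25]         # hypothetical normalized hyperparameters
filters = rng.normal(size=(8, 3, 3))

# Input width: 8 pooled filter responses + 2 hyperparameters * 2 * 4 frequencies.
sizes = [8 + 2 * 2 * 4, 16, 16, 1]
weights = [(rng.normal(size=(o, i)) * 0.1, np.zeros(o))
           for i, o in zip(sizes[:-1], sizes[1:])]

pred = predict_psnr(image, hyperparams, filters, weights)
```

In a trained version of such a predictor, the filters and MLP weights would be fit by regressing onto the PSNR values reached by actually trained SIRENs, which is the role the 300,000-SIREN dataset plays in the paper.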