Light Field Compression Based on Implicit Neural Representation

Henan Wang, Hanxin Zhu, Zhibo Chen

TL;DR

The paper tackles the challenge of compressing high-dimensional light field data by introducing an SAI-wise (sub-aperture-image-wise) implicit neural representation: a network is overfitted to a single light field so that the 4-D data is stored in the network parameters. After overfitting, the authors apply pruning, 8-bit quantization, and adaptive entropy coding to produce a compact bitstream, enabling flexible decoding of arbitrary viewpoints (ROI) and improved perceptual quality. Empirical results on EPFL light-field data show competitive rate-distortion performance and superior perceptual quality at higher bitrates, with significantly faster decoding than pixel-wise INRs such as SIREN. The approach offers a practical, ROI-friendly compression framework that leverages continuous INR representations for 4-D data while benefiting from standard model-compression techniques.
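The post-training steps named above (pruning, 8-bit quantization, entropy coding) can be illustrated with a minimal sketch. Everything in it is an assumption for illustration, not the authors' implementation: the function names, the global magnitude-pruning rule, the min/max affine quantizer, the 40% prune ratio, and the random stand-in weights are all hypothetical, and the Shannon entropy is only a proxy for the bitrate an adaptive entropy coder might reach.

```python
import math
import random
from collections import Counter


def prune_by_magnitude(weights, prune_ratio):
    """Zero out roughly the smallest-magnitude fraction of weights.

    Hypothetical global unstructured pruning; ties at the threshold
    are pruned too, so slightly more than the ratio may be zeroed.
    """
    k = int(len(weights) * prune_ratio)
    if k == 0:
        return list(weights)
    threshold = sorted(abs(w) for w in weights)[k - 1]
    return [0.0 if abs(w) <= threshold else w for w in weights]


def quantize_uniform(weights, bits=8):
    """Map floats to integer levels in [0, 2**bits - 1] (min/max affine scale)."""
    lo, hi = min(weights), max(weights)
    levels = 2 ** bits - 1
    scale = (hi - lo) / levels if hi > lo else 1.0
    return [round((w - lo) / scale) for w in weights], lo, scale


def dequantize(levels, lo, scale):
    """Invert quantize_uniform up to rounding error."""
    return [lo + v * scale for v in levels]


def entropy_bits_per_symbol(symbols):
    """Empirical Shannon entropy: a lower bound for a memoryless entropy coder."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


# Stand-in for trained network weights (assumption: roughly Gaussian).
random.seed(0)
weights = [random.gauss(0.0, 0.1) for _ in range(10_000)]

pruned = prune_by_magnitude(weights, prune_ratio=0.4)
q, lo, scale = quantize_uniform(pruned, bits=8)
restored = dequantize(q, lo, scale)

max_err = max(abs(a - b) for a, b in zip(pruned, restored))
bits = entropy_bits_per_symbol(q)
print(f"max reconstruction error: {max_err:.6f}")
print(f"estimated rate: {bits:.3f} bits/weight (vs. 8 for raw 8-bit storage)")
```

Because pruning collapses many weights onto a single quantization level, the symbol distribution becomes highly peaked, which is what lets an entropy coder spend fewer than 8 bits per weight on average.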

Abstract

Light field, as a new data representation format in multimedia, can capture both the intensity and the direction of light rays. However, the additional angular information also brings a large volume of data. Classical coding methods do not effectively describe the relationship between different views, so redundancy remains. To address this problem, we propose a novel light field compression scheme based on implicit neural representation to reduce redundancies between views. We store the information of a light field image implicitly in a neural network and adopt model compression methods to further compress the implicit representation. Extensive experiments demonstrate the effectiveness of our proposed method, which achieves rate-distortion performance comparable to, and perceptual quality superior to, traditional methods.

Paper Structure

This paper contains 24 sections, 5 equations, 4 figures, and 2 tables.

Figures (4)

  • Figure 1: The encoding pipeline of our method. The INR network is well optimized to overfit the target light field image, then implicitly compressed to the bitstream.
  • Figure 2: (a) Model architecture of our proposed method. (b) NeRV block structure [chen2021nerv]. (c) Residual block structure [he2016deep]. Our model comprises positional encoding, a multi-layer perceptron (MLP), and an upsampling network. The MLP includes two fully connected (FC) layers. The upsampling network contains 5 consecutive NeRV blocks and residual blocks. Each NeRV block upscales the height and width of the input feature map by a preset factor using the PixelShuffle technique [shi2016real]. Residual blocks improve output image quality through the residual structure shown in (c).
  • Figure 3: Perceptual quality comparison on (a) I01 and (b) I04. We can observe that our method preserves more details and textures.
  • Figure 4: Average RD performance of the proposed method and other methods on I01, I02, I04, I09 of the EPFL LF Dataset.
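
The PixelShuffle upscaling mentioned in the Figure 2 caption can be sketched in plain Python to show how channels are traded for spatial resolution. This is an illustrative sketch only: `pixel_shuffle` and `output_size` are hypothetical helper names, the nested-list tensor layout is a simplification, and the base size and per-block factors in the usage lines are made-up values, not the settings used in the paper.

```python
def pixel_shuffle(x, r):
    """Rearrange a [C*r*r][H][W] nested list into [C][H*r][W*r].

    Follows the usual sub-pixel layout:
    out[c][h*r + i][w*r + j] = x[c*r*r + i*r + j][h][w]
    """
    c_in, h, w = len(x), len(x[0]), len(x[0][0])
    if c_in % (r * r) != 0:
        raise ValueError("channel count must be divisible by r*r")
    c_out = c_in // (r * r)
    out = [[[0.0] * (w * r) for _ in range(h * r)] for _ in range(c_out)]
    for c in range(c_out):
        for i in range(r):
            for j in range(r):
                src = x[c * r * r + i * r + j]
                for y in range(h):
                    for z in range(w):
                        out[c][y * r + i][z * r + j] = src[y][z]
    return out


def output_size(base_hw, factors):
    """Spatial size after a chain of upscaling blocks with the given factors."""
    h, w = base_hw
    for f in factors:
        h, w = h * f, w * f
    return h, w


# Toy input: 4 channels of size 1x2, upscaled by r=2 into 1 channel of size 2x4.
x = [[[10 * c + z for z in range(2)]] for c in range(4)]
y = pixel_shuffle(x, 2)
for row in y[0]:
    print(row)

# Hypothetical base size and factors for five consecutive upscaling blocks.
final_hw = output_size((9, 16), (5, 3, 2, 2, 2))
print(final_hw)
```

Each block multiplies height and width by its factor, so five blocks compound: the small feature map produced by the MLP grows to full image resolution without any transposed convolutions.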