Progressive Fourier Neural Representation for Sequential Video Compilation

Haeyong Kang, Jaehong Yoon, DaHyun Kim, Sung Ju Hwang, Chang D Yoo

TL;DR

This paper tackles the challenge of generalizing neural implicit representations (NIR) across sequential videos by introducing Progressive Fourier Neural Representation (PFNR), which identifies adaptive, sparse subnetworks in Fourier space (Fourier Subneural Operator, or FSO) and progressively accumulates video representations without forgetting. PFNR builds on a NeRV-style backbone and uses a Lottery Ticket-inspired mechanism to select a subnetwork per video, freezing past weights while allowing overlap for forward transfer, without a replay buffer. Empirical results on UVG8/17 and DAVIS50 show PFNR achieving higher PSNR and MS-SSIM than strong continual-learning baselines, while enabling efficient compression and memory savings. The approach offers a scalable, forget-free paradigm for sequential video compression and representation, with potential applications in streaming and real-time video processing.
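
The per-session bookkeeping described above (select a sparse subnetwork, freeze weights claimed by earlier videos, reuse overlapping ones) can be sketched in plain Python. This is a minimal, hypothetical sketch assuming a WSN-style rule: every weight carries a learnable score, session s keeps the top-c fraction by score as its binary mask, and only weights not claimed by any earlier session are trainable. The names `select_mask` and `split_params` and all score values are illustrative and are not taken from the PFNR code.

```python
from typing import List, Set, Tuple


def select_mask(scores: List[float], c: float) -> Set[int]:
    """Return indices of the top-c fraction of weights by score (0 < c <= 1).

    Hypothetical WSN-style selection rule; not taken from the PFNR code.
    """
    k = max(1, int(round(c * len(scores))))
    # Rank weight indices by score, highest first, and keep the first k.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return set(ranked[:k])


def split_params(current: Set[int], past_masks: List[Set[int]]) -> Tuple[Set[int], Set[int]]:
    """Split the current session's subnetwork into reused (frozen) and trainable weights."""
    used_before: Set[int] = set().union(*past_masks)
    reused = current & used_before     # shared with earlier videos, kept frozen
    trainable = current - used_before  # free weights this session may update
    return reused, trainable


# Session 1 claims 30% of a 10-weight layer; session 2 then picks its own 30%.
scores_s1 = [0.9, 0.1, 0.8, 0.2, 0.7, 0.3, 0.05, 0.4, 0.15, 0.6]
scores_s2 = [0.2, 0.95, 0.85, 0.1, 0.3, 0.9, 0.05, 0.4, 0.15, 0.6]
m1 = select_mask(scores_s1, 0.3)
m2 = select_mask(scores_s2, 0.3)
reused, trainable = split_params(m2, [m1])
print(sorted(m1), sorted(m2), sorted(reused), sorted(trainable))
# prints [0, 2, 4] [1, 2, 5] [2] [1, 5]
```

Because weights in `reused` are never updated, decoding an earlier video with its stored mask gives the same output after later sessions finish, which is the forget-free property the summary describes; weight 2 is shared, so session 2 also benefits from what session 1 learned.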

Abstract

Neural Implicit Representation (NIR) has recently gained significant attention due to its remarkable ability to encode complex and high-dimensional data into a representation space and easily reconstruct it through a trainable mapping function. However, NIR methods assume a one-to-one mapping between the target data and representation models regardless of data relevancy or similarity. This results in poor generalization across multiple complex data and limits their efficiency and scalability. Motivated by continual learning, this work investigates how to accumulate and transfer neural implicit representations for multiple complex video data over sequential encoding sessions. To overcome this limitation of NIR, we propose a novel method, Progressive Fourier Neural Representation (PFNR), that aims to find an adaptive and compact sub-module in Fourier space to encode videos in each training session. This sparsified neural encoding leaves the neural network with free weights, enabling improved adaptation to future videos. In addition, when learning a representation for a new video, PFNR transfers the representations of previous videos while keeping their weights frozen. This design allows the model to continuously accumulate high-quality neural representations for multiple videos while ensuring lossless decoding that perfectly preserves the learned representations for previous videos. We validate our PFNR method on the UVG8/17 and DAVIS50 video sequence benchmarks and achieve impressive performance gains over strong continual learning baselines. The PFNR code is available at https://github.com/ihaeyong/PFNR.git.
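
The idea of encoding with a compact, sparse set of weights in Fourier space can be illustrated with a 1-D toy: transform a signal with a discrete Fourier transform, multiply each frequency by a complex weight that is zeroed outside a binary mask, and transform back. This is a hypothetical sketch assuming a per-frequency (diagonal) complex weight; the actual FSO operates on NeRV feature maps, and its exact parameterization is given in the paper, not here. The names `dft`, `idft`, and `fourier_masked_filter` are illustrative.

```python
import cmath
from typing import List, Sequence


def dft(x: Sequence[complex]) -> List[complex]:
    """Naive O(n^2) discrete Fourier transform."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * k * j / n) for j in range(n))
            for k in range(n)]


def idft(X: Sequence[complex]) -> List[complex]:
    """Naive O(n^2) inverse discrete Fourier transform."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * j / n) for k in range(n)) / n
            for j in range(n)]


def fourier_masked_filter(x: Sequence[float], weights: Sequence[complex],
                          mask: Sequence[bool]) -> List[float]:
    """Hypothetical sparse Fourier-space operator: y = IDFT(mask * W * DFT(x)).

    Only frequencies with mask[k] True contribute. Taking the real part assumes
    the kept spectrum stays conjugate-symmetric (e.g. DC only, or k paired with n-k).
    """
    X = dft([complex(v) for v in x])
    Y = [X[k] * weights[k] if mask[k] else 0j for k in range(len(X))]
    return [v.real for v in idft(Y)]


# Keeping only the DC term (k = 0) returns the signal mean at every position.
print(fourier_masked_filter([1.0, 2.0, 3.0, 4.0], [1, 1, 1, 1],
                            [True, False, False, False]))
# each entry is about 2.5
```

In this toy, the mask plays the role of the sparse sub-module: frequencies outside it cost no parameters, and a later session could claim different frequencies without touching the weights an earlier session depends on.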

Paper Structure

This paper contains 25 sections, 14 equations, 10 figures, 17 tables, and 1 algorithm.

Figures (10)

  • Figure 1: Progressive Fourier Neural Representation (PFNR). PFNR takes time and video (session) indices as input, passes them through a sparse Stem and NeRV Blocks with a Fourier Subneural Operator (FSO), and outputs the whole image through multiple heads $H_N$, where $\tilde{\bm{v}}_s^t$ is a hidden representation. Frozen, reused, and trainable parameters are marked for training at session 2, and each video's representation is shown in its own color. At inference, only the session index $s$, the frame index $t$, and the session mask (subnetwork) are needed.
  • Figure 2: PSNR vs. bits-per-pixel (BPP) on the UVG17 dataset.
  • Figure 3: Comparison of PFNR's PSNR with other methods, and layer-wise accumulated capacities, on the UVG17 dataset. In (b), green indicates the percentage of the current session's ($s$) subnetwork parameters in Stem, $f$-NeRV3, and NeRV5 that are reused from subnetworks obtained in the past ($s-1$) video sessions.
  • Figure 4: PFNR's representations of NeRV blocks with $c = 50.0\%$ on the UVG17 dataset.
  • Figure 5: PFNR's video generation (from $t=0$ to $t=3$) with $c = 30.0\%$ on the UVG17 dataset.
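
Figure 2 plots reconstruction quality (PSNR) against bits-per-pixel (BPP). As a reminder of how these two quantities are usually computed, here is a minimal sketch assuming pixel values normalized to [0, 1] and a compressed model size counted in bits over all frames of a video. The helper names and example numbers are hypothetical, and the paper's exact bit accounting (for example, how session masks or quantized weights are counted) may differ.

```python
import math
from typing import Sequence


def psnr(ref: Sequence[float], rec: Sequence[float], max_val: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB: 10 * log10(max_val^2 / MSE)."""
    if len(ref) != len(rec) or not ref:
        raise ValueError("inputs must be non-empty and the same length")
    mse = sum((a - b) ** 2 for a, b in zip(ref, rec)) / len(ref)
    if mse == 0:
        return math.inf  # identical signals
    return 10.0 * math.log10(max_val ** 2 / mse)


def bits_per_pixel(total_bits: int, num_frames: int, height: int, width: int) -> float:
    """Compressed size in bits divided by the total number of pixels encoded."""
    return total_bits / (num_frames * height * width)


print(psnr([0.0, 1.0], [0.1, 0.9]))  # about 20 dB (MSE is about 0.01)
print(bits_per_pixel(24_000_000, 600, 1080, 1920))
```

For instance, a hypothetical 3M-parameter model stored at 8 bits per weight (24,000,000 bits) that encodes 600 frames at 1920×1080 comes to roughly 0.019 BPP; moving right along the x-axis of Figure 2 means spending more bits per pixel for higher PSNR.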