Survey on Visual Signal Coding and Processing with Generative Models: Technologies, Standards and Optimization

Zhibo Chen; Heming Sun; Li Zhang; Fan Zhang

TL;DR

This survey analyzes the rapid growth of visual signal coding and processing driven by generative models, covering GANs, VAEs, autoregressive models, normalizing flows, and diffusion models. It reviews image and video coding techniques, including end-to-end learned frameworks, perceptual-quality optimization, and rate-distortion-perception trade-offs, as well as standardization efforts such as JPEG AI and neural video coding initiatives. The paper also discusses visual signal restoration, synthesis, editing, and interpolation with generative models, along with quality assessment strategies both for generated content and for the generative models themselves. Finally, it surveys practical optimization approaches (algorithmic, architectural, and system-level) aimed at achieving real-time performance on diverse hardware, highlighting the field’s potential and the challenges that remain for deployment. Collectively, the work provides a comprehensive reference for researchers and practitioners advancing AI-based visual signal coding and processing.
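
The rate-distortion-perception trade-off mentioned above is commonly formalized along the following lines. This is a sketch of the general formulation used in the learned-compression literature, not necessarily the notation of this survey: $x$ and $\hat{x}$ are the original and reconstructed images, $I(X;\hat{X})$ is mutual information, $\Delta$ is a distortion measure, $d$ is a divergence between the distributions of source and reconstructed images, and $\lambda$ and $\beta$ are illustrative weighting hyperparameters.

```latex
\begin{aligned}
R(D, P) &= \min_{p_{\hat{X} \mid X}} I(X; \hat{X})
  \quad \text{s.t.} \quad
  \mathbb{E}\big[\Delta(X, \hat{X})\big] \le D,\;
  d\big(p_X, p_{\hat{X}}\big) \le P, \\
\mathcal{L} &= R + \lambda\, \mathbb{E}\big[\Delta(x, \hat{x})\big]
  + \beta\, d\big(p_X, p_{\hat{X}}\big).
\end{aligned}
```

The first line defines the lowest achievable rate under both a distortion budget $D$ and a perception budget $P$; the second is a typical Lagrangian relaxation used to train a codec, where $R$ is the estimated bit rate of the latent representation and the divergence term is often approximated in practice by an adversarial (GAN) loss.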

Abstract

This paper provides a survey of the latest developments in visual signal coding and processing with generative models. Specifically, we focus on the advancement of generative models and their influence on research in visual signal coding and processing. The survey begins with a brief introduction to well-established generative models, including Variational Autoencoders (VAEs), Generative Adversarial Networks (GANs), Autoregressive (AR) models, Normalizing Flows, and Diffusion models. The subsequent section explores advancements in visual signal coding based on generative models, as well as the ongoing international standardization activities. In the realm of visual signal processing, we focus on the application and development of various generative models in visual signal restoration research. We also present the latest developments in generative visual signal synthesis and editing, along with visual signal quality assessment using generative models and quality assessment for generative models. Because the practical implementation of these studies is closely linked to research on fast optimization, this paper additionally presents the latest advancements in fast optimization for visual signal coding and processing with generative models. We hope to advance this field by providing researchers and practitioners with a comprehensive literature review on visual signal coding and processing with generative models.

Paper Structure (43 sections, 3 equations, 7 figures, 1 table)

Introduction
Generative Models
Generative Adversarial Networks (GAN)
Variational Autoencoders (VAE)
Autoregressive Models
Normalizing Flows
Diffusion Models
Visual Signal Coding with Generative Models
Image Coding with Generative Models
Probabilistic Generative Models for Image Coding
Generative Image Coding for Perceptual Quality
The Rate-Distortion-Perception Trade-off
Video Coding with Generative Models
Autoencoder-based Coding Models for Video Compression
Hybrid Coding Models for Video Compression
...and 28 more sections

Figures (7)

Figure 1: Neural Video Compression models.
Figure 2: VAE-based JPEG AI framework. $x$ and $\hat{x}$ denote the original input image and the reconstructed image, respectively. The red modules are standardized in JPEG AI; the blue-green modules are encoder-side operations [overview_slide].
Figure 3: Illustration of the sequential architecture (left) and the decoupled architecture (right) [decoupledCFP_bytedance].
Figure 4: Illustration of serial processing in raster-scan order and of wavefront processing [decoupledCFP_bytedance].
Figure 5: Framework of Generative Face Video Coding.
...and 2 more figures
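
Figure 4 contrasts serial raster-scan processing with wavefront processing. The sketch below illustrates the general scheduling idea under an assumed dependency pattern: each block waits for its left, top-left, top, and top-right neighbours, as in HEVC-style wavefront parallel processing. It is not the scheme of the cited proposal or of JPEG AI, and the helper names `wavefront_schedule` and `group_by_step` are hypothetical.

```python
from collections import defaultdict

# Assumed dependency pattern (hypothetical, HEVC-WPP style): block (r, c)
# needs its left, top-left, top and top-right neighbours to finish first.
NEIGHBOURS = [(0, -1), (-1, -1), (-1, 0), (-1, 1)]


def wavefront_schedule(rows: int, cols: int) -> dict[tuple[int, int], int]:
    """Return the earliest parallel step at which each block can be processed."""
    step: dict[tuple[int, int], int] = {}
    # Raster-scan order visits every dependency before the block itself,
    # so one pass is enough to compute all earliest start steps.
    for r in range(rows):
        for c in range(cols):
            deps = [
                step[(r + dr, c + dc)]
                for dr, dc in NEIGHBOURS
                if 0 <= r + dr < rows and 0 <= c + dc < cols
            ]
            step[(r, c)] = max(deps, default=-1) + 1
    return step


def group_by_step(step: dict[tuple[int, int], int]) -> list[list[tuple[int, int]]]:
    """Group blocks into waves; blocks in the same wave can run in parallel."""
    waves: dict[int, list[tuple[int, int]]] = defaultdict(list)
    for block, t in step.items():
        waves[t].append(block)
    return [sorted(waves[t]) for t in sorted(waves)]


if __name__ == "__main__":
    rows, cols = 4, 6
    waves = group_by_step(wavefront_schedule(rows, cols))
    print("serial steps:", rows * cols)    # raster scan: one block per step
    print("wavefront steps:", len(waves))  # number of parallel waves
    for t, blocks in enumerate(waves):
        print(t, blocks)
```

For a 4 × 6 block grid, raster-scan processing takes 24 sequential steps while this wavefront schedule needs 12, because block (r, c) becomes ready at step c + 2r and all blocks sharing that value can be processed concurrently.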
