Survey on Visual Signal Coding and Processing with Generative Models: Technologies, Standards and Optimization

Zhibo Chen, Heming Sun, Li Zhang, Fan Zhang

TL;DR

This survey analyzes the rapid growth of visual signal coding and processing driven by generative models, covering GANs, VAEs, autoregressive models, normalizing flows, and diffusion models. It reviews image and video coding techniques, including end-to-end learned frameworks, perceptual-quality optimization, and rate-distortion-perception trade-offs, as well as standardization efforts such as JPEG AI and neural video coding initiatives. The paper also discusses visual signal restoration, synthesis, editing, and interpolation under generative modeling, along with quality assessment strategies for both generated content and the generative models themselves. Finally, it covers practical optimization approaches (algorithmic, architectural, and system-level) aimed at real-time performance on diverse hardware, and highlights the field's potential and the remaining challenges for deployment. Collectively, the work offers researchers and practitioners a comprehensive reference for advancing AI-based visual signal coding and processing.

Abstract

This paper surveys the latest developments in visual signal coding and processing with generative models, focusing on the advancement of generative models and their influence on research in this domain. The survey begins with a brief introduction to well-established generative models, including Variational Autoencoders (VAEs), Generative Adversarial Networks (GANs), Autoregressive (AR) models, Normalizing Flows, and Diffusion models. The next section explores advances in visual signal coding based on generative models, as well as ongoing international standardization activities. For visual signal processing, the focus is on the application and development of various generative models in visual signal restoration. We also present the latest developments in generative visual signal synthesis and editing, along with visual signal quality assessment using generative models and quality assessment for generative models. Because the practical implementation of these studies is closely linked to fast optimization, the paper additionally presents the latest advances in fast optimization for visual signal coding and processing with generative models. We hope to advance the field by providing researchers and practitioners with a comprehensive literature review of visual signal coding and processing with generative models.

Paper Structure (43 sections, 3 equations, 7 figures, 1 table)

Figures (7)

  • Figure 1: Neural Video Compression models.
  • Figure 2: VAE-based JPEG-AI framework. $x$ and $\hat{x}$ denote the original input image and the reconstructed image, respectively. The red modules are standardized in JPEG-AI. The blue-green modules are the encoder-side operations [overview_slide].
  • Figure 3: Illustration of the sequential architecture (left) and the decoupled architecture (right) [decoupledCFP_bytedance].
  • Figure 4: Illustration of serial processing based on the raster-scan order and of wavefront processing [decoupledCFP_bytedance].
  • Figure 5: Framework of Generative Face Video Coding.
  • ...and 2 more figures