Compact Visual Data Representation for Green Multimedia -- A Human Visual System Perspective

Peilin Chen, Xiaohan Fang, Meng Wang, Shiqi Wang, Siwei Ma

TL;DR

This survey addresses the challenge of green multimedia by focusing on knowledge extraction rather than image reconstruction, drawing on Human Visual System (HVS) insights to guide compact representations. It covers three main strands: compact video compression (standards-based, end-to-end, perceptual, and external-data approaches), compact feature compression, and unified representations for dynamic tasks, along with their connections to AI and networking. It highlights the Digital Retina framework and neural codecs as key directions for energy-efficient performance at ultra-low bitrates. The findings emphasize the potential of collaborative edge–cloud architectures and task-driven representations to enable energy-efficient analytics and sustainable multimedia deployment.

Abstract

The Human Visual System (HVS), with its intricate sophistication, is capable of achieving ultra-compact information compression for visual signals. This remarkable ability is coupled with high generalization capability and energy efficiency. By contrast, the state-of-the-art Versatile Video Coding (VVC) standard achieves a compression ratio of around 1,000 times for raw visual data. This notable disparity motivates the research community to draw inspiration from the HVS in order to handle the immense volume of visual data in a green way. This paper therefore surveys how visual data can be efficiently represented for green multimedia, in particular when the ultimate task is knowledge extraction rather than visual signal reconstruction. We introduce recent research efforts that promote green, sustainable, and efficient multimedia in this field. Moreover, we discuss how a deep understanding of the HVS can benefit the research community, and envision the development of future green multimedia technologies.

Paper Structure

This paper contains 13 sections and 2 figures.

Figures (2)

  • Figure 1: This diagram depicts how the psychophysical HVS features can be utilized for compact visual data representation.
  • Figure 2: The roadmap of compact visual data technologies regarding signal-based perception, semantic-based understanding, as well as unified perception and decision-making.