Object-Attribute-Relation Representation Based Video Semantic Communication

Qiyuan Du, Yiping Duan, Qianqian Yang, Xiaoming Tao, Mérouane Debbah

TL;DR

The paper tackles the challenge of efficient video transmission under bandwidth and noise constraints by introducing a structured semantic representation, Object-Attribute-Relation (OAR), for videos. OAR encodes scenes as graphs of objects, their attributes, and inter-object relations, enabling low-bit-rate transmission and generative reconstruction conditioned on a reference frame. The authors integrate OAR into both video coding and joint source-channel coding (JSCC) pipelines, including an OAR-modulated image JSCC backbone and an OAR-based video transmission pipeline that transmits OAR sequences alongside reference frames. Experimental results on traffic-surveillance datasets show that OAR-based coding improves perceptual quality and downstream task performance at low bit-rates and robustly supports transmission over noisy channels, with notable rate savings and mAP gains compared to conventional codecs and prior semantic approaches. The work also provides extensive ablation studies and discusses inference-time considerations and potential applications such as privacy-aware or scene-specific semantic control.
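
To make the graph description above concrete, here is a minimal sketch of how one frame's OAR could be held as a data structure. It is an illustration under stated assumptions, not the authors' implementation: the class names (`SceneObject`, `Relation`, `FrameOAR`), the field names, the attribute and predicate vocabularies, and the JSON serialization are all hypothetical and do not come from the paper.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class SceneObject:
    """One detected object. Field names are illustrative assumptions."""
    obj_id: int
    category: str                      # e.g. "car", "bus"
    bbox: tuple[int, int, int, int]    # (x, y, width, height) in pixels
    attributes: dict[str, str] = field(default_factory=dict)


@dataclass
class Relation:
    """A directed (subject, predicate, object) edge between two objects."""
    subject_id: int
    predicate: str                     # e.g. "behind", "left_of"
    object_id: int


@dataclass
class FrameOAR:
    """Object-attribute-relation graph for a single video frame."""
    frame_index: int
    objects: list[SceneObject] = field(default_factory=list)
    relations: list[Relation] = field(default_factory=list)

    def to_bytes(self) -> bytes:
        # Compact JSON is used only to make the payload size tangible;
        # it is an assumption, not the coding scheme used in the paper.
        return json.dumps(asdict(self), separators=(",", ":")).encode("utf-8")


# A toy traffic scene with two vehicles and one relation between them.
frame = FrameOAR(
    frame_index=0,
    objects=[
        SceneObject(0, "car", (120, 200, 80, 40), {"color": "red"}),
        SceneObject(1, "bus", (300, 180, 160, 70), {"color": "white"}),
    ],
    relations=[Relation(0, "behind", 1)],
)
payload = frame.to_bytes()
print(len(payload), "bytes for one frame's OAR graph")
```

Even with this naive serialization, the toy graph occupies a few hundred bytes, whereas an uncompressed RGB frame at a hypothetical 960×540 resolution occupies 960 × 540 × 3 = 1,555,200 bytes. This gap is the intuition behind sending OAR sequences plus a reference frame instead of every frame's pixels.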

Abstract

With the rapid growth of multimedia data volume, there is an increasing need for efficient video transmission in applications such as virtual reality and future video streaming services. Semantic communication is emerging as a vital technique for ensuring efficient and reliable transmission in low-bandwidth, high-noise settings. However, most current approaches focus on joint source-channel coding (JSCC) that depends on end-to-end training. These methods often lack an interpretable semantic representation and struggle with adaptability to various downstream tasks. In this paper, we introduce the use of object-attribute-relation (OAR) as a semantic framework for videos to facilitate low bit-rate coding and enhance the JSCC process for more effective video transmission. We utilize OAR sequences for both low bit-rate representation and generative video reconstruction. Additionally, we incorporate OAR into the image JSCC model to prioritize communication resources for areas more critical to downstream tasks. Our experiments on traffic surveillance video datasets assess the effectiveness of our approach in terms of video transmission performance. The empirical findings demonstrate that our OAR-based video coding method not only outperforms H.265 coding at lower bit-rates but also synergizes with JSCC to deliver robust and efficient video transmission.

Paper Structure (29 sections, 11 equations, 15 figures, 4 tables)

Figures (15)

  • Figure 1: Comparison between traditional communication, classical semantic communication (without the OAR representation module) and the proposed OAR-based video transmission frameworks.
  • Figure 2: Overall framework of OAR-based video compressive coding and transmission. Frames are represented by OAR and transmitted via LDPC channel coding and QAM modulation. Additionally, the reference frame is coded and transmitted via OAR-assisted JSCC.
  • Figure 3: Framework of OAR-based video generation and visualizations for intermediate results.
  • Figure 4: The framework of OAR-assisted image JSCC. The SNR is assumed to be available at both the transmitter and the receiver. The OAR is transmitted losslessly, so the transmitter and receiver obtain identical OAR features $\mathbf{F}_\text{l}$ by using OAR feature extraction networks with identical parameters.
  • Figure 5: Performance comparison of the proposed OAR-based method with H.264, H.265, and DVC at different bit-rates on the UA-DETRAC dataset.
  • ...and 10 more figures