Object-Attribute-Relation Representation Based Video Semantic Communication
Qiyuan Du, Yiping Duan, Qianqian Yang, Xiaoming Tao, Mérouane Debbah
TL;DR
The paper tackles the challenge of efficient video transmission under bandwidth and noise constraints by introducing a structured semantic representation, Object-Attribute-Relation (OAR), for videos. OAR encodes scenes as graphs of objects, their attributes, and inter-object relations, enabling low-bit-rate transmission and generative reconstruction conditioned on a reference frame. The authors integrate OAR into both video coding and joint source-channel coding (JSCC) pipelines, including an OAR-modulated image JSCC backbone and an OAR-based video transmission pipeline that transmits OAR sequences alongside reference frames. Experimental results on traffic-surveillance datasets show that OAR-based coding improves perceptual quality and downstream task performance at low bit-rates and robustly supports transmission over noisy channels, with notable rate savings and mAP gains compared to conventional codecs and prior semantic approaches. The work also provides extensive ablation studies and discusses inference-time considerations and potential applications such as privacy-aware or scene-specific semantic control.
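To make the graph structure concrete, the following is a minimal sketch of how one frame of an OAR sequence might be held in memory and serialized. All class names, field names, attribute keys, and the JSON serialization are hypothetical choices made for illustration; the paper's actual OAR schema and its entropy coding are not described here and may differ.

```python
import json
from dataclasses import dataclass, field


@dataclass
class SceneObject:
    """A graph node: one detected object with its attributes (hypothetical schema)."""
    obj_id: int
    category: str
    attributes: dict  # e.g. {"color": "red", "bbox": [x, y, w, h]}


@dataclass
class Relation:
    """A directed graph edge: (subject, predicate, object), referenced by object id."""
    subject_id: int
    predicate: str
    object_id: int


@dataclass
class OARFrame:
    """The OAR graph for a single video frame."""
    frame_index: int
    objects: list = field(default_factory=list)
    relations: list = field(default_factory=list)

    def to_bytes(self) -> bytes:
        # Compact JSON is an assumption used only to get a rough payload size;
        # it is not the coding scheme used in the paper.
        payload = {
            "t": self.frame_index,
            "objects": [[o.obj_id, o.category, o.attributes] for o in self.objects],
            "relations": [[r.subject_id, r.predicate, r.object_id] for r in self.relations],
        }
        return json.dumps(payload, separators=(",", ":")).encode("utf-8")


# A toy traffic scene: a red car behind a white truck.
car = SceneObject(0, "car", {"color": "red", "bbox": [120, 200, 80, 40]})
truck = SceneObject(1, "truck", {"color": "white", "bbox": [260, 190, 120, 60]})
frame = OARFrame(frame_index=0, objects=[car, truck], relations=[Relation(0, "behind", 1)])

encoded = frame.to_bytes()
raw_frame_bytes = 1920 * 1080 * 3  # uncompressed 8-bit RGB frame at 1080p, for scale
print(f"OAR payload: {len(encoded)} bytes, raw frame: {raw_frame_bytes} bytes")
```

Even with an uncompressed text serialization, the per-frame graph is a few hundred bytes, which gives some intuition for why sending OAR sequences alongside an occasional reference frame can fit a low bit-rate budget. The receiver would still need a generative model to turn the graph and reference frame back into pixels, which this sketch does not cover.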
Abstract
With the rapid growth of multimedia data volume, there is an increasing need for efficient video transmission in applications such as virtual reality and future video streaming services. Semantic communication is emerging as a vital technique for ensuring efficient and reliable transmission in low-bandwidth, high-noise settings. However, most current approaches focus on joint source-channel coding (JSCC) that depends on end-to-end training. These methods often lack an interpretable semantic representation and adapt poorly to different downstream tasks. In this paper, we propose object-attribute-relation (OAR) as a semantic framework for videos to facilitate low bit-rate coding and enhance the JSCC process for more effective video transmission. We use OAR sequences for both low bit-rate representation and generative video reconstruction. Additionally, we incorporate OAR into the image JSCC model to prioritize communication resources for areas more critical to downstream tasks. We evaluate the approach on traffic surveillance video datasets in terms of video transmission performance. The results show that our OAR-based video coding method not only outperforms H.265 coding at low bit-rates but also combines with JSCC to deliver robust and efficient video transmission.
