Stereo Image Coding for Machines with Joint Visual Feature Compression
Dengchao Jin, Jianjun Lei, Bo Peng, Zhaoqing Pan, Nam Ling, Qingming Huang
TL;DR
This work introduces stereo image coding for machines (SICM), a framework that optimizes stereo image coding for machine vision tasks by learning to compress stereo visual features rather than raw images. The proposed MVSFC-Net combines a stereo feature extractor, a stereo multi-scale feature compression (SMFC) module, and a visual-analysis head for 3D object detection, with a rate-distortion objective that prioritizes task performance. The SMFC module jointly reduces intra-view, inter-view, and cross-scale redundancies to produce compact representations, yielding BD-rate reductions of up to ~81% on AP3D and ~77% on APBEV compared to MPEG anchors and prior stereo image compression (SIC) methods, particularly at low bitrates. Ablation confirms that SMFC is important to this performance, and the method's encoding and decoding efficiency is favorable, which suggests it is practical for machine-vision-focused stereo coding under bandwidth constraints.
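For readers unfamiliar with the BD-rate figures quoted above, the following is a minimal sketch of how a Bjontegaard delta rate is commonly computed from two rate-versus-quality curves. It assumes the standard cubic fit of log-rate against the quality metric (here the metric would be a detection score such as AP3D). The function name `bd_rate` and the sample numbers are hypothetical, and the exact procedure used in the paper is not stated in this summary.

```python
import numpy as np


def bd_rate(rate_anchor, metric_anchor, rate_test, metric_test):
    """Bjontegaard delta rate in percent (test vs. anchor).

    Negative values mean the test codec needs less bitrate than the
    anchor to reach the same quality or task score.
    Assumption: standard cubic fit of log-rate as a function of quality.
    """
    log_ra = np.log(np.asarray(rate_anchor, dtype=float))
    log_rt = np.log(np.asarray(rate_test, dtype=float))
    qa = np.asarray(metric_anchor, dtype=float)
    qt = np.asarray(metric_test, dtype=float)

    # Fit log-rate as a cubic polynomial of the quality metric.
    poly_a = np.polyfit(qa, log_ra, 3)
    poly_t = np.polyfit(qt, log_rt, 3)

    # Integrate only over the quality range covered by both curves.
    lo = max(qa.min(), qt.min())
    hi = min(qa.max(), qt.max())
    if hi <= lo:
        raise ValueError("the two curves share no quality range")

    int_a = np.polyint(poly_a)
    int_t = np.polyint(poly_t)
    avg_a = (np.polyval(int_a, hi) - np.polyval(int_a, lo)) / (hi - lo)
    avg_t = (np.polyval(int_t, hi) - np.polyval(int_t, lo)) / (hi - lo)

    # Average log-rate difference converted back to a percentage.
    return (np.exp(avg_t - avg_a) - 1.0) * 100.0


# Hypothetical data: the test codec reaches each score at half the bitrate.
scores = [20.0, 35.0, 45.0, 50.0]          # e.g. AP values (made up)
anchor_bpp = [0.05, 0.10, 0.20, 0.40]      # anchor bitrates (made up)
test_bpp = [r / 2 for r in anchor_bpp]
print(bd_rate(anchor_bpp, scores, test_bpp, scores))  # approximately -50
```

Under this convention, a reported reduction of about 81% corresponds to a BD-rate near -81: on average over the shared score range, the method would need roughly one fifth of the anchor's bitrate for the same task score.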
Abstract
2D image coding for machines (ICM) has achieved great success in coding efficiency, while less effort has been devoted to stereo images. To improve the efficiency of both stereo image compression (SIC) and intelligent analysis, this paper formulates and explores stereo image coding for machines (SICM). More specifically, a machine vision-oriented stereo feature compression network (MVSFC-Net) is proposed for SICM, in which stereo visual features are effectively extracted, compressed, and transmitted for 3D visual tasks. To compress stereo visual features efficiently in MVSFC-Net, a stereo multi-scale feature compression (SMFC) module is designed to gradually transform sparse stereo multi-scale features into compact joint visual representations by simultaneously removing spatial, inter-view, and cross-scale redundancies. Experimental results show that the proposed MVSFC-Net achieves superior compression efficiency and 3D visual task performance compared with the existing ICM anchors recommended by MPEG and the state-of-the-art SIC method.
