New VVC profiles targeting Feature Coding for Machines

Md Eimran Hossain Eimon, Ashan Perera, Juan Merlos, Velibor Adzic, Hari Kalva

TL;DR

This work addresses the mismatch between perceptually optimized video coding and the needs of split-inference machine vision by profiling VVC within the MPEG-AI Feature Coding for Machines framework. It identifies a compact essential tool set and proposes three low-complexity profiles (Fast, Faster, Fastest) that reduce encoding time by 21.8% to 95.6%, with BD-Rate changes ranging from a 2.96% gain to a 1.71% loss. These savings support efficient on-device feature compression for remote inference, and the findings offer actionable guidance for deploying feature coders on edge devices and federated systems.

Abstract

Modern video codecs have been extensively optimized to preserve perceptual quality, leveraging models of the human visual system. However, in split inference systems, where intermediate features from a neural network are transmitted instead of pixel data, these assumptions no longer apply. Intermediate features are abstract, sparse, and task-specific, making perceptual fidelity irrelevant. In this paper, we investigate the use of Versatile Video Coding (VVC) for compressing such features under the MPEG-AI Feature Coding for Machines (FCM) standard. We perform a tool-level analysis to understand the impact of individual coding components on compression efficiency and downstream vision task accuracy. Based on these insights, we propose three lightweight essential VVC profiles: Fast, Faster, and Fastest. The Fast profile provides a 2.96% BD-Rate gain while reducing encoding time by 21.8%. Faster achieves a 1.85% BD-Rate gain while reducing encoding time by 51.5%. Fastest reduces encoding time by 95.6% with only a 1.71% loss in BD-Rate.
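To put the reported encoding-time reductions in perspective, the short sketch below converts each percentage into an equivalent speedup factor, using the relation speedup = 1 / (1 - reduction). The numbers come from the abstract; the dictionary layout, the function names, and the sign convention for BD-Rate (negative for a gain, positive for a loss) are assumptions made for illustration and are not taken from the paper.

```python
# Figures reported in the abstract, stored as
# (BD-Rate change in %, encoding-time reduction in %).
# Assumed sign convention: negative BD-Rate = gain (bitrate saving),
# positive BD-Rate = loss.
PROFILES = {
    "Fast": (-2.96, 21.8),
    "Faster": (-1.85, 51.5),
    "Fastest": (1.71, 95.6),
}


def speedup_factor(time_reduction_pct: float) -> float:
    """Convert a percentage reduction in encoding time into a speedup factor."""
    if not 0.0 <= time_reduction_pct < 100.0:
        raise ValueError("time reduction must be in the range [0, 100)")
    remaining_fraction = 1.0 - time_reduction_pct / 100.0
    return 1.0 / remaining_fraction


def summarize(profiles: dict) -> dict:
    """Map each profile name to its speedup factor, rounded to two decimals."""
    return {
        name: round(speedup_factor(reduction), 2)
        for name, (_bd_rate, reduction) in profiles.items()
    }


if __name__ == "__main__":
    for name, factor in summarize(PROFILES).items():
        bd_rate, reduction = PROFILES[name]
        print(f"{name}: {reduction}% less encoding time "
              f"(about {factor}x faster), BD-Rate change {bd_rate:+.2f}%")
```

Under this reading, a 95.6% reduction means the Fastest profile encodes in roughly 1/23 of the baseline time, whereas the 21.8% reduction of Fast corresponds to a factor of about 1.28.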

Paper Structure

This paper contains 7 sections, 7 figures, 2 tables.

Figures (7)

  • Figure 1: An example of a split inference pipeline.
  • Figure 2: A brief overview of the FCM codec pipeline.
  • Figure 3: Block partitioning visualization for a cropped region of the first frame in the Traffic sequence, encoded at QP 19 using the Low-Delay configuration.
  • Figure 4: Partition depth analysis. Depth denotes final CU depth; QT_Depth and MT_Depth correspond to quad-tree and multi-type tree split depths, respectively.
  • Figure 5: Intra prediction mode usage analysis. Arrow size and color indicate the percentage of CUs using each mode (Mode 18: 3.68%, Mode 50: 2.55%).
  • ...and 2 more figures