Multiscale Feature Importance-based Bit Allocation for End-to-End Feature Coding for Machines

Junle Liu, Yun Zhang, Zixi Guo

TL;DR

This work addresses the mismatch between human-centric compression and machine vision needs by introducing Multiscale Feature Importance-based Bit Allocation (MFIBA) for end-to-end Feature Coding for Machines (FCM). It observes that the importance of multiscale features varies with feature scale, object size, and image instance, and proposes the MFIP module to predict a per-scale importance weight, coupled with a Task Loss-Rate Model that relates task loss to bitrate. The online MFIBA framework then allocates bits across feature scales via an improved rate–distortion objective and a Rate-φ mapping to end-to-end LIC codecs. Relative to the anchor ELIC, it saves on average 38.202%, 17.212%, and 36.492% of the bitrate for object detection, instance segmentation, and keypoint detection, respectively, while preserving accuracy. Applied to LIC-TCM, it saves 18.103%, 19.866%, and 19.597% on the same three tasks, which indicates that the method generalizes across tasks and base codecs and is relevant to remote intelligent analytics and machine vision systems.

Abstract

Feature Coding for Machines (FCM) aims to compress intermediate features effectively for remote intelligent analytics, which is crucial for future intelligent visual applications. In this paper, we propose a Multiscale Feature Importance-based Bit Allocation (MFIBA) for end-to-end FCM. First, we find that the importance of features for machine vision tasks varies with the feature scale, object size, and image instance. Based on this finding, we propose a Multiscale Feature Importance Prediction (MFIP) module to predict the importance weight for each scale of features. Second, we propose a task loss-rate model to establish the relationship between the task accuracy loss incurred by using compressed features and the bitrate of encoding these features. Finally, we develop an MFIBA for end-to-end FCM, which assigns coding bits to multiscale features more reasonably based on their importance. Experimental results demonstrate that, when combined with a retained Efficient Learned Image Compression (ELIC), the proposed MFIBA achieves an average of 38.202% bitrate savings in object detection compared to the anchor ELIC. Moreover, the proposed MFIBA achieves an average of 17.212% and 36.492% feature bitrate savings for instance segmentation and keypoint detection, respectively. When the proposed MFIBA is applied to LIC-TCM, it achieves an average of 18.103%, 19.866%, and 19.597% bitrate savings on the three machine vision tasks, respectively, which validates that the proposed MFIBA has good generalizability and adaptability to different machine vision tasks and FCM base codecs.
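
To make the idea of importance-weighted bit allocation concrete, the following is a minimal sketch and not the paper's method. It assumes, purely for illustration, that the task loss of scale i decays exponentially with its rate, d_i(R_i) = a_i * exp(-b_i * R_i); the abstract does not state the form of the task loss-rate model. The function name `allocate_bits` and the parameters `a`, `b`, and `total_rate` are hypothetical. Under that assumption, minimizing the weighted loss sum_i w_i * d_i(R_i) under a total rate budget has a closed-form solution per scale once the Lagrange multiplier is known, and the multiplier can be found by bisection.

```python
import math


def allocate_bits(weights, a, b, total_rate, iters=100):
    """Toy importance-weighted bit allocation across feature scales.

    Assumption (not from the paper): the task loss of scale i is
    d_i(R_i) = a[i] * exp(-b[i] * R_i). We minimize
    sum_i weights[i] * d_i(R_i) subject to sum_i R_i <= total_rate, R_i >= 0.
    The budget is assumed to be small enough to be reachable for
    a multiplier of at least 1e-12.
    """
    # Slope magnitude of each weighted loss curve at zero rate.
    gains = [w * ai * bi for w, ai, bi in zip(weights, a, b)]

    def rates_for(lam):
        # Stationarity: w_i * a_i * b_i * exp(-b_i * R_i) = lam, clipped at 0.
        return [
            max(0.0, math.log(g / lam) / bi) if g > 0 else 0.0
            for g, bi in zip(gains, b)
        ]

    # At lam = max(gains) every scale gets zero bits; smaller lam gives more bits.
    lo, hi = 1e-12, max(gains)
    for _ in range(iters):
        mid = math.sqrt(lo * hi)  # bisection in the log domain
        if sum(rates_for(mid)) > total_rate:
            lo = mid  # over budget: raise the multiplier
        else:
            hi = mid  # within budget: try a smaller multiplier
    return rates_for(hi)


# Three feature scales with hypothetical importance weights.
rates = allocate_bits([0.5, 0.3, 0.2], a=[1.0, 1.0, 1.0], b=[1.0, 1.0, 1.0],
                      total_rate=3.0)
```

In this toy setting, scales with a larger importance weight receive more bits, and a scale whose weighted loss slope falls below the multiplier receives none. How the paper maps the allocated rates onto a learned codec (the Rate-φ mapping mentioned in the TL;DR) is not covered by this sketch.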

Paper Structure

This paper contains 14 sections, 8 equations, 11 figures, 6 tables.

Figures (11)

  • Figure 1: Object detection accuracies of using compressed features at different scales (following COCO2017 dataset definition) and bit rates. (a) Small objects (b) Medium objects (c) Large objects.
  • Figure 2: Framework of the proposed MFIBA for FCM.
  • Figure 3: Flowchart of the proposed MFIP module.
  • Figure 4: Relationship between task loss $d_i$ and importance weight $w_i$ across all images.
  • Figure 5: Relationship between the object target sizes and the average weight $\hat{w}_i$.
  • ...and 6 more figures