ROI-Packing: Efficient Region-Based Compression for Machine Vision

Md Eimran Hossain Eimon, Alena Krause, Ashan Perera, Juan Merlos, Hari Kalva, Velibor Adzic, Borko Furht

TL;DR

ROI-Packing tackles machine-vision compression by focusing on regions of interest, discarding nonessential data, and packing ROI content into a compact frame for VVC encoding. The method combines a region detector, convex-hull region merging, down-scaling, and bin-packing with metadata to enable accurate reassembly at the decoder without retraining end-task models. Evaluations across RGB and infrared datasets and two tasks show substantial bitrate reductions (up to 44.1%) and improved task accuracy at the same bitrate (up to 8.88%), outperforming MPEG's remote inference anchor. The approach offers practical, model-agnostic benefits for edge and remote inference scenarios and suggests extending ROI-Packing to video via intra-frame processing.

Abstract

This paper introduces ROI-Packing, an efficient image compression method tailored specifically for machine vision. By prioritizing regions of interest (ROI) critical to end-task accuracy and packing them efficiently while discarding less relevant data, ROI-Packing achieves significant compression efficiency without requiring retraining or fine-tuning of end-task models. Comprehensive evaluations across five datasets and two popular tasks (object detection and instance segmentation) demonstrate up to a 44.10% reduction in bitrate without compromising end-task accuracy, along with an 8.88% improvement in accuracy at the same bitrate compared to the state-of-the-art Versatile Video Coding (VVC) codec standardized by the Moving Picture Experts Group (MPEG).
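The pack-then-reassemble idea can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the ROIs are already given as non-overlapping axis-aligned boxes, uses a simple shelf-packing heuristic in place of whatever bin-packing the paper uses, down-scales by integer striding, and skips both the convex-hull merging and the VVC encoding step. The function names `pack_rois` and `unpack_rois`, the metadata layout, and the `frame_width` parameter are all hypothetical.

```python
import numpy as np


def pack_rois(image, boxes, scale=1, frame_width=64):
    """Pack ROI crops into a compact frame (illustrative shelf packing).

    boxes: list of (x, y, w, h) in original-image coordinates (assumed
           non-overlapping).
    scale: integer down-scaling factor, applied by striding.
    Returns (packed_frame, metadata) where metadata records, per ROI,
    the source box, the packed box, and the scale.
    """
    crops = []
    for (x, y, w, h) in boxes:
        crop = image[y:y + h:scale, x:x + w:scale]
        crops.append(((x, y, w, h), crop))
    # Placing tall crops first keeps the shelves reasonably tight.
    crops.sort(key=lambda item: item[1].shape[0], reverse=True)

    placements = []
    cur_x = cur_y = shelf_h = 0
    for box, crop in crops:
        ch, cw = crop.shape[:2]
        if cw > frame_width:
            raise ValueError("ROI wider than the packed frame")
        if cur_x + cw > frame_width:  # row is full: open a new shelf
            cur_y += shelf_h
            cur_x = 0
            shelf_h = 0
        placements.append((box, crop, cur_x, cur_y))
        shelf_h = max(shelf_h, ch)
        cur_x += cw

    frame_h = cur_y + shelf_h
    packed = np.zeros((frame_h, frame_width) + image.shape[2:], dtype=image.dtype)
    metadata = []
    for box, crop, px, py in placements:
        ch, cw = crop.shape[:2]
        packed[py:py + ch, px:px + cw] = crop
        metadata.append({"src": box, "dst": (px, py, cw, ch), "scale": scale})
    return packed, metadata


def unpack_rois(packed, metadata, out_shape):
    """Rebuild a full-size image from the packed frame and its metadata.

    Pixels outside every ROI are left at zero (the discarded background).
    """
    out = np.zeros(out_shape, dtype=packed.dtype)
    for m in metadata:
        x, y, w, h = m["src"]
        px, py, cw, ch = m["dst"]
        s = m["scale"]
        crop = packed[py:py + ch, px:px + cw]
        # Nearest-neighbour up-scaling, trimmed back to the source size.
        up = np.repeat(np.repeat(crop, s, axis=0), s, axis=1)[:h, :w]
        out[y:y + h, x:x + w] = up
    return out
```

In a real pipeline the packed frame would be passed through the codec between these two calls, and the metadata would be transmitted alongside the bitstream so the decoder can restore each region to its original position before running the unmodified end-task model.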

Paper Structure

This paper contains 14 sections, 5 equations, 5 figures, 1 table.

Figures (5)

  • Figure 1: (a) Edge Inference (b) Remote Inference
  • Figure 2: Overview of the Proposed Method
  • Figure 3: Top-down Region Extractor
  • Figure 4: (a) Original Image (Resolution: $1024\times730$) (b) Packed Image (Resolution: $352\times330$)
  • Figure 5: Rate-Accuracy Plots for Object Detection & Instance Segmentation
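
As a rough illustration of how much the packing step shrinks the input, the two resolutions quoted in the Figure 4 caption can be compared directly. This is only pixel-count arithmetic on that one example; it is not a bitrate measurement, and the bitrate savings reported in the abstract (up to 44.10%) are a separate quantity.

```python
# Resolutions taken from the Figure 4 caption.
orig_w, orig_h = 1024, 730
packed_w, packed_h = 352, 330

orig_px = orig_w * orig_h        # 747,520 pixels
packed_px = packed_w * packed_h  # 116,160 pixels

ratio = packed_px / orig_px      # fraction of pixels kept in the packed frame
print(f"packed frame keeps {ratio:.1%} of the original pixels")
```

For this example the packed frame holds roughly 15.5% of the original pixel count.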