ROI-Packing: Efficient Region-Based Compression for Machine Vision

Md Eimran Hossain Eimon; Alena Krause; Ashan Perera; Juan Merlos; Hari Kalva; Velibor Adzic; Borko Furht

TL;DR

ROI-Packing tackles image compression for machine vision by focusing on regions of interest (ROIs), discarding nonessential data, and packing the ROI content into a compact frame for VVC encoding. The method combines a region detector, convex-hull region merging, down-scaling, and bin-packing, and transmits metadata that lets the decoder reassemble the image accurately without retraining end-task models. Evaluations on RGB and infrared datasets across two tasks show bitrate reductions of up to 44.1% and accuracy gains of up to 8.88% at the same bitrate, outperforming MPEG's remote-inference anchor. The approach is model-agnostic and practical for edge and remote inference scenarios, and it could be extended to video by applying ROI-Packing to intra frames.
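The pipeline summarized above can be illustrated with a small geometric sketch. The paper's actual region detector, convex-hull merging rule, down-scaling policy, bin-packing algorithm, and metadata format are not given in this summary, so the code below is a hypothetical stand-in: it merges overlapping axis-aligned boxes (a simplification of convex-hull merging), down-scales every merged region by one fixed factor, places the regions with a simple shelf packer, and records per-region metadata that lets the decoder map packed-frame coordinates back to the original image. All names (`merge_boxes`, `shelf_pack`, `PackEntry`, `to_source`) are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical box format: (x, y, w, h) in pixels.
Box = Tuple[int, int, int, int]


@dataclass
class PackEntry:
    """Per-ROI metadata sent alongside the packed frame (assumed format)."""
    src: Box      # ROI location in the original image
    dst: Box      # location of the scaled ROI inside the packed frame
    scale: float  # down-scaling factor applied before packing


def overlaps(a: Box, b: Box) -> bool:
    """True if two boxes share a positive-area intersection."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah


def union(a: Box, b: Box) -> Box:
    """Smallest axis-aligned box containing both inputs."""
    x0, y0 = min(a[0], b[0]), min(a[1], b[1])
    x1 = max(a[0] + a[2], b[0] + b[2])
    y1 = max(a[1] + a[3], b[1] + b[3])
    return (x0, y0, x1 - x0, y1 - y0)


def merge_boxes(boxes: List[Box]) -> List[Box]:
    """Merge overlapping boxes until none overlap.

    Simplified stand-in for convex-hull region merging: regions are
    replaced by their bounding union rather than a convex hull.
    """
    merged = list(boxes)
    changed = True
    while changed:
        changed = False
        for i in range(len(merged)):
            for j in range(i + 1, len(merged)):
                if overlaps(merged[i], merged[j]):
                    merged[i] = union(merged[i], merged[j])
                    del merged[j]
                    changed = True
                    break
            if changed:
                break  # restart: the grown box may now overlap others
    return merged


def shelf_pack(boxes: List[Box], scale: float,
               frame_width: int) -> Tuple[List[PackEntry], int]:
    """Pack scaled ROIs row by row (next-fit decreasing height).

    Returns the metadata entries and the height of the packed frame.
    """
    sized = [(b, max(1, round(b[2] * scale)), max(1, round(b[3] * scale)))
             for b in boxes]
    sized.sort(key=lambda t: t[2], reverse=True)  # tallest ROIs first
    entries: List[PackEntry] = []
    x = y = shelf_h = 0
    for src, w, h in sized:
        if w > frame_width:
            raise ValueError("scaled ROI is wider than the packed frame")
        if x + w > frame_width:  # current shelf is full: open a new one
            y += shelf_h
            x = shelf_h = 0
        entries.append(PackEntry(src=src, dst=(x, y, w, h), scale=scale))
        x += w
        shelf_h = max(shelf_h, h)
    return entries, y + shelf_h


def to_source(entry: PackEntry, px: float, py: float) -> Tuple[float, float]:
    """Map a point in the packed frame back to original-image coordinates."""
    dx, dy, _, _ = entry.dst
    sx, sy, _, _ = entry.src
    return (sx + (px - dx) / entry.scale, sy + (py - dy) / entry.scale)


if __name__ == "__main__":
    rois = [(0, 0, 100, 100), (50, 50, 100, 100), (300, 300, 40, 80)]
    merged = merge_boxes(rois)
    entries, height = shelf_pack(merged, scale=0.5, frame_width=100)
    for e in entries:
        print(e)
    print("packed frame: 100 x", height)
```

Shelf packing is used here only because it is short and deterministic; tighter heuristics such as skyline or MaxRects packing waste less frame area at the cost of more code, and a real encoder could also choose a different scale factor per region.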

Abstract

This paper introduces ROI-Packing, an efficient image compression method tailored specifically for machine vision. By prioritizing the regions of interest (ROIs) that are critical to end-task accuracy, packing them efficiently, and discarding less relevant data, ROI-Packing achieves significant compression gains without requiring retraining or fine-tuning of end-task models. Comprehensive evaluations across five datasets and two popular tasks, object detection and instance segmentation, demonstrate up to a 44.10% reduction in bitrate without compromising end-task accuracy, and up to an 8.88% improvement in accuracy at the same bitrate, compared with the state-of-the-art Versatile Video Coding (VVC) codec standardized by the Moving Picture Experts Group (MPEG).
