Scalable Image Coding for Humans and Machines Using Feature Fusion Network

Takahiro Shindo, Taiju Watanabe, Yui Tatsumi, Hiroshi Watanabe

TL;DR

This work tackles scalable image coding for humans and machines by pairing a machine-oriented learned image compression (LIC) model, SA-ICM, with an additional-information LIC model for human viewing, and fusing their features with a novel Feature Fusion Network. The approach is designed to remain effective when the recognition model changes, rather than being optimized for a single task. Experiments on COCO show better performance at low bitrates than VVC and standard LIC baselines, supporting the benefit of feature fusion. A key contribution is showing that the channel count of the additional-information model can be reduced (via the parameter m) without sacrificing decoded image quality for humans, which enables more efficient deployment on edge devices.
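The TL;DR ties parameter savings to the channel count m. To show why shrinking m helps, here is a small sketch that counts the parameters of a hypothetical convolutional encoder for the additional information, in which every layer has m channels and 5x5 kernels. The four-layer layout, the kernel size, and the function names are illustrative assumptions, not the paper's architecture.

```python
def conv2d_params(in_ch: int, out_ch: int, kernel: int) -> int:
    """Learnable values in a 2-D convolution: weights plus one bias per output channel."""
    return in_ch * out_ch * kernel * kernel + out_ch


def additional_encoder_params(m: int, num_layers: int = 4, kernel: int = 5) -> int:
    """Hypothetical additional-information encoder: an RGB input followed by
    num_layers convolutions that all use m channels (an assumption, not the paper's design)."""
    total = conv2d_params(3, m, kernel)  # first layer maps RGB to m channels
    for _ in range(num_layers - 1):
        total += conv2d_params(m, m, kernel)  # remaining layers keep m channels
    return total


if __name__ == "__main__":
    for m in (192, 128, 64):
        print(f"m={m:3d}: {additional_encoder_params(m):,} parameters")
```

Under these assumptions the count grows roughly with m², so reducing m from 192 to 64 shrinks this hypothetical encoder from 2,779,968 to 312,256 parameters (about 89% fewer). The actual savings in the proposed method depend on its real layer configuration.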

Abstract

As image recognition models become more prevalent, scalable coding methods for machines and humans are gaining importance. Applications of image recognition models include traffic monitoring and farm management. In these use cases, scalable coding is effective because the tasks occasionally require humans to check the images. Existing image compression methods for humans and machines meet these requirements to some extent. However, these methods are effective only for specific image recognition models. We propose a learning-based scalable image coding method for humans and machines that is compatible with numerous image recognition models. We combine an image compression model for machines with a compression model that provides additional information for decoding images for humans. The features of these two compression models are fused by a feature fusion network to achieve efficient image compression. Because the feature fusion network can combine features of different sizes, the additional information compression model can be adjusted to use fewer parameters. Our experiments confirm that the feature fusion network efficiently combines the image compression models while reducing the number of parameters. Furthermore, we demonstrate the effectiveness of the proposed scalable coding method by evaluating image compression performance in terms of decoded image quality and bitrate.
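The abstract evaluates the method by decoded image quality and bitrate. Here is a minimal sketch of how such numbers are commonly computed for a two-layer scalable codec, assuming that the machine layer is decoded from its own bitstream while the human layer needs both the machine bitstream and the additional-information bitstream. PSNR is used here as one common quality measure; the helper names and the 8-bit grayscale input format are assumptions for illustration.

```python
import math


def bits_per_pixel(num_bytes: int, height: int, width: int) -> float:
    """Bitrate in bits per pixel (bpp) for a bitstream of num_bytes bytes."""
    return num_bytes * 8 / (height * width)


def psnr(original: list[list[int]], decoded: list[list[int]], max_value: int = 255) -> float:
    """Peak signal-to-noise ratio in dB between two same-sized 8-bit grayscale images."""
    if len(original) != len(decoded) or any(len(a) != len(b) for a, b in zip(original, decoded)):
        raise ValueError("images must have the same size")
    sq_err = [
        (o - d) ** 2
        for row_o, row_d in zip(original, decoded)
        for o, d in zip(row_o, row_d)
    ]
    mse = sum(sq_err) / len(sq_err)
    if mse == 0:
        return math.inf  # identical images
    return 10 * math.log10(max_value ** 2 / mse)


def scalable_rates(machine_bytes: int, additional_bytes: int,
                   height: int, width: int) -> tuple[float, float]:
    """Assumed accounting: the machine layer is decoded alone, while the
    human layer needs both the machine and the additional-information bitstreams."""
    machine_bpp = bits_per_pixel(machine_bytes, height, width)
    human_bpp = bits_per_pixel(machine_bytes + additional_bytes, height, width)
    return machine_bpp, human_bpp
```

For example, for a 640x480 image, a 3,840-byte machine bitstream gives 0.1 bpp for the machine layer, and adding a 7,680-byte additional-information bitstream gives 0.3 bpp for the human layer.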

Paper Structure

This paper contains 12 sections, 8 equations, 7 figures, 2 tables.

Figures (7)

  • Figure 1: Image processing flow of the scalable image coding method. (a): Conventional method aimed at a specific image recognition model. (b): Proposed method compatible with various image recognition models.
  • Figure 2: Examples of the mask image. (a): Original image from the COCO dataset. (b): Mask image generated using the Segment Anything Model.
  • Figure 3: Processing flow of the proposed scalable image coding method. The LIC model in the upper row is SA-ICM, which compresses images for machines. The LIC model in the lower row is an additional information compression model, which converts coded images for machines into images for humans.
  • Figure 4: Model structure of the feature fusion network. The purple and pink squares represent the features of the image compression model for machines and those of the additional information compression model, respectively.
  • Figure 5: Examples of compressed images for humans and machines. (a): Original image. (b): Image decoded for machines using SA-ICM. (c): Image decoded for human vision using additional information.
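Figure 4 shows the feature fusion network combining features from the machine model (say N channels) with features from the additional information model, whose channel count m can differ. Here is a hypothetical fusion layer built from a common pattern, channel concatenation followed by a 1x1 convolution, written in plain Python on nested lists. It assumes both feature maps share the same spatial size; the concatenate-then-1x1 design, the names, and the [channels][height][width] layout are assumptions, not the network in the paper.

```python
Feature = list[list[list[float]]]  # layout: [channels][height][width]


def concat_channels(a: Feature, b: Feature) -> Feature:
    """Stack two feature maps along the channel axis; spatial sizes must match."""
    if len(a[0]) != len(b[0]) or len(a[0][0]) != len(b[0][0]):
        raise ValueError("spatial sizes must match")
    return a + b  # N channels followed by m channels


def conv1x1(x: Feature, weight: list[list[float]], bias: list[float]) -> Feature:
    """1x1 convolution: each output channel is a weighted sum of all input channels at each pixel."""
    in_ch, height, width = len(x), len(x[0]), len(x[0][0])
    out = []
    for w_row, b in zip(weight, bias):
        if len(w_row) != in_ch:
            raise ValueError("each weight row must have one entry per input channel")
        out.append([
            [b + sum(w_row[c] * x[c][i][j] for c in range(in_ch)) for j in range(width)]
            for i in range(height)
        ])
    return out


def fuse(machine_feat: Feature, extra_feat: Feature,
         weight: list[list[float]], bias: list[float]) -> Feature:
    """Hypothetical fusion: concatenate N machine channels with m additional channels,
    then mix them into len(weight) output channels with a 1x1 convolution."""
    return conv1x1(concat_channels(machine_feat, extra_feat), weight, bias)
```

Because the 1x1 weight matrix has N + m columns, the number of output channels is set by the number of weight rows rather than by m, so the additional information model can use fewer channels without changing the shape expected downstream.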