End-to-End Semantic Preservation in Text-Aware Image Compression Systems

Stefano Della Fiore, Alessandro Gnutti, Marco Dalai, Pierangelo Migliorati, Riccardo Leonardi

TL;DR

This work shifts image compression toward machine-oriented goals by introducing Text-Focused Image Compression (TFIC), which preserves text information for OCR within an end-to-end learned codec. TFIC couples a neural image codec with a downstream OCR module and optimizes a joint loss with distortion, rate, and OCR terms. It achieves high OCR accuracy at very low bitrates, and its encoder needs roughly half the computation of the OCR module, which makes on-device encoding practical. Beyond the text-specific codec, the paper studies semantic preservation under extreme compression with general-purpose encoders, using a DnCNN-based dequantization approach and an OCR-guided loss; it shows that meaningful textual content can be recovered even from visually degraded signals, with substantial gains over baselines. These results bridge text-centric compression and general semantic preservation, offering practical pathways for machine-centered image coding in bandwidth-constrained settings and for downstream text understanding tasks.

Abstract

Traditional image compression methods aim to reconstruct images for human perception, prioritizing visual fidelity over task relevance. In contrast, Coding for Machines focuses on preserving information essential for automated understanding. Building on this principle, we present an end-to-end compression framework that retains text-specific features for Optical Character Recognition (OCR). The encoder operates at roughly half the computational cost of the OCR module, making it suitable for resource-limited devices. When on-device OCR is infeasible, images can be efficiently compressed and later decoded to recover textual content. Experiments show significant improvements in text extraction accuracy at low bitrates, even outperforming OCR on uncompressed images. We further extend this study to general-purpose encoders, exploring their capacity to preserve hidden semantics under extreme compression. Instead of optimizing for visual fidelity, we examine whether compact, visually degraded representations can retain recoverable meaning through learned enhancement and recognition modules. Results demonstrate that semantic information can persist despite severe compression, bridging text-oriented compression and general-purpose semantic preservation in machine-centered image coding.
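The joint objective mentioned in the TL;DR (distortion, rate, and OCR terms) can be illustrated with a small sketch. The weighted-sum form, the weights `lambda_d` and `lambda_ocr`, and all function names below are assumptions made for illustration; the summary does not state the paper's exact formulation, and the OCR term is stood in for by a plain cross-entropy over ground-truth characters.

```python
import math


def rate_bpp(likelihoods, num_pixels):
    """Estimated rate in bits per pixel, given the probabilities an
    entropy model assigns to the quantized latent symbols."""
    return -sum(math.log2(p) for p in likelihoods) / num_pixels


def mse(original, reconstructed):
    """Mean squared error between two flattened images."""
    return sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)


def ocr_cross_entropy(char_probs):
    """Mean negative log-probability that a (hypothetical) OCR model
    assigns to the ground-truth characters."""
    return -sum(math.log(p) for p in char_probs) / len(char_probs)


def joint_loss(likelihoods, original, reconstructed, char_probs,
               lambda_d=1.0, lambda_ocr=1.0):
    """Assumed weighted sum: rate + lambda_d * distortion + lambda_ocr * OCR loss."""
    r = rate_bpp(likelihoods, num_pixels=len(original))
    d = mse(original, reconstructed)
    o = ocr_cross_entropy(char_probs)
    return r + lambda_d * d + lambda_ocr * o


if __name__ == "__main__":
    # Toy values only: two latent symbols, a 4-pixel image, two characters.
    loss = joint_loss(
        likelihoods=[0.5, 0.25],
        original=[0.0, 0.0, 1.0, 1.0],
        reconstructed=[0.0, 0.0, 1.0, 0.0],
        char_probs=[0.9, 0.8],
    )
    print(f"joint loss: {loss:.4f}")
```

Under this assumed form, raising `lambda_ocr` relative to `lambda_d` pushes the codec to spend bits on regions that matter for recognition rather than on overall pixel fidelity, which matches the trade-off described in the caption of Figure 4.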

Paper Structure

This paper contains 17 sections, 7 equations, 10 figures, 1 table.

Figures (10)

  • Figure 1: Comparison of high-level frameworks: no compression, conventional compression, and our proposed TFIC.
  • Figure 2: High-level architectural framework of TFIC. The main image codec consists of an encoder $g_a$, which compresses the input image into a latent representation that is quantized and transmitted, and a decoder $g_s$, which reconstructs the image. The reconstructed image is then processed by the OCR module for text extraction.
  • Figure 3: Rate vs. OCR performance curves for images that are uncompressed, decoded with the pre-trained MSE-based codec, and decoded using the proposed TFIC.
  • Figure 4: Visual comparison of reconstructed images. Although the base codec preserves more global details, our method retains critical textual regions that yield superior OCR performance.
  • Figure 5: Rate vs. PSNR performance curves for the base codec and the proposed TFIC.
  • ...and 5 more figures