End-to-End Semantic Preservation in Text-Aware Image Compression Systems
Stefano Della Fiore, Alessandro Gnutti, Marco Dalai, Pierangelo Migliorati, Riccardo Leonardi
TL;DR
This work shifts image compression toward machine-oriented goals by introducing Text-Focused Image Compression (TFIC), which preserves text information for OCR within an end-to-end learned codec. TFIC couples a neural image codec with a downstream OCR module and optimizes a joint loss that combines distortion, rate, and OCR terms. It achieves high OCR accuracy at very low bitrates and makes on-device encoding practical, since the encoder requires roughly half the computation of the OCR module. Beyond text, the paper investigates semantic preservation under extreme compression using a DnCNN-based dequantization approach and an OCR-guided loss, showing that meaningful textual content can be recovered even from visually degraded signals, with substantial gains over baselines. These results bridge text-centric compression and general semantic preservation, offering practical pathways for machine-centered image coding in bandwidth-constrained settings and for downstream text understanding tasks.
Abstract
Traditional image compression methods aim to reconstruct images for human perception, prioritizing visual fidelity over task relevance. In contrast, Coding for Machines focuses on preserving information essential for automated understanding. Building on this principle, we present an end-to-end compression framework that retains text-specific features for Optical Character Recognition (OCR). The encoder operates at roughly half the computational cost of the OCR module, making it suitable for resource-limited devices. When on-device OCR is infeasible, images can be efficiently compressed and later decoded to recover textual content. Experiments show significant improvements in text extraction accuracy at low bitrates, even outperforming OCR on uncompressed images. We further extend this study to general-purpose encoders, exploring their capacity to preserve hidden semantics under extreme compression. Instead of optimizing for visual fidelity, we examine whether compact, visually degraded representations can retain recoverable meaning through learned enhancement and recognition modules. Results demonstrate that semantic information can persist despite severe compression, bridging text-oriented compression and general-purpose semantic preservation in machine-centered image coding.
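As an illustration of the joint objective mentioned in the TL;DR (rate, distortion, and OCR terms), the following is a minimal sketch rather than the authors' implementation. The weights `lambda_d` and `lambda_ocr`, the helper names, and the use of per-character cross-entropy as the OCR term are all assumptions made for this example; the paper's actual loss form and weighting may differ.

```python
import math


def mse(original, reconstructed):
    """Distortion term: mean squared error between two pixel sequences."""
    if len(original) != len(reconstructed):
        raise ValueError("inputs must have the same length")
    return sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)


def bits_per_pixel(latent_likelihoods, num_pixels):
    """Rate term: estimated bits to code the latents, normalized per pixel."""
    return sum(-math.log2(p) for p in latent_likelihoods) / num_pixels


def ocr_cross_entropy(char_probs, target_indices):
    """OCR term (assumed form): mean negative log-probability of the
    correct character at each position."""
    total = sum(-math.log(probs[t]) for probs, t in zip(char_probs, target_indices))
    return total / len(target_indices)


def joint_loss(rate, distortion, ocr_loss, lambda_d=0.01, lambda_ocr=1.0):
    """Weighted sum of the three terms; the weights are hypothetical."""
    return rate + lambda_d * distortion + lambda_ocr * ocr_loss


# Toy usage with made-up values.
rate = bits_per_pixel([0.5, 0.25], num_pixels=3)
distortion = mse([0.0, 2.0], [0.0, 0.0])
ocr = ocr_cross_entropy([[0.5, 0.5]], [0])
loss = joint_loss(rate, distortion, ocr, lambda_d=0.5, lambda_ocr=2.0)
```

In a sketch like this, raising `lambda_ocr` relative to `lambda_d` pushes the codec to spend bits on text legibility rather than overall visual fidelity, which is the trade-off the abstract describes.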
