Rethinking Image Compression on the Web with Generative AI
Shayan Ali Hassan, Danish Humair, Ihsan Ayyub Qazi, Zafar Ayyub Qazi
TL;DR
The study investigates a generative AI-driven approach to web image compression in which images are reconstructed at the edge. Its Pseudo-Lossy Compression framework transmits semantic and structural conditioning inputs (text prompts, Canny edges, color palettes, and in-painting masks) to a text-to-image model. The method achieves substantial bandwidth savings, up to 99.8% in the best cases and 92.6% on average, while preserving perceptual content, as measured by VGG16 embeddings and supported by a user study. The work quantifies the bandwidth-similarity trade-offs across multiple conditioning configurations, demonstrates the importance of preserving salient features, and discusses practical considerations for real-time deployment, ethics, and standards. Overall, the approach offers a promising direction for reducing web image data transfer without severely compromising meaning or structure, with potential implications for Internet affordability and infrastructure costs.
Abstract
The rapid growth of the Internet, driven by social media, web browsing, and video streaming, has made images central to the Web experience, resulting in significant data transfer and increased webpage sizes. Traditional image compression methods, while reducing bandwidth, often degrade image quality. This paper explores a novel approach using generative AI to reconstruct images at the edge or client-side. We develop a framework that leverages text prompts and provides additional conditioning inputs like Canny edges and color palettes to a text-to-image model, achieving up to 99.8% bandwidth savings in the best cases and 92.6% on average, while maintaining high perceptual similarity. Empirical analysis and a user study show that our method preserves image meaning and structure more effectively than traditional compression methods, offering a promising solution for reducing bandwidth usage and improving Internet affordability with minimal degradation in image quality.
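To make the two headline quantities concrete, the following is a minimal sketch, not the authors' implementation. It assumes that bandwidth savings are computed as one minus the ratio of transmitted payload size to original image size, and that perceptual similarity is the cosine similarity between embedding vectors (the summary names VGG16 embeddings but the exact distance metric is an assumption here). The function names, the example prompt, the palette, and all byte counts are hypothetical values chosen for illustration.

```python
import math


def bandwidth_savings(original_bytes: int, payload_bytes: int) -> float:
    """Fraction of bytes saved by sending conditioning inputs instead of the image."""
    if original_bytes <= 0:
        raise ValueError("original_bytes must be positive")
    return 1.0 - payload_bytes / original_bytes


def cosine_similarity(a, b) -> float:
    """Cosine similarity between two equal-length embedding vectors (assumed metric)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("vectors must be non-zero")
    return dot / (norm_a * norm_b)


# Hypothetical conditioning payload for one image (all values are illustrative).
prompt = "a red bicycle leaning against a brick wall"
palette = [(200, 30, 30), (120, 60, 40), (230, 230, 230)]  # RGB triples, 3 bytes each
edge_map_bytes = 3_000      # assumed size of a compressed Canny edge map
original_bytes = 500_000    # assumed size of the original image

payload_bytes = len(prompt.encode("utf-8")) + 3 * len(palette) + edge_map_bytes
savings = bandwidth_savings(original_bytes, payload_bytes)
print(f"payload: {payload_bytes} bytes, savings: {savings:.1%}")
```

In this sketch the text prompt and palette contribute only a few dozen bytes, so the edge map dominates the payload; this is consistent with the idea that richer structural conditioning trades bandwidth for similarity. In practice the embedding vectors passed to `cosine_similarity` would come from a feature extractor applied to the original and reconstructed images.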
