T-MLA: A Targeted Multiscale Log-Exponential Attack Framework for Neural Image Compression
Nikolay I. Kalmykov, Razan Dibo, Kaiyu Shen, Xu Zhonghan, Anh-Huy Phan, Yipeng Liu, Ivan Oseledets
TL;DR
This work identifies security vulnerabilities in neural image compression (NIC) by exploiting its multiscale frequency structure with a wavelet-domain attack. The authors introduce T-MLA, a targeted multiscale log-exponential adversarial framework that injects nonlinear perturbations across wavelet subbands, with adaptive per-subband budgets, to maximize post-codec distortion while preserving the perceptual quality of the input. In experiments on Kodak, CLIC, and DIV2K with multiple NIC architectures, T-MLA substantially degrades reconstructions under tight stealth constraints and reveals an entropy-dependent vulnerability pattern, which motivates wavelet-aware defenses. The findings highlight critical security considerations for generative compression pipelines and motivate future work on robustness, black-box and universal attacks, and defense strategies across broader codecs and modalities.
Abstract
Neural image compression (NIC) has become the state of the art for rate-distortion performance, yet its security vulnerabilities remain significantly less understood than those of classifiers. Existing adversarial attacks on NICs are often naive adaptations of pixel-space methods that overlook the unique, structured nature of the compression pipeline. In this work, we expose a more advanced class of vulnerabilities by introducing T-MLA, the first targeted multiscale log-exponential attack framework. Our approach crafts adversarial perturbations in the wavelet domain by directly targeting the quality of both the attacked image and its reconstruction. This allows for a principled, offline attack in which perturbations are strategically confined to specific wavelet subbands, maximizing distortion while ensuring perceptual stealth. Extensive evaluation across multiple state-of-the-art NIC architectures on standard image compression benchmarks shows a large drop in reconstruction quality while the perturbations remain visually imperceptible. Our findings reveal a critical security flaw at the core of generative and content delivery pipelines.
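
To make the idea of subband-confined, log-exponential perturbations concrete, the following is a minimal sketch and not the authors' implementation. Everything in it is an assumption made for illustration: the single-level Haar transform, the function names, the specific mapping (a shift of coefficient magnitudes in the log domain, mapped back with an exponential), and the random draw of the shift `delta`. The actual attack would optimize the perturbation against a NIC codec; no codec or optimization loop appears here.

```python
import numpy as np

S = np.sqrt(2.0)

def haar_dwt2(x):
    """Single-level orthonormal 2D Haar transform (height and width must be even)."""
    a = (x[0::2, :] + x[1::2, :]) / S   # low-pass along rows
    d = (x[0::2, :] - x[1::2, :]) / S   # high-pass along rows
    ll = (a[:, 0::2] + a[:, 1::2]) / S  # approximation subband
    d1 = (a[:, 0::2] - a[:, 1::2]) / S  # detail subbands
    d2 = (d[:, 0::2] + d[:, 1::2]) / S
    d3 = (d[:, 0::2] - d[:, 1::2]) / S
    return ll, d1, d2, d3

def haar_idwt2(ll, d1, d2, d3):
    """Exact inverse of haar_dwt2."""
    a = np.empty((ll.shape[0], 2 * ll.shape[1]))
    d = np.empty_like(a)
    a[:, 0::2], a[:, 1::2] = (ll + d1) / S, (ll - d1) / S
    d[:, 0::2], d[:, 1::2] = (d2 + d3) / S, (d2 - d3) / S
    x = np.empty((2 * a.shape[0], a.shape[1]))
    x[0::2, :], x[1::2, :] = (a + d) / S, (a - d) / S
    return x

def log_exp_perturb(c, delta):
    """Assumed mapping: shift log(1 + |c|) by delta, map back, keep the sign of c."""
    mag = np.expm1(np.log1p(np.abs(c)) + delta)
    return np.sign(c) * np.maximum(mag, 0.0)

def perturb_detail_subbands(img, budgets, rng):
    """Perturb only the three detail subbands, each with its own budget eps."""
    ll, *details = haar_dwt2(img)
    out = [ll]  # approximation subband is left untouched
    for band, eps in zip(details, budgets):
        # Random stand-in for an optimized perturbation, bounded by eps.
        delta = rng.uniform(-eps, eps, size=band.shape)
        out.append(log_exp_perturb(band, delta))
    return haar_idwt2(*out)

def psnr(ref, test, peak=1.0):
    """Peak signal-to-noise ratio in dB for images scaled to [0, peak]."""
    mse = np.mean((ref - test) ** 2)
    return np.inf if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)
```

Under these assumptions, calling `perturb_detail_subbands(img, (0.01, 0.01, 0.01), np.random.default_rng(0))` on a grayscale image in [0, 1] returns an image whose approximation subband matches the original, and `psnr(img, adv)` gives one simple proxy for how visible the change is. Because the shift acts on `log(1 + |c|)`, larger coefficients receive larger absolute changes and zero coefficients stay zero, which is one way a per-subband budget can scale with local content.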
