Joint Data and Semantics Lossy Compression: Nonasymptotic Converse Bounds and Second-Order Asymptotics
Huiyuan Yang, Yuxuan Shi, Shuo Shao, Xiaojun Yuan
TL;DR
This work addresses the problem of jointly compressing data and its semantics (JDSLC) under finite blocklengths. It derives general nonasymptotic converse bounds using distortion-tilted information and establishes a tight second-order bound for stationary memoryless sources, with dispersion given by $\tilde{\mathcal{V}}(d_s,d_x)$ and a Gaussian-approximation term $\sqrt{k \tilde{\mathcal{V}}(d_s,d_x)}\, Q^{-1}(\epsilon)$. The authors specialize to Erased Fair Coin Flips (EFCF), obtaining an explicit semantic rate-distortion function and corresponding nonasymptotic converse and achievability bounds, complemented by numerical results showing the accuracy of the second-order approximation at practical blocklengths. Collectively, the results provide practical finite-blocklength limits for semantic-aware compression and guidance for designing encoders that jointly preserve data and semantics under delay constraints.
Abstract
This paper studies the joint data and semantics lossy compression problem, an extension of the hidden lossy source coding problem that entails recovering both the hidden and observable sources. We study the nonasymptotic and second-order properties of this problem, with emphasis on the converse. Specifically, we begin by deriving nonasymptotic converse bounds valid for general sources and distortion measures, utilizing properties of distortion-tilted information. Subsequently, a second-order converse bound is derived under the standard block coding setting through asymptotic analysis of the nonasymptotic bounds. This bound is tight since it coincides with a known second-order achievability bound. We then examine the case of erased fair coin flips (EFCF), providing its specific nonasymptotic achievability and converse bounds. Numerical results for the EFCF case demonstrate that our second-order asymptotic approximation closely approximates the optimum rate at given blocklengths.
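To make the second-order approximation concrete, the following is a minimal Python sketch of how a Gaussian approximation of this kind is typically evaluated. It assumes the standard form in which the term $\sqrt{k \tilde{\mathcal{V}}(d_s,d_x)}\, Q^{-1}(\epsilon)$ is added to $k$ times the first-order rate and lower-order remainder terms are dropped; that form is an assumption here, not a statement of the paper's exact theorem. The function names, and the numeric values for the rate and dispersion in the usage line, are hypothetical placeholders rather than quantities taken from the paper.

```python
from math import sqrt
from statistics import NormalDist


def q_inv(eps: float) -> float:
    """Inverse of the Gaussian tail function Q(x) = P[N(0,1) > x]."""
    if not 0.0 < eps < 1.0:
        raise ValueError("eps must lie strictly between 0 and 1")
    # Q(x) = 1 - Phi(x), hence Q^{-1}(eps) = Phi^{-1}(1 - eps).
    return NormalDist().inv_cdf(1.0 - eps)


def gaussian_approx_total(k: int, rate: float, dispersion: float, eps: float) -> float:
    """Approximate minimum log-codebook size at blocklength k.

    Assumed form: k * rate + sqrt(k * dispersion) * Q^{-1}(eps),
    with remainder terms omitted. `rate` and `dispersion` stand in for
    the rate-distortion function and the dispersion evaluated at the
    chosen distortion pair (d_s, d_x); the caller must supply them.
    """
    if k <= 0 or dispersion < 0.0:
        raise ValueError("k must be positive and dispersion non-negative")
    return k * rate + sqrt(k * dispersion) * q_inv(eps)


def gaussian_approx_rate(k: int, rate: float, dispersion: float, eps: float) -> float:
    """Per-symbol version of the approximation above."""
    return gaussian_approx_total(k, rate, dispersion, eps) / k


if __name__ == "__main__":
    # Hypothetical inputs, chosen only for illustration.
    for k in (100, 1000, 10000):
        print(k, gaussian_approx_rate(k, rate=0.5, dispersion=0.25, eps=0.1))
```

Under these assumptions, the per-symbol penalty over the first-order rate shrinks in proportion to $1/\sqrt{k}$ for a fixed excess-distortion probability $\epsilon < 1/2$, and it vanishes at $\epsilon = 1/2$, where $Q^{-1}(\epsilon) = 0$.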
