
Joint Data and Semantics Lossy Compression: Nonasymptotic Converse Bounds and Second-Order Asymptotics

Huiyuan Yang, Yuxuan Shi, Shuo Shao, Xiaojun Yuan

TL;DR

This work addresses joint data and semantics lossy compression (JDSLC) at finite blocklengths. It derives general nonasymptotic converse bounds using distortion-tilted information and establishes a tight second-order converse bound for stationary memoryless sources, with dispersion $\tilde{\mathcal{V}}(d_s,d_x)$ and Gaussian-approximation term $\sqrt{k \tilde{\mathcal{V}}(d_s,d_x)}\, Q^{-1}(\epsilon)$. The authors then specialize to erased fair coin flips (EFCF), obtaining an explicit semantic rate-distortion function and corresponding nonasymptotic converse and achievability bounds, complemented by numerical results showing the accuracy of the second-order approximation at practical blocklengths. Collectively, the results provide finite-blocklength limits for semantic-aware compression and guidance for designing encoders that jointly preserve data and semantics under delay constraints.
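The Gaussian-approximation term in the TL;DR is the correction to $k R(d_s,d_x)$ in the normal approximation $\log M^\star \approx k R(d_s,d_x) + \sqrt{k \tilde{\mathcal{V}}(d_s,d_x)}\, Q^{-1}(\epsilon)$, with the $O(\log k)$ remainder dropped. A minimal sketch of how that approximation is evaluated is shown below; the values of `R` and `V` in the loop are hypothetical placeholders, not numbers from the paper, and the function names are illustrative only.

```python
from math import sqrt
from statistics import NormalDist


def q_inv(eps: float) -> float:
    """Q^{-1}(eps): inverse of the standard Gaussian complementary CDF."""
    if not 0.0 < eps < 1.0:
        raise ValueError("eps must lie strictly between 0 and 1")
    # Q(x) = 1 - Phi(x), so Q^{-1}(eps) = Phi^{-1}(1 - eps).
    return NormalDist().inv_cdf(1.0 - eps)


def approx_log_m(R: float, V: float, k: int, eps: float) -> float:
    """Normal approximation of log M*: k*R + sqrt(k*V) * Q^{-1}(eps).

    The O(log k) remainder is dropped. R and sqrt(V) must use the
    same logarithm base (e.g. bits and bits)."""
    return k * R + sqrt(k * V) * q_inv(eps)


def approx_rate(R: float, V: float, k: int, eps: float) -> float:
    """Per-symbol version: R + sqrt(V / k) * Q^{-1}(eps)."""
    return approx_log_m(R, V, k, eps) / k


# Hypothetical placeholder values, not taken from the paper.
for k in (100, 1000, 10000):
    print(k, round(approx_rate(R=0.5, V=0.25, k=k, eps=0.1), 4))
```

For $\epsilon < 1/2$ we have $Q^{-1}(\epsilon) > 0$, so the approximate rate sits above the asymptotic limit and approaches it at speed $1/\sqrt{k}$.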

Abstract

This paper studies the joint data and semantics lossy compression problem, i.e., an extension of the hidden lossy source coding problem that entails recovering both the hidden and observable sources. We aim to study the nonasymptotic and second-order properties of this problem, especially the converse aspect. Specifically, we begin by deriving general nonasymptotic converse bounds valid for general sources and distortion measures, utilizing properties of distortion-tilted information. Subsequently, a second-order converse bound is derived under the standard block coding setting through asymptotic analysis of the nonasymptotic bounds. This bound is tight since it coincides with a known second-order achievability bound. We then examine the case of erased fair coin flips (EFCF), providing its specific nonasymptotic achievability and converse bounds. Numerical results under the EFCF case demonstrate that our second-order asymptotic approximation effectively approximates the optimum rate at given blocklengths.
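The abstract does not spell out the EFCF source. The sketch below assumes the usual erased-fair-coin-flips model from the noisy lossy source coding literature: the hidden (semantic) source $S$ is a fair bit, and the observable source $X$ equals $S$ with probability $1-\delta$ and is erased with probability $\delta$. It only builds the single-letter joint pmf; the label `ERASED` and both helper names are hypothetical.

```python
from fractions import Fraction

ERASED = "e"  # hypothetical label for an erased observation


def efcf_joint_pmf(delta):
    """Joint pmf {(s, x): prob} of one EFCF symbol under the assumed model:
    S ~ Bernoulli(1/2); X = S w.p. 1 - delta, X = ERASED w.p. delta."""
    if not 0 <= delta <= 1:
        raise ValueError("delta must lie in [0, 1]")
    half = Fraction(1, 2)
    pmf = {}
    for s in (0, 1):
        pmf[(s, s)] = half * (1 - delta)  # observation reveals the bit
        pmf[(s, ERASED)] = half * delta   # observation is erased
    return pmf


def marginal_x(pmf):
    """Marginal pmf of the observable source X."""
    p_x = {}
    for (_, x), p in pmf.items():
        p_x[x] = p_x.get(x, 0) + p
    return p_x


joint = efcf_joint_pmf(Fraction(1, 5))  # delta = 0.2, the value used in the figures
print(marginal_x(joint))
```

Exact `Fraction` arithmetic keeps the pmf free of rounding, which makes it easy to confirm that an erased observation leaves $S$ uniformly distributed.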

Paper Structure

This paper contains 16 sections, 12 theorems, 81 equations, and 3 figures.

Key Result

Proposition 1

(Yang2024Joint): $\tilde{\mathcal{V}}(d_s,d_x)$ can be written as where $\textrm{Var}\left[U|V\right] \triangleq \mathbb{E}\left[(U - \mathbb{E}\left[U|V\right])^2\right]$.
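Despite the notation, $\textrm{Var}[U|V]$ as defined in Proposition 1 is a number rather than a random variable: by the tower property it equals $\mathbb{E}[\textrm{Var}(U \mid V)]$, the conditional variance averaged over $V$. A minimal sketch for finite alphabets with numeric $U$ follows; the helper name `expected_conditional_variance` is hypothetical.

```python
from collections import defaultdict


def expected_conditional_variance(joint):
    """E[(U - E[U|V])^2] for a finite joint pmf given as {(u, v): prob}."""
    p_v = defaultdict(float)     # marginal P_V(v)
    sum_uv = defaultdict(float)  # sum over u of u * P_{U,V}(u, v)
    for (u, v), p in joint.items():
        p_v[v] += p
        sum_uv[v] += p * u
    # Conditional means E[U | V = v] for every v with positive probability.
    cond_mean = {v: sum_uv[v] / p_v[v] for v in p_v if p_v[v] > 0}
    # Average squared deviation from the conditional mean.
    return sum(p * (u - cond_mean[v]) ** 2
               for (u, v), p in joint.items() if p > 0)


# U = +/-1 uniformly and independent of V: the conditional variance is 1.
indep = {(1, 0): 0.25, (-1, 0): 0.25, (1, 1): 0.25, (-1, 1): 0.25}
# U fully determined by V: the conditional variance is 0.
determined = {(0, 0): 0.5, (1, 1): 0.5}
print(expected_conditional_variance(indep), expected_conditional_variance(determined))
```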

Figures (3)

  • Figure 1: Joint data and semantics lossy compression in the nonasymptotic regime.
  • Figure 2: Rate-distortion function of the EFCF case with $\delta = 0.2$.
  • Figure 3: Rate-blocklength trade-off in the erased fair coin flips case with $\delta = 0.2$ and $\epsilon = 0.1$. Note that $(d_x, d_s) = (1.36\delta, 0.88\delta)$ and $(d_x, d_s) = (1.28\delta, 1.12\delta)$ share the same value of $R(d_s,d_x)$.

Theorems & Definitions (24)

  • Definition 1
  • Definition 2
  • Proposition 1
  • Theorem 1
  • Proof
  • Corollary 1
  • Proof
  • Corollary 2
  • Proof
  • Theorem 2
  • ...and 14 more