Importance Matching Lemma for Lossy Compression with Side Information

Buu Phan; Ashish Khisti; Christos Louizos

TL;DR

This paper introduces the importance matching lemma, a finite-proposal counterpart of the recently introduced Poisson matching lemma, and uses it to build a new coding scheme for distributed lossy compression with side information at the decoder.

Abstract

We propose two extensions to existing importance sampling based methods for lossy compression. First, we introduce an importance sampling based compression scheme that is a variant of ordered random coding (Theis and Ahmed, 2022) and is amenable to direct evaluation of the achievable compression rate for a finite number of samples. Our second and major contribution is the importance matching lemma, which is a finite proposal counterpart of the recently introduced Poisson matching lemma (Li and Anantharam, 2021). By integrating with deep learning, we provide a new coding scheme for distributed lossy compression with side information at the decoder. We demonstrate the effectiveness of the proposed scheme through experiments involving synthetic Gaussian sources, distributed image compression with MNIST and vertical federated learning with CIFAR-10.

Paper Structure (48 sections, 15 theorems, 179 equations, 14 figures, 8 tables)

INTRODUCTION
COMMUNICATION-EFFICIENT IMPORTANCE SAMPLING
Problem Setup
Coding Scheme
Beyond i.i.d. samples
IMPORTANCE MATCHING LEMMAS
Importance Matching Lemma
Conditional Importance Matching Lemma
LOSSY COMPRESSION WITH SIDE-INFORMATION
Problem Setup
Coding Scheme
Decision Feedback Based Scheme
EXPERIMENTS
Synthetic Gaussian Source
Distributed Image Compression
...and 33 more sections

Key Result

Theorem 1

Given $(X,Y) \sim p_{X,Y}$, and $N, K$ as in the scheme in Sec. coding_scheme, we have that: where $\boldsymbol{\lambda} = (\lambda_1, \ldots, \lambda_N)$ is defined via eq:lam-i, $D(\cdot \,\|\, \cdot)$ is the KL divergence, $\mathbf{u} = (1/N, \ldots, 1/N)$ is associated with the uniform distribution and $\delta = 1 + \log e/e$ is a constant. Furthermore, where $\Delta:=\Delta(p_{X,Y})$ is a constant def
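
As a reading aid for the KL term $D(\boldsymbol{\lambda} \,\|\, \mathbf{u})$: assuming $\boldsymbol{\lambda}$ is a probability vector over the $N$ indices (an assumption here, since eq:lam-i is not reproduced in this summary), its divergence from the uniform distribution expands by the standard identity below. The entropy symbol $H(\boldsymbol{\lambda})$ is introduced only for this illustration and is not taken from the paper.

```latex
D(\boldsymbol{\lambda} \,\|\, \mathbf{u})
  = \sum_{i=1}^{N} \lambda_i \log \frac{\lambda_i}{1/N}
  = \log N - H(\boldsymbol{\lambda}),
\qquad
H(\boldsymbol{\lambda}) = -\sum_{i=1}^{N} \lambda_i \log \lambda_i .
```

Under that assumption the term is 0 when $\boldsymbol{\lambda}$ is uniform and reaches its maximum, $\log N$, when $\boldsymbol{\lambda}$ concentrates on a single index.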

Figures (14)

Figure 1: (Left) Overview of IML: Alice and Bob independently sample $Y_{U_p},Y_{U_q}$ by applying the Gumbel-max trick to the shared randomness. (Middle) The empirical mismatch probability as a function of the Wasserstein-2 distance $W_2(p_Y, q_Y)$, where $p_Y{=}\mathcal{N}(m,1)$, $q_Y{=}\mathcal{N}(-m,1)$ and $m \in [0,\infty)$. (Right) Mechanism of IML: each party scales its respective importance-weight function, $\frac{p_i(y)}{\psi_i(y)}$ or $\frac{q_i(y)}{\psi_i(y)}$, until one point $\{S_i,Y_i\}$ falls on the curve. The top and bottom figures show the matching and mismatching cases, respectively.
Figure 2: (Left) Source coding with side information at the decoder with conditional IML. (Right) Decoding mechanism: the encoder scales $\frac{p_{W|V}(w|v)}{\psi_W(w)}$ and selects $W_{U_p}$ (blue circle). Left sub-figure: the decoder selects incorrect indices when it relies purely on scaling $\frac{p_{W|T}(w|t)}{\psi_W(w)}$ (equivalently, rate $R{=}0$). Right sub-figure: we generate one extra bit of information $l_i$ for each codeword index by randomly marking it as either a triangle ($l_i{=}0$) or a circle ($l_i{=}1$). Upon receiving $l_{U_p}{=}1$ from the encoder, the decoder eliminates all indices marked by a triangle and correctly decodes the index among the circles.
Figure 3: (Left) Empirical matching probabilities for different target distributions and numbers of proposals. (Right) Effect of compressing multiple samples jointly on the matching probability. (Best viewed on screen.)
Figure 4: Analysis and rate-distortion performance of different IML schemes. (Best viewed on screen.)
Figure 5: Distributed image compression with MNIST. In (a), the orange curve is IML without feedback; its performance is restricted to 8 bits due to limited computing resources. In (b), the gray area denotes the side-information sent to the decoder, which is 0 at the encoder side.
...and 9 more figures
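
The matching mechanism in the Figure 1 caption can be illustrated with a small simulation. The sketch below is not the authors' implementation: it assumes a Gaussian proposal $\psi = \mathcal{N}(0, 3^2)$, i.i.d. proposals, and hypothetical function names and default parameters chosen only for illustration. Both parties see the same proposals and the same Gumbel noise, and each picks the index that maximizes its own log importance weight plus the shared noise. For the equal-variance Gaussians in the caption, $W_2(p_Y, q_Y) = 2m$.

```python
import math
import random


def log_normal_pdf(y, mean, std):
    """Log density of N(mean, std^2) evaluated at y."""
    return -0.5 * ((y - mean) / std) ** 2 - math.log(std) - 0.5 * math.log(2 * math.pi)


def gumbel_max_index(log_weights, gumbels):
    """Return the index maximizing log-weight plus shared Gumbel noise."""
    scores = [lw + g for lw, g in zip(log_weights, gumbels)]
    return max(range(len(scores)), key=scores.__getitem__)


def match_probability(m, num_proposals=128, num_trials=1000,
                      proposal_std=3.0, seed=0):
    """Estimate how often Alice (target N(m,1)) and Bob (target N(-m,1))
    select the same index from shared proposals and shared noise.

    The proposal N(0, proposal_std^2) is an assumption made for this sketch.
    """
    rng = random.Random(seed)
    matches = 0
    for _ in range(num_trials):
        # Shared randomness: proposals Y_i and Gumbel noise G_i = -log(S_i),
        # where S_i ~ Exp(1).
        ys = [rng.gauss(0.0, proposal_std) for _ in range(num_proposals)]
        gumbels = [-math.log(rng.expovariate(1.0)) for _ in range(num_proposals)]
        log_psi = [log_normal_pdf(y, 0.0, proposal_std) for y in ys]
        # Log importance weights log(p/psi) and log(q/psi) at each proposal.
        log_w_p = [log_normal_pdf(y, m, 1.0) - lp for y, lp in zip(ys, log_psi)]
        log_w_q = [log_normal_pdf(y, -m, 1.0) - lp for y, lp in zip(ys, log_psi)]
        if gumbel_max_index(log_w_p, gumbels) == gumbel_max_index(log_w_q, gumbels):
            matches += 1
    return matches / num_trials


if __name__ == "__main__":
    for m in (0.0, 0.25, 1.0, 2.0):
        print(f"m={m}: empirical match probability {match_probability(m):.3f}")
```

When $m = 0$ the two targets coincide, so both parties always select the same index; as $m$ grows the targets separate and the estimated match probability falls.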

Theorems & Definitions (17)

Theorem 1
Proposition 1
Theorem 2
Theorem 3
Proposition 2
Remark 1
Remark 2
Theorem: Restatement of Theorem \ref{thm:rate} in main paper
Proposition 3
Proposition 4
...and 7 more
