Information Subtraction: Learning Representations for Conditional Entropy

Keng Hou Leong; Yuxuan Xiu; Wai Kin Chan

TL;DR

This work tackles representing conditional information terms such as $H(Y|X)$ and $I(X;Y|W)$ in continuous settings, where prior conditional sampling methods falter. It introduces Information Subtraction, a generative framework that learns representations $Z$ by maximizing $I(Y; X, Z)$ while minimizing $I(X; Z)$ with a trade-off $\lambda$, enabling selective inclusion of information about $Y$ and subtraction of information about $X$. It supports decomposing information sectors in the information diagram by iteratively applying the subtraction to obtain sector-specific representations. The approach is demonstrated on synthetic and real data, showing benefits for fair learning and domain generalization, and the authors provide code for replication.
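Written out, the trade-off described above might take the following form. This is a sketch reconstructed from the summary, not a quotation of the paper's loss; the sign convention and the constraint $\lambda \ge 0$ are assumptions.

$$
\max_{Z}\; I(Y; X, Z) \;-\; \lambda\, I(X; Z), \qquad \lambda \ge 0
$$

By the chain rule for mutual information,

$$
I(Y; X, Z) \;=\; I(Y; X) \;+\; I(Y; Z \mid X),
$$

and $I(Y; X)$ does not depend on $Z$. Maximizing $I(Y; X, Z)$ over $Z$ is therefore equivalent to maximizing the conditional term $I(Y; Z \mid X)$, while the penalty $\lambda\, I(X; Z)$ discourages $Z$ from carrying information about $X$.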

Abstract

The representations of conditional entropy and conditional mutual information are significant in explaining the unique effects among variables. While previous studies based on conditional contrastive sampling have effectively removed information regarding discrete sensitive variables, they have not yet extended their scope to continuous cases. This paper introduces Information Subtraction, a framework designed to generate representations that preserve desired information while eliminating the undesired. We implement a generative architecture that outputs these representations by simultaneously maximizing one information term and minimizing another. With its flexibility in disentangling information, we can iteratively apply Information Subtraction to represent arbitrary information components between continuous variables, thereby explaining the various relationships that exist between them. Our results highlight the representations' ability to provide semantic features of conditional entropy. By subtracting sensitive and domain-specific information, our framework demonstrates effective performance in fair learning and domain generalization. The code for this paper is available at https://github.com/jh-liang/Information-Subtraction
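To make "preserve desired information while eliminating the undesired" concrete, the toy sketch below compares two candidate representations on a linear Gaussian example. It is not the authors' method: the paper describes a generative architecture with learned estimators, whereas this sketch swaps in the closed-form Gaussian identity $I = -\tfrac{1}{2}\ln(1-\rho^2)$ and a partial correlation for the conditional term. The data model ($Y = X + E$), the candidates `Z_keep` and `Z_sub`, and all helper names are hypothetical, and `Z_sub` uses the unobserved noise $E$ purely as an idealized target.

```python
import math
import random


def corr(a, b):
    """Sample Pearson correlation of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(va * vb)


def gaussian_mi(a, b):
    """I(A;B) in nats, valid only if (A, B) is jointly Gaussian."""
    return -0.5 * math.log(1.0 - corr(a, b) ** 2)


def residual(a, x):
    """Centered a with its least-squares linear dependence on x removed."""
    n = len(a)
    ma, mx = sum(a) / n, sum(x) / n
    beta = (sum((u - ma) * (v - mx) for u, v in zip(a, x))
            / sum((v - mx) ** 2 for v in x))
    return [(u - ma) - beta * (v - mx) for u, v in zip(a, x)]


def gaussian_cmi(a, b, x):
    """I(A;B|X) in nats via partial correlation (jointly Gaussian case)."""
    return gaussian_mi(residual(a, x), residual(b, x))


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 20000
X = [rng.gauss(0, 1) for _ in range(n)]   # conditional variable
E = [rng.gauss(0, 1) for _ in range(n)]   # part of Y not explained by X
N = [rng.gauss(0, 1) for _ in range(n)]   # representation noise
Y = [x + e for x, e in zip(X, E)]         # assumed toy model: Y = X + E

# Candidate 1 copies Y (noisily), so it still carries information about X.
Z_keep = [y + 0.5 * m for y, m in zip(Y, N)]
# Candidate 2 is the idealized "subtracted" target: only the X-free part of Y.
Z_sub = [e + 0.5 * m for e, m in zip(E, N)]

lam = 1.0  # trade-off weight; the value is an arbitrary choice for illustration
scores = {}
for name, Z in [("Z_keep", Z_keep), ("Z_sub", Z_sub)]:
    kept = gaussian_cmi(Y, Z, X)   # information about Y beyond what X gives
    leaked = gaussian_mi(X, Z)     # information about X remaining in Z
    scores[name] = (kept, leaked, kept - lam * leaked)
    print(f"{name}: I(Y;Z|X)={kept:.3f}  I(X;Z)={leaked:.3f}  "
          f"objective={kept - lam * leaked:.3f}")
```

Under these assumptions both candidates retain roughly the same amount of information about $Y$ given $X$, but only `Z_sub` has near-zero estimated $I(X; Z)$, so it scores higher on the penalized objective. Real data is rarely jointly Gaussian, which is why a learned estimator is needed in practice.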

Paper Structure (21 sections, 20 equations, 8 figures, 11 tables, 2 algorithms)

Introduction
Problem Descriptions
Related Works
Architecture
Generator
Discriminator
Maximizing Conditional Mutual Information
Application Scenarios and Results
Representing Relationships
Information Subtraction
Synthetic case for Fair Learning
Real case for Fair Learning: Adult Income Data
Real case for Domain Generalization: Cover Type Data
Conclusion
Appendix: Implementation Details for Representing Relationships
...and 6 more sections

Figures (8)

Figure 1: Venn diagram of entropy $H(Y)$, mutual information $I(Y; X)$, conditional entropy $H(Y|X)$, and conditional mutual information $I(Y; W|X)$ between variables $V$, $W$, $X$, $Y$ in time-series and image-identification scenarios.
Figure 2: The Venn Diagram illustrates the information provided to $Y$ by $X$ (red), $Z$ (blue), and $\{X, Z\}$ (black). Our objective is to generate $Z$ that represents the shaded region.
Figure 3: Illustration of the architecture. $Y$ is the target variable and $X$ is the conditional variable. Generator A takes $Y$ as input and outputs its representation $Z$, which is then fed into Discriminators B and C to estimate the loss that is back-propagated.
Figure 4: (a) The relationship diagram between four species. (b) The dynamics of the Lotka–Volterra model. (c) The dynamics of $S$ and $G$ over $t$. (d) The dynamics of $Z$ and $G$ over $t$. (e) The dynamics of $Z$ and $S$ over $t$.
Figure 5: Information Subtraction.
...and 3 more figures
