An Information Theoretic Approach to Interaction-Grounded Learning

Xiaoyan Hu; Farzan Farnia; Ho-fung Leung

TL;DR

This work addresses Interaction-Grounded Learning (IGL), a reinforcement learning setting in which the agent must infer latent binary rewards from noisy feedback. It does so by enforcing conditional independence of the feedback $Y$ from the context-action pair $(X,A)$ given the decoded reward $R_\psi$. The proposed method, Variational Information-based IGL (VI-IGL), minimizes $I(Y;X,A|R_\psi)$ while regularizing with $\beta\,I(X,A;R_\psi)$, and uses a variational representation of mutual information to turn this into a min-max objective over function approximators $G$ and $T$. The framework extends to general $f$-information ($f$-VI-IGL), which allows other divergence choices. On a number-guessing task with noisy feedback, VI-IGL is more robust and more accurate than prior IGL methods.

Abstract

Reinforcement learning (RL) problems where the learner attempts to infer an unobserved reward from some feedback variables have been studied in several recent papers. The setting of Interaction-Grounded Learning (IGL) is an example of such feedback-based RL tasks where the learner optimizes the return by inferring latent binary rewards from the interaction with the environment. In the IGL setting, a relevant assumption used in the RL literature is that the feedback variable $Y$ is conditionally independent of the context-action $(X,A)$ given the latent reward $R$. In this work, we propose Variational Information-based IGL (VI-IGL) as an information-theoretic method to enforce the conditional independence assumption in the IGL-based RL problem. The VI-IGL framework learns a reward decoder using an information-based objective based on the conditional mutual information (MI) between $(X,A)$ and $Y$. To estimate and optimize the information-based terms for the continuous random variables in the RL problem, VI-IGL leverages the variational representation of mutual information to obtain a min-max optimization problem. Also, we extend the VI-IGL framework to general $f$-Information measures leading to the generalized $f$-VI-IGL framework for the IGL-based RL problems. We present numerical results on several reinforcement learning settings indicating an improved performance compared to the existing IGL-based RL algorithm.
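The two information terms named above can be made concrete on a small discrete example. The sketch below is only an illustration under assumed toy values, not the paper's estimator (which relies on a variational representation for continuous variables): the context-action pair is collapsed into a single index `z`, and the helper names, the toy distributions, and the decoders are all hypothetical. It computes $I(Y;X,A|R_\psi)$ and $I(X,A;R_\psi)$ exactly from a joint probability table.

```python
import numpy as np

def mutual_information(p_ab):
    """I(A;B) in nats for a joint probability table p_ab[a, b]."""
    p_a = p_ab.sum(axis=1, keepdims=True)
    p_b = p_ab.sum(axis=0, keepdims=True)
    mask = p_ab > 0  # skip zero cells, using the convention 0 * log 0 = 0
    return float(np.sum(p_ab[mask] * np.log(p_ab[mask] / (p_a * p_b)[mask])))

def conditional_mutual_information(p_abc):
    """I(A;B|C) in nats for a joint probability table p_abc[a, b, c]."""
    total = 0.0
    for c in range(p_abc.shape[2]):
        p_c = p_abc[:, :, c].sum()
        if p_c > 0:
            total += p_c * mutual_information(p_abc[:, :, c] / p_c)
    return total

def joint_table(p_z, decoder, p_y_given_r, true_reward):
    """Joint table p[y, z, r_hat]: y depends on z only through the true reward."""
    p = np.zeros((p_y_given_r.shape[1], len(p_z), 2))
    for z in range(len(p_z)):
        p[:, z, decoder[z]] = p_z[z] * p_y_given_r[true_reward[z]]
    return p

# Assumed toy problem: four context-action pairs, binary reward, binary feedback.
p_z = np.full(4, 0.25)
true_reward = [1, 0, 0, 1]
p_y_given_r = np.array([[0.9, 0.1],   # feedback distribution when reward is 0
                        [0.2, 0.8]])  # feedback distribution when reward is 1

good = joint_table(p_z, [1, 0, 0, 1], p_y_given_r, true_reward)  # decoder matches the reward
bad = joint_table(p_z, [1, 1, 0, 0], p_y_given_r, true_reward)   # decoder mixes rewards

cmi_good = conditional_mutual_information(good)  # I(Y; Z | R_hat) for the matching decoder
cmi_bad = conditional_mutual_information(bad)    # I(Y; Z | R_hat) for the mixed decoder
reg_good = mutual_information(good.sum(axis=0))  # I(Z; R_hat) for the matching decoder
```

In this toy setup the decoder that matches the latent reward drives the conditional term to zero, while the decoder that mixes rewards leaves it strictly positive, which is the behaviour the conditional independence assumption predicts.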

Paper Structure (30 sections, 7 theorems, 44 equations, 2 figures, 10 tables, 2 algorithms)

Introduction
Related Work
Preliminaries
Interaction-Grounded Learning (IGL)
($f$-)Conditional Mutual Information
Variational Information-Based IGL
Minimizing Conditional MI with Regularization
Leveraging Variational Representation to Solve Information-based Objective
The $f$-VI-IGL Algorithm
The Extended $f$-Variational Information-based IGL
Algorithm Description
Empirical Results
Robustness to Noises
Necessity of Regularization
Ablation Experiments
...and 15 more sections

Key Result

Theorem 4.1

Objective (obj) is equivalent to the following optimization problem: where $G\in{\mathcal{G}}:{\mathcal{X}}\times{\mathcal{A}}\times{\mathcal{Y}}\times\{0,1\}\to{\mathbb{R}}$ and $T\in{\mathcal{T}}:{\mathcal{X}}\times{\mathcal{A}}\times\{0,1\}\to{\mathbb{R}}$ are two function classes.
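For context, the variational representation listed among the paper's results is the Donsker-Varadhan representation (Proposition 4.4). Its standard textbook form is sketched below; the symbols $P$, $Q$, $U$, $V$, and $\Omega$ are generic placeholders introduced here, and the critic is written $T$ only by analogy with the function classes above. This is the general identity, not the specific min-max problem of Theorem 4.1.

$$
D_{\mathrm{KL}}(P \,\|\, Q) \;=\; \sup_{T:\,\Omega\to\mathbb{R}} \Big\{ \mathbb{E}_{P}\big[T\big] \;-\; \log \mathbb{E}_{Q}\big[e^{T}\big] \Big\},
\qquad
I(U;V) \;=\; D_{\mathrm{KL}}\big(P_{UV} \,\|\, P_{U}\otimes P_{V}\big).
$$

Restricting the supremum to a parametric function class gives a lower bound on the mutual information that can be estimated from samples, which is presumably why a term that is minimized over the decoder and maximized over a critic produces a min-max problem.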

Figures (2)

Figure 1: Policy accuracy under different noise levels. Our VI-IGL algorithm outperforms batch E2G (Xie et al., 2021) in all noisy environments and across all noise levels. The results are averaged over 16 trials.
Figure 2: Policy accuracy for different values of $\beta$. All noisy settings use noise level $0.1$. The optimal selections are marked beside the value. The results show the necessity of regularization and are averaged over 16 trials.

Theorems & Definitions (13)

Theorem 4.1: VI-IGL optimization problem
Theorem 4.2: Sample complexity
Theorem 4.3: Regularization (almost) ensures conditional independence
Proposition 4.4: Donsker-Varadhan representation
Theorem 5.1: $f$-VI-IGL optimization problem
proof
proof
Definition B.1: $\epsilon$-covering
Remark B.2
proof
...and 3 more
