Learning in Factored Domains with Information-Constrained Visual Representations

Tailia Malloy, Miao Liu, Matthew D. Riemer, Tim Klinger, Gerald Tesauro, Chris R. Sims

TL;DR

The paper addresses how humans achieve rapid learning in visually rich environments by leveraging compressed, disentangled representations and factored task structure. It proposes a model that couples a modified $\beta$-VAE with a hypothesis-generation framework to infer a factored MDP from visual input, enabling factored rewards to guide learning. In a contextual bandit task with CelebA faces, the authors show that smaller latent dimensions yield faster learning at the cost of reconstruction fidelity, illustrating a speed–accuracy trade-off in latent space design. The work contributes a behavioral perspective on disentanglement, emphasizing hypothesis-driven use of latent representations to improve generalization and robustness in learning.

Abstract

Humans learn quickly even in tasks that contain complex visual information. This is due in part to the efficient formation of compressed representations of visual information, allowing for better generalization and robustness. However, compressed representations alone are insufficient for explaining the high speed of human learning. Reinforcement learning (RL) models that seek to replicate this impressive efficiency may do so through the use of factored representations of tasks. These informationally simple task representations share the motivation behind compressed representations of visual information. Recent studies have connected biological visual perception to disentangled and compressed representations. This raises the question of how humans learn to efficiently represent visual information in a manner useful for learning tasks. In this paper we present a model of human factored representation learning based on an altered form of a $\beta$-Variational Auto-encoder used in a visual learning task. Modelling results demonstrate a trade-off in the informational complexity of the model's latent space between the speed of learning and the accuracy of reconstructions.
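
To make the information constraint concrete, the following is a minimal sketch of a $\beta$-VAE style objective with an added reward-prediction term. The reconstruction and KL terms follow the standard $\beta$-VAE form; the squared-error reward term, the `reward_weight` parameter, and all function names are illustrative assumptions, since the abstract says only that the model is an "altered form" of a $\beta$-VAE and does not give the loss.

```python
import numpy as np

def gaussian_kl(mu, logvar):
    """KL(N(mu, diag(exp(logvar))) || N(0, I)), summed over latent dimensions."""
    return 0.5 * np.sum(np.exp(logvar) + mu ** 2 - 1.0 - logvar, axis=-1)

def rl_beta_vae_loss(x, x_recon, mu, logvar, reward, reward_pred,
                     beta=4.0, reward_weight=1.0):
    """Hypothetical combined loss (an assumption, not the paper's exact objective).

    x, x_recon:          (batch, n_pixels) inputs and reconstructions
    mu, logvar:          (batch, n_latent) parameters of the approximate posterior
    reward, reward_pred: (batch,) observed and predicted rewards
    """
    recon = np.sum((x - x_recon) ** 2, axis=-1)   # reconstruction error per sample
    kl = gaussian_kl(mu, logvar)                  # information cost of the latent code
    reward_err = (reward - reward_pred) ** 2      # assumed reward-prediction error
    return np.mean(recon + beta * kl + reward_weight * reward_err)
```

In this sketch, a larger `beta` penalizes the information carried by the latent code more heavily, and a smaller `n_latent` caps how much it can carry; both are ways to impose the kind of constraint whose effect on learning speed and reconstruction accuracy the abstract describes.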

Paper Structure (10 sections, 4 equations, 4 figures)

Figures (4)

  • Figure 1: Example of the RL$\beta$-VAE model forming a reconstruction and predicted reward.
  • Figure 2: Left: Model pre-training reconstruction loss by training epoch (lower is better); color indicates latent dimension size. Middle: Model accuracy during contextual bandit training over 1000 runs; per-trial means (dots) are fit to a logarithmic function (lines). Right: Representation difference, in mean-squared error, between images containing hats, glasses, or both and images containing neither.
  • Figure 3: Examples of face images with either eyeglasses or hats from the CelebA dataset (Liu et al., 2015).
  • Figure 4: Example of a dynamic Bayesian network (DBN) defined by one hypothesized scope, shown with an example stimulus, its latent representation, and the factored reward function. Note that the hypothetical DBN describes the transition function, which is not used for the contextual bandit task.