Learning in Factored Domains with Information-Constrained Visual Representations
Tailia Malloy, Miao Liu, Matthew D. Riemer, Tim Klinger, Gerald Tesauro, Chris R. Sims
TL;DR
The paper addresses how humans achieve rapid learning in visually rich environments by leveraging compressed, disentangled representations and factored task structure. It proposes a model that couples a modified $β$-VAE with a hypothesis-generation framework to infer a factored MDP from visual input, enabling factored rewards to guide learning. In a contextual bandit task with CelebA faces, the authors show that smaller latent dimensions yield faster learning at the cost of reconstruction fidelity, illustrating a speed–accuracy trade-off in latent space design. The work contributes a behavioral perspective on disentanglement, emphasizing hypothesis-driven use of latent representations to improve generalization and robustness in learning.
Abstract
Humans learn quickly even in tasks that contain complex visual information. This is due in part to the efficient formation of compressed representations of visual information, which allow for better generalization and robustness. However, compressed representations alone are insufficient to explain the high speed of human learning. Reinforcement learning (RL) models that seek to replicate this efficiency may do so by using factored representations of tasks. These informationally simple task representations share their motivation with compressed representations of visual information. Recent studies have connected biological visual perception to disentangled and compressed representations. This raises the question of how humans learn to represent visual information efficiently and in a manner useful for learning tasks. In this paper we present a model of human factored representation learning based on an altered form of a $β$-Variational Autoencoder, applied to a visual learning task. Modelling results demonstrate that the informational complexity of the model's latent space sets a trade-off between the speed of learning and the accuracy of reconstructions.
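For orientation, the standard $β$-VAE loss can be sketched as a reconstruction term plus a $β$-weighted KL divergence between the approximate posterior and a unit Gaussian prior. This is a minimal sketch of the textbook objective only, not the altered form used in this paper, which is not specified here. The function names, the squared-error reconstruction term, the diagonal-Gaussian posterior, and the default `beta=4.0` are all illustrative assumptions.

```python
import math


def gaussian_kl(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over latent dimensions.

    Assumes a diagonal-Gaussian posterior parameterised by its mean and
    log-variance, as in the standard VAE (an assumption, not the paper's model).
    """
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0 for m, lv in zip(mu, log_var))


def beta_vae_loss(x, x_recon, mu, log_var, beta=4.0):
    """Standard beta-VAE loss: squared reconstruction error + beta * KL.

    beta=4.0 is an arbitrary illustrative default, not a value from the paper.
    """
    recon = sum((a - b) ** 2 for a, b in zip(x, x_recon))
    return recon + beta * gaussian_kl(mu, log_var)


if __name__ == "__main__":
    # Toy example: a 3-pixel "image" encoded into a 2-dimensional latent.
    x = [0.2, 0.5, 0.9]
    x_recon = [0.25, 0.45, 0.8]
    mu, log_var = [0.3, -0.1], [-0.2, 0.1]
    for beta in (1.0, 4.0):
        print(beta, beta_vae_loss(x, x_recon, mu, log_var, beta=beta))
```

Raising `beta` penalises the KL term more heavily and so pushes the encoder toward lower-information latent codes, which is the kind of information constraint the abstract refers to.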
