Disentangled (Un)Controllable Features
Jacob E. Kooi, Mark Hoogendoorn, Vincent François-Lavet
TL;DR
This work introduces a disentangled latent representation for high-dimensional MDPs by partitioning the latent state into a controllable component $z^c$ and an uncontrollable component $z^u$. It combines four ingredients: an action-conditioned forward predictor for $z^c$, a state-only forward predictor for $z^u$, a contrastive loss to avoid representation collapse, and an adversarial loss to minimize information leakage from $z^u$ into $z^c$. Together these enable planning directly in the controllable latent partition. The approach is validated on three environment types, showing interpretable latent separation and competitive downstream learning performance; planning in the controllable subspace offers practical advantages in unseen mazes. The results point toward interpretable, task-relevant latent representations that support planning, and potentially causal reasoning, in RL, with robust disentanglement and generalization to complex environments as the main points of emphasis.
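As a rough illustration of the partitioned prediction summarized above, the following minimal Python sketch splits a latent vector into $z^c$ and $z^u$, predicts the next $z^c$ from $z^c$ and the action, and predicts the next $z^u$ from $z^u$ alone. It is not the authors' implementation: the linear predictors, the one-hot action encoding, the function names, and the exponential form of the contrastive penalty are all assumptions made for illustration, and the adversarial loss is omitted.

```python
import math

def split_latent(z, n_controllable):
    """Partition a latent vector into (z_c, z_u). The split index is hypothetical."""
    return z[:n_controllable], z[n_controllable:]

def linear(weights, inputs):
    """Bias-free dense layer: one output per weight row."""
    return [sum(w * x for w, x in zip(row, inputs)) for row in weights]

def predict_next(z, action_onehot, w_c, w_u, n_controllable):
    """Predict the next latent with two separate forward models.

    Assumption: the controllable predictor sees z_c concatenated with the
    action, while the uncontrollable predictor sees z_u only.
    """
    z_c, z_u = split_latent(z, n_controllable)
    next_c = linear(w_c, z_c + action_onehot)  # action-conditioned
    next_u = linear(w_u, z_u)                  # action-independent
    return next_c + next_u

def mse(a, b):
    """Mean squared error between predicted and encoded next latents."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def contrastive_penalty(z_a, z_b, scale=5.0):
    """Assumed penalty exp(-scale * ||z_a - z_b||): largest when latents coincide."""
    dist = math.sqrt(sum((x - y) ** 2 for x, y in zip(z_a, z_b)))
    return math.exp(-scale * dist)

# Toy example: 2 controllable dims, 1 uncontrollable dim, 2 discrete actions.
w_c = [[1.0, 0.0, 0.5, -0.5], [0.0, 1.0, 0.0, 1.0]]
w_u = [[2.0]]
z = [1.0, 2.0, 3.0]
pred_a0 = predict_next(z, [1.0, 0.0], w_c, w_u, n_controllable=2)
pred_a1 = predict_next(z, [0.0, 1.0], w_c, w_u, n_controllable=2)
```

In this toy setup, switching the action changes only the first two (controllable) entries of the prediction, while the last (uncontrollable) entry is unaffected by construction. A training loop would minimize `mse` between predictions and encoded next states and add `contrastive_penalty` over pairs of encoded states to discourage collapse.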
Abstract
In the context of MDPs with high-dimensional states, downstream tasks are predominantly performed on a compressed, low-dimensional representation of the original input space. A variety of learning objectives have therefore been used to attain useful representations. However, these representations usually lack interpretability of the different features. We present a novel approach that is able to disentangle latent features into a controllable and an uncontrollable partition. We illustrate that the resulting partitioned representations are easily interpretable on three types of environments and show that, in a distribution of procedurally generated maze environments, it is feasible to employ a planning algorithm, in an interpretable way, in the isolated controllable latent partition.
