Unsupervised Real-Time Control through Variational Empowerment

Maximilian Karl, Maximilian Soelch, Philip Becker-Ehmck, Djalel Benbouzid, Patrick van der Smagt, Justin Bayer

TL;DR

This work addresses unsupervised control in high-dimensional continuous dynamics by introducing a scalable variational bound on empowerment: the channel capacity between actions and next states, i.e. the mutual information $I(A; S' \mid S)$ maximised over the distribution of actions. It develops an amortised, model-based framework that jointly learns a policy, a planning distribution, and a variational posterior, and leverages differentiable dynamics learned via Deep Variational Bayes Filters (DVBF) when needed. Because the empowerment bound is evaluated with a fixed, differentiable computation, the approach is fast enough for real-time control, and it extends naturally to $n$-step empowerment for longer-horizon planning. Experiments on pendulum, ball-in-box, multi-ball, and biped balancing tasks yield intuitive, robust behaviours and indicate that the method is efficient and practical for unsupervised, intrinsically motivated control, with potential for deployment on physical systems and integration with task rewards.
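
For orientation, the kind of variational bound referred to above can be written in its standard form. This is a sketch in assumed notation rather than the paper's own equation: $\omega(a \mid s)$ denotes the distribution from which actions are drawn, $p(s' \mid s, a)$ the transition model, and $q(a \mid s, s')$ a learned approximation to the true posterior over actions $p(a \mid s, s')$.

$$
I(A; S' \mid s)
= \mathbb{E}_{\omega(a \mid s)\, p(s' \mid s, a)}\left[\ln \frac{p(a \mid s, s')}{\omega(a \mid s)}\right]
\;\ge\;
\mathbb{E}_{\omega(a \mid s)\, p(s' \mid s, a)}\left[\ln \frac{q(a \mid s, s')}{\omega(a \mid s)}\right]
$$

The gap between the two sides is $\mathbb{E}_{p(s' \mid s)}\left[\mathrm{KL}\big(p(a \mid s, s') \,\|\, q(a \mid s, s')\big)\right] \ge 0$, so the bound is tight exactly when $q$ matches the true posterior. Maximising the right-hand side over both $\omega$ and $q$ therefore gives a lower bound on empowerment that can be estimated from samples.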

Abstract

We introduce a methodology for efficiently computing a lower bound to empowerment, allowing it to be used as an unsupervised cost function for policy learning in real-time control. Empowerment, being the channel capacity between actions and states, maximises the influence of an agent on its near future. It has been shown to be a good model of biological behaviour in the absence of an extrinsic goal. But empowerment is also prohibitively hard to compute, especially in nonlinear continuous spaces. We introduce an efficient, amortised method for learning empowerment-maximising policies. We demonstrate that our algorithm can reliably handle continuous dynamical systems using system dynamics learned from raw data. The resulting policies consistently drive the agents into states where they can use their full potential.
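
To give a feel for the quantity described in the abstract, channel capacity can be computed exactly for a small discrete system with the classical Blahut-Arimoto iteration. This is not the amortised method introduced in the paper; it is a minimal sketch under assumptions. The function name `blahut_arimoto` and the toy transition tables are hypothetical, and each table holds $p(s' \mid a)$ for one fixed state $s$.

```python
import numpy as np


def _log_ratio(P, marginal):
    # ln[p(s'|a) / p(s')], using the convention 0 * ln 0 = 0.
    safe = np.maximum(P, 1e-300) / np.maximum(marginal, 1e-300)
    return np.where(P > 0, np.log(safe), 0.0)


def blahut_arimoto(p_next_given_action, iters=500, tol=1e-12):
    """Channel capacity in nats of a discrete channel p(s'|a).

    p_next_given_action: array of shape (n_actions, n_next_states),
    each row a probability distribution over next states.
    Returns (capacity, capacity-achieving action distribution).
    """
    P = np.asarray(p_next_given_action, dtype=float)
    n_actions = P.shape[0]
    omega = np.full(n_actions, 1.0 / n_actions)  # start from uniform actions

    for _ in range(iters):
        marginal = omega @ P  # p(s') induced by the current omega
        # KL(p(.|a) || p(.)) for every action a
        kl = np.sum(P * _log_ratio(P, marginal), axis=1)
        new_omega = omega * np.exp(kl)
        new_omega /= new_omega.sum()
        converged = np.max(np.abs(new_omega - omega)) < tol
        omega = new_omega
        if converged:
            break

    marginal = omega @ P
    capacity = float(np.sum(omega[:, None] * P * _log_ratio(P, marginal)))
    return capacity, omega


# Toy states (hypothetical): in the first, each of 4 actions leads to a
# distinct next state; in the second, the action has no effect at all.
free_state = np.eye(4)
stuck_state = np.tile([0.25, 0.25, 0.25, 0.25], (4, 1))

cap_free, _ = blahut_arimoto(free_state)
cap_stuck, _ = blahut_arimoto(stuck_state)
print(f"free state:  {cap_free:.4f} nats")
print(f"stuck state: {cap_stuck:.4f} nats")
```

The state in which every action has a distinct outcome has the highest capacity, while the state in which actions change nothing has zero capacity. An exact iteration like this needs a full transition table, which is why it does not carry over directly to nonlinear continuous systems.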

Paper Structure

This paper contains 26 sections, 1 equation, 5 figures.

Figures (5)

  • Figure 1: Results for the pendulum experiment.
  • Figure 2: Results for the single ball in a box experiment.
  • Figure 3: Results for the multiple balls in a box experiment.
  • Figure 4: Bipedal walker balancing.
  • Figure 5: Empowerment landscape for the 5-step empowerment function.