Embodied Active Learning of Generative Sensor-Object Models

Allison Pinosky, Todd D. Murphey

TL;DR

This work presents a method for learning image features of an unknown number of novel objects by using active coverage with respect to latent uncertainties of the novel descriptions and applies ergodic stability and PAC-Bayes theory to extend statistical guarantees for VAEs to embodied agents.

Abstract

When a robot encounters a novel object, how should it respond (what data should it collect) so that it can find the object in the future? In this work, we present a method for learning image features of an unknown number of novel objects. To do this, we use active coverage with respect to latent uncertainties of the novel descriptions. We apply ergodic stability and PAC-Bayes theory to extend statistical guarantees for VAEs to embodied agents. We demonstrate the method in hardware with a robotic arm; the pipeline is also implemented in a simulated environment. Algorithms and simulation are available open source; see http://sites.google.com/u.northwestern.edu/embodied-learning-hardware .

Paper Structure

This paper contains 22 sections, 5 theorems, 22 equations, 5 figures, 2 tables, 1 algorithm.

Key Result

Lemma

Consider a VAE with parameters $\phi$ and $\theta$, and let $K_\phi, K_\theta \in \mathbb{R}$ be the Lipschitz norms of the encoder and decoder, respectively. Then the variational distribution $q_\phi(z|y)$ satisfies Assumption \ref{asmpt:functions} with $\mathcal{E} = \{f:\mathcal{Z} \rightarrow\mathbb{R} \text{

Figures (5)

  • Figure 1: Test Environment. Hardware experiments were performed on a robot arm with a webcam attached to the end-effector. The robot controls planar end-effector states $(x,y,\theta)$. The goal is for the robot to explore a workspace while simultaneously building a representation of all observed objects in real time. It takes approximately $15$ minutes to run $3000$ exploration steps and $9000$ model updates. After learning, the model and learned objects can be used for future object identification tasks.
  • Figure 2: Active Learning Process. Numerals 1-7 describe the steps completed during each active learning step. The layers below the robot arm show (I) previously visited states, future planned states, and samples drawn from the reachable workspace (see Eq. \ref{eq:time_average_statistics}), (II) conditional entropy of the model over the workspace samples (see Eq. \ref{eq:cond-ent}), and (III) objects in the workspace. Model training occurs in parallel to data collection, so the robot trajectory, model, dataset, and conditional entropy all evolve continually.
  • Figure 3: Learning Metrics. The first four columns show active exploration for different learning environments with 5 different seeds. The last column shows data collected with a random walk. The top row shows the number of active units for each mini-batch update. Zero indicates latent space collapse, and the higher the number of active units, the more the latent space is being used by the decoder. The middle row shows the ergodic metric for each exploration step, which represents how well the collected data matches the conditional entropy distribution. An ergodic measure of zero would mean the collected data exactly matches the target distribution. The bottom row shows a top-down view of each learning workspace. These plots show that, for all active seeds, the number of active units remained high and the collected data quickly converged to the conditional entropy distribution. For the random walk, two seeds experienced latent space collapse, and the collected data did not match the conditional entropy distribution.
  • Figure 4: Test Workspaces
  • Figure 5: Object Clustering. (Top Left) conditional entropy distribution, squares are object locations; (Top Right) clustered conditional entropy samples; colors match the objects in the left plot. (Bottom Rows) data collected for 3 objects.
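
The active-unit count described in the Figure 3 caption can be illustrated with a short sketch. This is not the authors' implementation: it assumes the common convention that a latent dimension counts as active when the variance of the encoder's posterior mean across a mini-batch exceeds a small threshold. The function name `count_active_units`, the threshold value `1e-2`, and the synthetic batch are all assumptions made for illustration.

```python
import numpy as np


def count_active_units(latent_means, threshold=1e-2):
    """Count latent dimensions whose encoder mean varies across a batch.

    latent_means: array of shape (batch_size, latent_dim) holding the
    posterior mean the encoder produces for each input in the batch.
    threshold: assumed cutoff on per-dimension variance (hypothetical value).
    """
    latent_means = np.asarray(latent_means, dtype=float)
    # Variance of each latent dimension's posterior mean over the batch.
    per_dim_variance = latent_means.var(axis=0)
    return int(np.sum(per_dim_variance > threshold))


# Synthetic mini-batch: two informative dimensions, two collapsed ones.
rng = np.random.default_rng(0)
batch_means = np.zeros((64, 4))
batch_means[:, 0] = rng.normal(size=64)
batch_means[:, 1] = rng.normal(size=64)

num_active = count_active_units(batch_means)
print(num_active)
```

A count of zero under this convention corresponds to the latent space collapse mentioned in the caption, since no latent dimension responds to changes in the input.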

Theorems & Definitions (9)

  • Definition: VAE Encoder and Decoder Networks
  • Lemma: VAE Satisfies Assumption \ref{asmpt:functions}
  • Theorem: VAE PAC-Bayes Bounds
  • Definition: CVAE Encoder and Decoder Networks
  • Definition: Wasserstein distance (givens1984class)
  • Lemma: CVAE Satisfies Assumption \ref{asmpt:functions}
  • Proof
  • Theorem: CVAE PAC-Bayes Bounds
  • Theorem: Birkhoff's Ergodic Theorem