First Experiments with PowerPlay

Rupesh Kumar Srivastava, Bas R. Steunebrink, Jürgen Schmidhuber

TL;DR

The paper investigates open-ended learning through PowerPlay, a framework that jointly searches for new self-invented tasks and modifications to a growing solver. Using SLIM RNNs as general problem solvers, it demonstrates how self-delimiting programs support continual task invention, compression, and speed-ups, while naturally modularizing knowledge. Two experiments (pattern recognition, and motor control with a fovea) illustrate developmental stages, generalization, and efficient reuse of learned components. The results support the claim that open-ended exploration can build hierarchically organized, reusable skill repertoires with potential for transferring learned capabilities to external tasks.

Abstract

Like a scientist or a playing child, PowerPlay not only learns new skills to solve given problems, but also invents new interesting problems by itself. By design, it continually comes up with the fastest-to-find, initially novel, but eventually solvable tasks. It also continually simplifies, compresses, or speeds up solutions to previous tasks. Here we describe first experiments with PowerPlay. A self-delimiting recurrent neural network (SLIM RNN) is used as a general computational problem-solving architecture. Its connection weights can encode arbitrary, self-delimiting, halting or non-halting programs affecting both the environment (through effectors) and internal states encoding abstractions of event sequences. Our PowerPlay-driven SLIM RNN learns to become an increasingly general solver of self-invented problems, continually adding new problem-solving procedures to its growing skill repertoire. Extending a recent conference paper, we identify interesting, emerging developmental stages of our open-ended system. We also show how it automatically self-modularizes, frequently re-using code for previously invented skills, always trying to invent novel tasks that can be quickly validated because they do not require too many weight changes affecting too many previous tasks.
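
The search described in the abstract can be illustrated with a toy sketch. It is not the authors' implementation: the solver here is a hypothetical linear threshold unit rather than a SLIM RNN, candidates are drawn at random rather than in order of search cost, and compression tasks are omitted. The names `predict`, `solves`, and `powerplay_toy` are assumptions made for illustration. The sketch keeps only the acceptance rule: a proposed task and solver modification are accepted together when the old solver fails the new task and the modified solver handles it along with every task already in the repertoire.

```python
import random

def predict(weights, x):
    """Toy solver (an assumption, not a SLIM RNN): linear threshold unit on a 2-D input."""
    w1, w2, b = weights
    return 1 if w1 * x[0] + w2 * x[1] + b > 0 else 0

def solves(weights, task):
    """A task is an (input, target) association; it is solved if the prediction matches."""
    x, target = task
    return predict(weights, x) == target

def powerplay_toy(n_tasks=5, max_trials=2000, seed=0):
    """Grow a repertoire of self-invented tasks without breaking earlier ones."""
    rng = random.Random(seed)
    weights = (0.0, 0.0, 0.0)
    repertoire = []
    for _ in range(n_tasks):
        for _ in range(max_trials):
            # Propose a new task and a solver modification jointly.
            task = ((rng.uniform(-1, 1), rng.uniform(-1, 1)), rng.choice((0, 1)))
            candidate = tuple(w + rng.gauss(0, 0.5) for w in weights)
            # Novel: the current solver cannot solve the proposed task.
            novel = not solves(weights, task)
            # Valid: the modified solver solves the new task and all stored tasks.
            valid = solves(candidate, task) and all(
                solves(candidate, t) for t in repertoire
            )
            if novel and valid:
                weights = candidate
                repertoire.append(task)
                break
    return weights, repertoire

if __name__ == "__main__":
    final_weights, tasks = powerplay_toy()
    print(len(tasks), "tasks invented; all still solved:",
          all(solves(final_weights, t) for t in tasks))
```

Rechecking every stored task on each proposal is the simplest way to preserve earlier skills. The abstract points to a cheaper route: preferring modifications that touch few weights, so that only the affected earlier tasks need revalidation.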

Paper Structure

This paper contains 9 sections, 7 figures, 3 algorithms.

Figures (7)

  • Figure 1: Experiment 1. Right after initialization, before the first compressions, the decision boundary may be arbitrary and possibly non-linear. The drive to compress and simplify, however, first encourages linear separability (top row). As more associations are invented, it becomes harder and harder to learn new ones that break the previous solver's generalization ability while maintaining a linear boundary. Eventually this forces the decision boundary to become non-linear (bottom row), and increasingly so as more associations are invented and learned.
  • Figure 2: SLIM RNN activation scheme. At various time steps, active/winning neurons and their outgoing connections are highlighted. At each step, at most one neuron per winner-take-all subset (WITAS) can become active and propagate activations through its outgoing connections.
  • Figure 3: (a) Fovea design. Pixel intensities over each square are averaged to produce a real-valued input. The smallest squares in the center are of size $3 \times 3$. (b) The RNN controls the fovea movement over a static image; in our experiments, this is a photo of the city of Lugano.
  • Figure 4: For the first five self-invented non-compression tasks, we plot the number of connection usages per task. In this run, solutions to 340 self-generated tasks were learned. 67 of them were non-compression tasks (marked by small black lines at the top); the rest resulted in successful compressions of the SLIM RNN's weight matrix. Over time, previously learned skills tend to require fewer and fewer computational resources, i.e., the SLIM RNN-based solver learns to speed up its solutions to previous self-invented tasks. Although some plot lines occasionally go up, this is compensated for by a decrease of connection usages for dozens of other tasks (not shown here to prevent clutter).
  • Figure 5: For only six selected tasks (to prevent clutter), we plot the number of interactions with the environment, over a run where 67 novel non-compression tasks were learned, besides numerous additional compression tasks ignored here. Here an interaction is a SLIM NN computation step that produces at least one non-zero output neuron activation. The total number of interactions cannot exceed the number of steps until the halt neuron is activated.
  • ...and 2 more figures
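
The fovea averaging described in the Figure 3 caption can be sketched as follows. The caption states only that pixel intensities over each square are averaged into one real-valued input and that the smallest central squares are $3 \times 3$. The layout used here (a 3-by-3 grid of equal squares per scale, centred on the fovea position), the scale list `sizes`, and the function names are assumptions for illustration and may differ from the paper's design.

```python
def square_mean(image, top, left, size):
    """Mean intensity over a size x size square; pixels outside the image are skipped."""
    total, count = 0.0, 0
    for r in range(top, top + size):
        for c in range(left, left + size):
            if 0 <= r < len(image) and 0 <= c < len(image[0]):
                total += image[r][c]
                count += 1
    return total / count if count else 0.0

def fovea_inputs(image, center_row, center_col, sizes=(3, 9)):
    """Return one real-valued input per fovea square, in row-major order per scale.

    Assumed layout (hypothetical): for each scale s in `sizes`, a 3x3 grid of
    s x s squares centred on (center_row, center_col).
    """
    inputs = []
    for s in sizes:
        # The whole grid spans 3*s pixels; shift so it is centred on the fovea.
        grid_top = center_row - (3 * s) // 2
        grid_left = center_col - (3 * s) // 2
        for i in range(3):
            for j in range(3):
                inputs.append(
                    square_mean(image, grid_top + i * s, grid_left + j * s, s)
                )
    return inputs

if __name__ == "__main__":
    # Synthetic 30x30 image whose intensity equals the column index.
    image = [[float(c) for c in range(30)] for _ in range(30)]
    print(fovea_inputs(image, 15, 15, sizes=(3,)))
```

Moving the fovea then amounts to changing `center_row` and `center_col` between calls, so a controller sees a small fixed-length input vector regardless of the image size.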