First Experiments with PowerPlay
Rupesh Kumar Srivastava, Bas R. Steunebrink, Jürgen Schmidhuber
TL;DR
The paper investigates open-ended learning through PowerPlay, a framework that jointly searches for new self-invented tasks and for modifications to a growing solver. Using SLIM NNs as general problem solvers, it demonstrates how self-delimiting programs yield continual invention, compression, and speed-ups, while naturally modularizing knowledge. Two experiments, one on pattern recognition and one on motor control with a fovea, illustrate developmental stages, generalization, and efficient reuse of learned components. The results support the claim that open-ended exploration can build hierarchically organized, reusable skill repertoires with potential for transferring learned capabilities to external tasks.
Abstract
Like a scientist or a playing child, PowerPlay not only learns new skills to solve given problems, but also invents new interesting problems by itself. By design, it continually comes up with the fastest-to-find, initially novel, but eventually solvable tasks. It also continually simplifies, compresses, or speeds up solutions to previous tasks. Here we describe first experiments with PowerPlay. A self-delimiting recurrent neural network (SLIM RNN) is used as a general computational problem-solving architecture. Its connection weights can encode arbitrary, self-delimiting, halting or non-halting programs affecting both the environment (through effectors) and internal states encoding abstractions of event sequences. Our PowerPlay-driven SLIM RNN learns to become an increasingly general solver of self-invented problems, continually adding new problem-solving procedures to its growing skill repertoire. Extending a recent conference paper, we identify interesting, emerging developmental stages of our open-ended system. We also show how it automatically self-modularizes, frequently re-using code for previously invented skills, always trying to invent novel tasks that can be quickly validated because they do not require too many weight changes affecting too many previous tasks.
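
The search described above can be illustrated with a toy sketch. This is not the SLIM RNN setup used in the experiments: the solver here is assumed to be a single threshold unit with three integer weights, tasks are assumed to be (2-bit input, target label) pairs, and the names `predict`, `solves`, `candidate_pairs`, and `powerplay` are hypothetical. The sketch only shows the acceptance rule: a (task, modification) pair is kept when the current solver fails the task, the modified solver solves it, and the modified solver still solves every previously accepted task.

```python
from itertools import product

# Toy assumption: all 2-bit input patterns, each paired with a 0/1 target.
INPUTS = list(product([0, 1], repeat=2))
TASKS = [(x, target) for x in INPUTS for target in (0, 1)]

# Candidate weight changes in {-1, 0, 1}^3, ordered so that changes touching
# fewer weights are tried first (a crude stand-in for "fastest to find").
DELTAS = sorted(
    (d for d in product([-1, 0, 1], repeat=3) if any(d)),
    key=lambda d: sum(map(abs, d)),
)


def predict(weights, x):
    """Threshold unit: output 1 if the weighted sum plus bias is positive."""
    w0, w1, bias = weights
    return int(w0 * x[0] + w1 * x[1] + bias > 0)


def solves(weights, task):
    """A task (x, target) is solved if the unit outputs the target on x."""
    x, target = task
    return predict(weights, x) == target


def candidate_pairs(weights):
    """Yield (task, modified_weights) pairs, cheapest modifications first."""
    for delta in DELTAS:
        modified = tuple(w + d for w, d in zip(weights, delta))
        for task in TASKS:
            yield task, modified


def powerplay(max_tasks=4):
    """Greedily grow a repertoire of self-invented tasks."""
    weights = (0, 0, 0)
    repertoire = []
    for _ in range(max_tasks):
        accepted = next(
            (
                (task, modified)
                for task, modified in candidate_pairs(weights)
                if not solves(weights, task)                  # novel for current solver
                and solves(modified, task)                    # solved after modification
                and all(solves(modified, t) for t in repertoire)  # nothing forgotten
            ),
            None,
        )
        if accepted is None:  # no valid pair within the toy search space
            break
        task, weights = accepted
        repertoire.append(task)
    return weights, repertoire


if __name__ == "__main__":
    print(powerplay())
```

Checking the modified solver against the whole repertoire is the simplest possible validation. The abstract notes that the actual system favors tasks whose weight changes affect few previous tasks, so validation can be much cheaper than this exhaustive recheck.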
