Synthesizing Programs for Images using Reinforced Adversarial Learning
Yaroslav Ganin, Tejas Kulkarni, Igor Babuschkin, S. M. Ali Eslami, Oriol Vinyals
TL;DR
The paper tackles inverse graphics by learning, without supervision, a policy that synthesizes controllable visual programs which a renderer executes to reproduce real images. It introduces SPIRAL, an adversarial reinforcement learning framework that uses a Wasserstein discriminator and a distributed actor-learner architecture to train a generator whose rendering step is non-differentiable. The results demonstrate unsupervised, end-to-end inverse graphics on MNIST, Omniglot, CelebA, and MuJoCo scenes, producing interpretable stroke-based decompositions and scene descriptions, and showing that discriminator-based rewards outperform a simple L2 distance reward. The work suggests a scalable path for visual program synthesis and inverse simulation, with promising extensions such as MCTS and joint image–action discriminators for richer feedback.
Abstract
Advances in deep generative networks have led to impressive results in recent years. Nevertheless, such models can often waste their capacity on the minutiae of datasets, presumably due to weak inductive biases in their decoders. This is where graphics engines may come in handy, since they abstract away low-level details and represent images as high-level programs. Current methods that combine deep learning and renderers are limited by hand-crafted likelihood or distance functions, a need for large amounts of supervision, or difficulties in scaling their inference algorithms to richer datasets. To mitigate these issues, we present SPIRAL, an adversarially trained agent that generates a program which is executed by a graphics engine to interpret and sample images. The goal of this agent is to fool a discriminator network that distinguishes between real and rendered data; the agent is trained with a distributed reinforcement learning setup without any supervision. A surprising finding is that using the discriminator's output as a reward signal is key to allowing the agent to make meaningful progress at matching the desired output rendering. To the best of our knowledge, this is the first demonstration of an end-to-end, unsupervised and adversarial inverse graphics agent on challenging real-world (MNIST, Omniglot, CelebA) and synthetic 3D datasets.
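To make the loop described in the abstract concrete, here is a minimal sketch in Python, not the authors' implementation. A stand-in policy emits a stroke program, a non-differentiable toy renderer executes it, and a scalar reward is computed on the finished image, either as a hand-crafted L2 distance or as a critic score. Every name here (`GRID`, `render`, `sample_program`, `toy_critic`, `best_program`) is hypothetical. The critic is a fixed function rather than a trained network, and random search is swapped in for the distributed reinforcement learning setup. Computing the reward only on the final rendering is likewise an assumption of this sketch.

```python
import random

GRID = 8  # hypothetical canvas side length, chosen only for this sketch


def render(program, size=GRID):
    """Toy non-differentiable renderer: draws straight strokes on a binary canvas."""
    canvas = [[0.0] * size for _ in range(size)]
    for (x0, y0), (x1, y1) in program:
        steps = max(abs(x1 - x0), abs(y1 - y0), 1)
        for i in range(steps + 1):
            x = round(x0 + (x1 - x0) * i / steps)
            y = round(y0 + (y1 - y0) * i / steps)
            canvas[y][x] = 1.0
    return canvas


def sample_program(rng, n_strokes, size=GRID):
    """Stand-in for the agent's policy: samples stroke endpoints uniformly."""
    def point():
        return (rng.randrange(size), rng.randrange(size))
    return [(point(), point()) for _ in range(n_strokes)]


def l2_reward(image, target):
    """Hand-crafted baseline reward: negative squared pixel distance to a target."""
    return -sum((a - b) ** 2
                for row_a, row_b in zip(image, target)
                for a, b in zip(row_a, row_b))


def toy_critic(image):
    """Hypothetical fixed critic (assumption): rewards ink on the main diagonal
    and penalizes ink elsewhere. A real critic would be a trained network."""
    return sum(v if x == y else -v
               for y, row in enumerate(image)
               for x, v in enumerate(row))


def best_program(reward_fn, rng, n_candidates=200, n_strokes=2):
    """Random search standing in for RL: keep the highest-reward program."""
    best, best_reward = None, float("-inf")
    for _ in range(n_candidates):
        program = sample_program(rng, n_strokes)
        reward = reward_fn(render(program))  # reward uses only the final image
        if reward > best_reward:
            best, best_reward = program, reward
    return best, best_reward
```

For example, `best_program(toy_critic, random.Random(0))` searches with the critic score as the reward, while `best_program(lambda img: l2_reward(img, target), random.Random(0))` uses the L2 baseline against a chosen `target` canvas. Only the reward function changes between the two; the renderer is never differentiated.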
