Learning to Infer Graphics Programs from Hand-Drawn Images

Kevin Ellis, Daniel Ritchie, Armando Solar-Lezama, Joshua B. Tenenbaum

TL;DR

This work infers high-level graphics programs from hand-drawn diagrams by factoring the problem into two steps: image-to-spec and spec-to-program. It combines a neural network that proposes drawing commands with Sequential Monte Carlo and a constraint-based DSL synthesizer, which is further accelerated by a learned bias-optimal search policy. The approach generalizes to noisy real drawings and supports correcting neural proposals, measuring program-based similarity between drawings, and extrapolating repetitive patterns. Together, these results are a step toward automatically inducing human-readable programs from perceptual input, with practical implications for figure generation and interpretation.

Abstract

We introduce a model that learns to convert simple hand drawings into graphics programs written in a subset of LaTeX. The model combines techniques from deep learning and program synthesis. We learn a convolutional neural network that proposes plausible drawing primitives that explain an image. These drawing primitives are like a trace of the set of primitive commands issued by a graphics program. We learn a model that uses program synthesis techniques to recover a graphics program from that trace. These programs have constructs like variable bindings, iterative loops, or simple kinds of conditionals. With a graphics program in hand, we can correct errors made by the deep network, measure similarity between drawings by use of similar high-level geometric structures, and extrapolate drawings. Taken together these results are a step towards agents that induce useful, human-readable programs from perceptual input.

Paper Structure

This paper contains 23 sections, 14 equations, 17 figures, 5 tables.

Figures (17)

  • Figure 1:
  • Figure 2:
  • Figure 4: Black arrows: top-down generative model, Program$\to$Spec$\to$Image. Red arrows: bottom-up inference procedure. Bold: random variables (image/spec/program).
  • Figure 5: Neural architecture for inferring specs from images. Blue: network inputs. Black: network operations. Red: draws from a multinomial. Typewriter font: network outputs. Renders on a $16\times 16$ grid, shown in gray. STN: differentiable attention mechanism (Jaderberg et al., 2015).
  • Figure 6: Parsing LaTeX output after training on diagrams with $\leq 12$ objects. Out-of-sample generalization: the model generalizes to scenes with many more objects (approximately at ceiling when tested on twice as many objects as were in the training data). Neither SMC nor the neural network is sufficient on its own. The number of particles varies by model: we compare the models with equal runtime ($\approx 1$ sec/object). The average number of errors is (# incorrect drawing commands predicted by model)$+$(# correct commands that were not predicted by model).
  • ...and 12 more figures
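
The error count defined in the Figure 6 caption can be illustrated with a minimal sketch. It assumes that a spec is a collection of drawing commands and that two commands match only when they are identical. The tuple encoding of commands and the name `spec_errors` are hypothetical and are not taken from the paper.

```python
from collections import Counter

def spec_errors(predicted, target):
    """Count (# predicted commands not in target) + (# target commands not predicted).

    Commands are compared by exact equality; this matching rule is an assumption.
    """
    pred, tgt = Counter(predicted), Counter(target)
    incorrect = sum((pred - tgt).values())  # predicted but not in the ground truth
    missed = sum((tgt - pred).values())     # in the ground truth but not predicted
    return incorrect + missed

# Hypothetical specs: each command is a tuple of a primitive name and its coordinates.
target = [("circle", 2, 3), ("line", 1, 1, 4, 1), ("rectangle", 5, 5, 8, 7)]
predicted = [("circle", 2, 3), ("line", 1, 1, 4, 2)]

errors = spec_errors(predicted, target)  # one wrong line plus two missed commands
```

Using multisets rather than sets means a command that is duplicated in the prediction is still counted as an error.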