Table of Contents
Fetching ...

Learning to Edit Visual Programs with Self-Supervision

R. Kenny Jones, Renhao Zhang, Aditya Ganeshan, Daniel Ritchie

TL;DR

This work designs a system that learns how to edit visual programs and develops a self-supervised learning approach that integrates this edit network into a bootstrapped finetuning loop along with a network that predicts entire programs in one-shot.

Abstract

We design a system that learns how to edit visual programs. Our edit network consumes a complete input program and a visual target. From this input, we task our network with predicting a local edit operation that could be applied to the input program to improve its similarity to the target. In order to apply this scheme for domains that lack program annotations, we develop a self-supervised learning approach that integrates this edit network into a bootstrapped finetuning loop along with a network that predicts entire programs in one-shot. Our joint finetuning scheme, when coupled with an inference procedure that initializes a population from the one-shot model and evolves members of this population with the edit network, helps to infer more accurate visual programs. Over multiple domains, we experimentally compare our method against the alternative of using only the one-shot model, and find that even under equal search-time budgets, our editing-based paradigm provides significant advantages.

Learning to Edit Visual Programs with Self-Supervision

TL;DR

This work designs a system that learns how to edit visual programs and develops a self-supervised learning approach that integrates this edit network into a bootstrapped finetuning loop along with a network that predicts entire programs in one-shot.

Abstract

We design a system that learns how to edit visual programs. Our edit network consumes a complete input program and a visual target. From this input, we task our network with predicting a local edit operation that could be applied to the input program to improve its similarity to the target. In order to apply this scheme for domains that lack program annotations, we develop a self-supervised learning approach that integrates this edit network into a bootstrapped finetuning loop along with a network that predicts entire programs in one-shot. Our joint finetuning scheme, when coupled with an inference procedure that initializes a population from the one-shot model and evolves members of this population with the edit network, helps to infer more accurate visual programs. Over multiple domains, we experimentally compare our method against the alternative of using only the one-shot model, and find that even under equal search-time budgets, our editing-based paradigm provides significant advantages.
Paper Structure (37 sections, 3 equations, 7 figures, 4 tables, 1 algorithm)

This paper contains 37 sections, 3 equations, 7 figures, 4 tables, 1 algorithm.

Figures (7)

  • Figure 1: We design a network that learns how to locally edit an input program towards a target. It first predicts what type of edit operation should be applied, then it predicts where that edit operation should be applied, and finally it autoregressively samples any parameters the edit operation requires.
  • Figure 2: Network Training
  • Figure 3: Comparing reconstructions of one-shot models (top) against our joint approach (middle).
  • Figure 4: For 2D CSG, we compare reconstruction accuracy (Chamfer distance, lower is better, Y-axis) between using an edit network and using only a one-shot network while varying time spent on inference (left) and training set size (right).
  • Figure 5: Ablation study comparing our method against alternative formulations.
  • ...and 2 more figures