Learning to Infer Generative Template Programs for Visual Concepts
R. Kenny Jones, Siddhartha Chaudhuri, Daniel Ritchie
TL;DR
This work addresses the problem of learning flexible, general-purpose visual concepts without domain-specific priors. It introduces Template Programs, a neurosymbolic framework that encodes each concept as a partial program with HOLEs, written in a domain-specific language (DSL). Three inference networks (TemplateNet, ExpansionNet, and ParamNet) jointly infer and instantiate Template Programs from groups of visual inputs. They are trained in two stages: synthetic pretraining, followed by bootstrapped finetuning with self-supervised objectives. Experiments on three visual domains (2D layouts, Omniglot characters, and 3D shapes) show improved performance over domain-general baselines and competitive results against domain-specific methods on few-shot generation and co-segmentation; the framework also enables unconditional concept generation. The study further highlights the framework's robustness to out-of-distribution inputs and the importance of bootstrapped finetuning and HOLE-based expansions, and it outlines future work on extending relational expressivity and handling variable input group sizes.
Abstract
People grasp flexible visual concepts from a few examples. We explore a neurosymbolic system that learns how to infer programs that capture visual concepts in a domain-general fashion. We introduce Template Programs: programmatic expressions from a domain-specific language that specify structural and parametric patterns common to an input concept. Our framework supports multiple concept-related tasks, including few-shot generation and co-segmentation through parsing. We develop a learning paradigm that allows us to train networks that infer Template Programs directly from visual datasets that contain concept groupings. We run experiments across multiple visual domains: 2D layouts, Omniglot characters, and 3D shapes. We find that our method outperforms task-specific alternatives, and performs competitively against domain-specific approaches for the limited domains where they exist.
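To make the idea of a partial program with HOLEs concrete, here is a minimal sketch in Python. It is an illustration only, not the paper's DSL or networks: the toy prefix-notation tokens (`union`, `square`, `circle`, `triangle`), the list-of-tokens encoding, and the function `instantiate` are all hypothetical. The hand-written expansions stand in for what an expansion network might predict for each member of a concept group.

```python
from typing import List

# Placeholder token. The name HOLE follows the paper; the encoding is assumed.
HOLE = "HOLE"


def instantiate(template: List[str], expansions: List[List[str]]) -> List[str]:
    """Fill each HOLE in `template`, left to right, with the matching expansion."""
    n_holes = template.count(HOLE)
    if n_holes != len(expansions):
        raise ValueError(
            f"template has {n_holes} HOLE(s) but {len(expansions)} expansion(s) were given"
        )
    fills = iter(expansions)
    program: List[str] = []
    for token in template:
        if token == HOLE:
            # Splice the expansion's tokens in place of the HOLE.
            program.extend(next(fills))
        else:
            program.append(token)
    return program


# One shared template for the whole concept group (hypothetical toy DSL):
# every member is a union of a square with "something else".
template = ["union", "square", HOLE]

# Each group member fills the HOLE differently.
member_a = instantiate(template, [["circle"]])
member_b = instantiate(template, [["union", "circle", "triangle"]])

print(member_a)
print(member_b)
```

In this sketch the shared structure lives in `template`, while the per-member variation lives in the expansions, which mirrors the split between what is common to a concept and what differs across its instances.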
