Optimal Program Synthesis via Abstract Interpretation
Stephen Mell, Steve Zdancewic, Osbert Bastani
TL;DR
The paper addresses optimal synthesis of neurosymbolic programs with real-valued constants by introducing a unified framework that couples A* search with an admissible heuristic derived from abstract interpretation. It expands the search space to generalized partial programs and uses interval domains to overapproximate both program semantics and the objective, enabling provable pruning of suboptimal branches. The authors instantiate the framework on two DSLs (NEAR and Quivr) and demonstrate better scalability than SMT-based optimal synthesis and BFS baselines, while maintaining optimality guarantees within a user-defined tolerance $\epsilon$. The result is a principled, scalable approach to synthesizing high-quality programs for trajectory labeling and related data-processing tasks.
Abstract
We consider the problem of synthesizing programs with numerical constants that optimize a quantitative objective, such as accuracy, over a set of input-output examples. We propose a general framework for optimal synthesis of such programs in a given domain-specific language (DSL), with provable optimality guarantees. Our framework enumerates programs in a general search graph, where nodes represent subsets of concrete programs. To improve scalability, it uses A* search in conjunction with a search heuristic based on abstract interpretation; intuitively, this heuristic establishes upper bounds on the value of subtrees in the search graph, enabling the synthesizer to identify and prune subtrees that are provably suboptimal. In addition, we propose a natural strategy for constructing abstract transformers for monotonic semantics, which is a common property for components in DSLs for data classification. Finally, we implement our approach in the context of two such existing DSLs, demonstrating that our algorithm is more scalable than existing optimal synthesizers.
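To make the pruning idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes a hypothetical one-construct DSL (the threshold classifier `x > c`), where a search node is an interval `[lo, hi]` of candidate constants, that is, a subset of concrete programs. Because the classifier's output is monotone in `c`, evaluating it at the two endpoints of the interval yields a sound abstract output, and from that an upper bound on accuracy over the whole node. The search is a best-first branch-and-bound in the spirit of the A* search described above. All names (`accuracy`, `upper_bound`, `synthesize`, `eps`, `min_width`) are illustrative, and `min_width` is an added safeguard against endless splitting that is not taken from the paper.

```python
import heapq

def accuracy(c, data):
    """Objective: fraction of examples (x, y) where `x > c` equals label y."""
    return sum((x > c) == y for x, y in data) / len(data)

def upper_bound(lo, hi, data):
    """Sound upper bound on accuracy over every constant c in [lo, hi].

    `x > c` is monotone decreasing in c, so its smallest output over the
    interval is at c = hi and its largest at c = lo (with False < True).
    An example can only be classified correctly if its label lies in that range.
    """
    feasible = 0
    for x, y in data:
        out_min = x > hi
        out_max = x > lo
        if out_min <= y <= out_max:
            feasible += 1
    return feasible / len(data)

def synthesize(data, lo, hi, eps=0.0, min_width=1e-9):
    """Best-first search over intervals of c, pruning with `upper_bound`.

    Returns (c, acc). Every node that is pruned or left on the queue has an
    upper bound of at most acc + eps. Nodes narrower than `min_width` are
    dropped without that check (an assumption of this sketch).
    """
    best_c, best_acc = lo, accuracy(lo, data)
    heap = [(-upper_bound(lo, hi, data), lo, hi)]  # max-heap via negation
    while heap:
        neg_ub, a, b = heapq.heappop(heap)
        if -neg_ub <= best_acc + eps:
            break  # no remaining node can beat the incumbent by more than eps
        mid = (a + b) / 2
        for c in (a, mid, b):  # try concrete programs inside this node
            acc = accuracy(c, data)
            if acc > best_acc:
                best_c, best_acc = c, acc
        if b - a <= min_width:
            continue
        for l, h in ((a, mid), (mid, b)):  # refine into two child nodes
            ub = upper_bound(l, h, data)
            if ub > best_acc + eps:  # otherwise the child is provably suboptimal
                heapq.heappush(heap, (-ub, l, h))
    return best_c, best_acc

if __name__ == "__main__":
    examples = [(1.0, False), (2.0, False), (3.0, True), (4.0, True)]
    print(synthesize(examples, 0.0, 5.0))
```

The bound plays the role of an admissible heuristic: it never underestimates the best accuracy reachable inside a node, so discarding a node whose bound does not exceed the incumbent by more than `eps` cannot discard a program that is better by more than `eps`.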
