SYNAPSE: SYmbolic Neural-Aided Preference Synthesis Engine
Sadanand Modak, Noah Patton, Isil Dillig, Joydeep Biswas
TL;DR
SYNAPSE addresses the challenge of learning individual human preferences from limited visual demonstrations and natural language (NL) by grounding concepts in a neuro-symbolic DSL and incrementally synthesizing executable programs. The method grounds concepts via NL explanations, generates a program sketch with numeric holes, and fills in the parameters from demonstrations using a constrained optimization (Max-SMT) framework, enabling lifelong, sample-efficient learning. Empirical results across mobility and manipulation domains show substantial out-of-distribution generalization and strong multi-user alignment, outperforming neural baselines and existing neuro-symbolic methods while requiring far fewer demonstrations. Limitations include dependence on the quality of the neural modules and the NL input; probabilistic reasoning and recency weighting are suggested as future work to handle noise and changing preferences.
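To make the "sketch with numeric holes" idea concrete, here is a minimal Python sketch. It is not the SYNAPSE implementation: the feature (`distance to door`), the sketch shape, the demonstration data, and the function names `sketch` and `fill_hole` are all hypothetical, and an exhaustive search over candidate thresholds stands in for a real Max-SMT solver. It only illustrates the objective of choosing the hole value that satisfies as many demonstrations as possible.

```python
from typing import Callable, List, Tuple

# Hypothetical demonstration: (distance_to_door_in_metres, user_says_good_location)
Demo = Tuple[float, bool]


def sketch(hole: float) -> Callable[[float], bool]:
    """Hypothetical program sketch `distance_to_door > ??` with the hole filled."""
    return lambda distance: distance > hole


def fill_hole(demos: List[Demo]) -> Tuple[float, int]:
    """Return the hole value that agrees with the most demonstrations.

    Each demonstration acts as a soft constraint; we maximise the number
    satisfied. This brute-force loop is an assumed stand-in for Max-SMT.
    """
    if not demos:
        raise ValueError("need at least one demonstration")
    values = sorted({distance for distance, _ in demos})
    # Candidate thresholds: one below all values, the midpoints between
    # neighbouring values, and one above all values.
    midpoints = [(a + b) / 2 for a, b in zip(values, values[1:])]
    candidates = [values[0] - 1.0] + midpoints + [values[-1] + 1.0]

    best_hole, best_score = candidates[0], -1
    for candidate in candidates:
        program = sketch(candidate)
        score = sum(program(distance) == label for distance, label in demos)
        if score > best_score:
            best_hole, best_score = candidate, score
    return best_hole, best_score


# Six made-up demonstrations; the last one conflicts with the others,
# so no single threshold can satisfy all of them.
demos = [(0.5, False), (1.0, False), (2.0, True),
         (3.0, True), (1.5, True), (2.5, False)]
hole, satisfied = fill_hole(demos)
print(f"hole = {hole}, satisfied {satisfied} of {len(demos)} demonstrations")
```

Treating demonstrations as soft rather than hard constraints is what lets the search return a program even when one user's demonstrations are mutually inconsistent, as the last data point here is.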
Abstract
This paper addresses the problem of preference learning, which aims to align robot behaviors by learning user-specific preferences (e.g., "good pull-over location") from visual demonstrations. Despite its similarity to learning factual concepts (e.g., "red door"), preference learning is a fundamentally harder problem due to its subjective nature and the paucity of person-specific training data. We address this problem using a novel framework called SYNAPSE, a neuro-symbolic approach designed to efficiently learn preferential concepts from limited data. SYNAPSE represents preferences as neuro-symbolic programs in a domain-specific language (DSL) that operates over images, which facilitates inspection of individual parts for alignment. It leverages a novel combination of visual parsing, large language models, and program synthesis to learn programs representing individual preferences. We perform extensive evaluations on various preferential concepts, as well as user case studies that demonstrate its ability to align well with dissimilar user preferences. Our method significantly outperforms baselines, especially in out-of-distribution generalization. We show the importance of the design choices in the framework through multiple ablation studies. Code, additional results, and supplementary material can be found on the website: https://amrl.cs.utexas.edu/synapse
