Support-Set Context Matters for Bongard Problems
Nikhil Raghuraman, Adam W. Harley, Leonidas Guibas
TL;DR
The paper shows that Bongard problems, which require inducing an abstract concept from a small set of positive and negative support images, are solved substantially more accurately when context from the support set as a whole is used. It introduces support-set standardization, a simple, parameter-free feature adaptation, and a Transformer-based approach (Prototype-Mimic and SVM-Mimic) that extracts rules from the supports. Together these reach new state-of-the-art accuracy on Bongard-LOGO (75.3%) and Bongard-HOI (76.4%) among methods with equivalent vision backbones, and strong performance on Bongard-Classic (60.8%). The findings indicate that context across multiple supports is a critical signal for visual abstract reasoning, and that relatively lightweight techniques can substantially improve accuracy without moving to larger backbones.
Abstract
Current machine learning methods struggle to solve Bongard problems, which are a type of IQ test that requires deriving an abstract "concept" from a set of positive and negative "support" images, and then classifying whether or not a new query image depicts the key concept. On Bongard-HOI, a benchmark for natural-image Bongard problems, most existing methods have reached at best 69% accuracy (where chance is 50%). Low accuracy is often attributed to neural nets' inability to find human-like symbolic rules. In this work, we point out that many existing methods are forfeiting accuracy due to a much simpler problem: they do not adapt image features given information contained in the support set as a whole, and rely instead on information extracted from individual supports. This is a critical issue, because the "key concept" in a typical Bongard problem can often only be distinguished using multiple positives and multiple negatives. We explore simple methods to incorporate this context and show substantial gains over prior works, leading to new state-of-the-art accuracy on Bongard-LOGO (75.3%) and Bongard-HOI (76.4%) compared to methods with equivalent vision backbone architectures, as well as strong performance on the original Bongard problem set (60.8%).
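
As a rough illustration of what adapting features to the support set as a whole can look like, the Python sketch below standardizes each feature dimension using the mean and standard deviation computed over all supports (positives and negatives together), then classifies the query by its nearest class prototype. The function names, the toy two-dimensional features, and the nearest-prototype classifier are assumptions made for illustration. This is not the authors' implementation, and the exact statistics they use may differ.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def standardize_with_support(
    support: Sequence[Vector], query: Vector, eps: float = 1e-6
) -> Tuple[List[Vector], Vector]:
    """Standardize features with per-dimension statistics of the support set.

    Hypothetical helper: the mean and standard deviation are computed over
    ALL supports (positives and negatives), then applied to every support
    and to the query. No learned parameters are involved.
    """
    n = len(support)
    dim = len(query)
    mean = [sum(v[d] for v in support) / n for d in range(dim)]
    std = [
        math.sqrt(sum((v[d] - mean[d]) ** 2 for v in support) / n)
        for d in range(dim)
    ]

    def normalize(v: Vector) -> Vector:
        return [(v[d] - mean[d]) / (std[d] + eps) for d in range(dim)]

    return [normalize(v) for v in support], normalize(query)


def prototype(vectors: Sequence[Vector]) -> Vector:
    """Mean feature vector of one class."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def squared_distance(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def classify_query(
    positives: Sequence[Vector],
    negatives: Sequence[Vector],
    query: Vector,
    standardize: bool = True,
) -> bool:
    """Return True if the query is closer to the positive prototype."""
    support = list(positives) + list(negatives)
    if standardize:
        support, query = standardize_with_support(support, query)
    pos = support[: len(positives)]
    neg = support[len(positives):]
    return squared_distance(query, prototype(pos)) < squared_distance(
        query, prototype(neg)
    )


# Toy features (made up): dimension 0 is a high-variance nuisance factor,
# dimension 1 carries the concept (positive sign for positives).
positives = [[100.0, 1.0], [-100.0, 1.2], [60.0, 0.8]]
negatives = [[-80.0, -1.0], [90.0, -1.2], [-40.0, -0.8]]
query = [-50.0, 1.0]

print("raw features:         ", classify_query(positives, negatives, query, standardize=False))
print("support-standardized: ", classify_query(positives, negatives, query, standardize=True))
```

In this toy problem the raw nearest-prototype rule labels the query as negative, because the nuisance dimension dominates the distance. After standardizing with support-set statistics, the dimension that separates positives from negatives carries comparable weight and the query is labeled positive.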
