Neuro-Symbolic Scene Graph Conditioning for Synthetic Image Dataset Generation
Giacomo Savazzi, Eugenio Lomurno, Cristian Sbrolli, Agnese Chiatti, Matteo Matteucci
TL;DR
The paper addresses data scarcity in training complex visual reasoning models by using Neuro-Symbolic conditioning, in which scene graphs guide synthetic image generation. It introduces SGAdapter-based conditioning within Stable Diffusion 2.0 and presents four configurations that fuse scene-graph structure with text prompts to produce semantically informed synthetic data. When the synthetic images augment real data, Scene Graph Generation (SGG) models improve by up to +2.59% in standard Recall metrics; synthetic-only training, by contrast, benefits more from higher-fidelity diffusion baselines, which points to a complementary relationship between structural conditioning and perceptual realism. This Neuro-Symbolic augmentation strategy offers a practical path to boosting SGG performance in data-constrained settings, with potential for extension to broader reasoning tasks and richer relational constraints.
Abstract
As machine learning models increase in scale and complexity, obtaining sufficient training data has become a critical bottleneck due to acquisition costs, privacy constraints, and data scarcity in specialised domains. While synthetic data generation has emerged as a promising alternative, a notable performance gap remains compared to models trained on real data, particularly as task complexity grows. Concurrently, Neuro-Symbolic methods, which combine neural networks' learning strengths with symbolic reasoning's structured representations, have demonstrated significant potential across various cognitive tasks. This paper explores the utility of Neuro-Symbolic conditioning for synthetic image dataset generation, focusing specifically on improving the performance of Scene Graph Generation models. The research investigates whether structured symbolic representations in the form of scene graphs can enhance synthetic data quality through explicit encoding of relational constraints. The results demonstrate that Neuro-Symbolic conditioning yields significant improvements of up to +2.59% in standard Recall metrics and +2.83% in No Graph Constraint Recall metrics when used for dataset augmentation. These findings establish that merging Neuro-Symbolic and generative approaches produces synthetic data with complementary structural information that enhances model performance when combined with real data, providing a novel approach to overcome data scarcity limitations even for complex visual reasoning tasks.
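To make the two reported metric families concrete, the following is a minimal sketch of how Recall@K is commonly computed over scene graph triplets, with and without the graph constraint. It is an illustration under simplifying assumptions, not the paper's evaluation code: objects are identified by plain string labels rather than by matched bounding boxes, and the names `Triplet`, `ScoredTriplet`, and `recall_at_k` are hypothetical. With the graph constraint, only the highest-scoring predicate is kept for each subject-object pair; without it, several predicates for the same pair may appear among the top K predictions.

```python
from typing import Iterable, NamedTuple, Set


class Triplet(NamedTuple):
    """One scene graph relation, e.g. (man, riding, horse)."""
    subject: str
    predicate: str
    obj: str


class ScoredTriplet(NamedTuple):
    """A predicted relation with the model's confidence score."""
    triplet: Triplet
    score: float


def recall_at_k(ground_truth: Set[Triplet],
                predictions: Iterable[ScoredTriplet],
                k: int,
                graph_constraint: bool = True) -> float:
    """Fraction of ground-truth triplets found in the top-k predictions.

    Simplification (assumption): objects are matched by label only,
    not by bounding-box overlap as in a full SGG evaluation.
    """
    if not ground_truth:
        return 0.0
    # Rank predictions from most to least confident.
    ranked = sorted(predictions, key=lambda p: p.score, reverse=True)
    if graph_constraint:
        # Keep only the best-scoring predicate per (subject, object) pair.
        seen_pairs = set()
        kept = []
        for p in ranked:
            pair = (p.triplet.subject, p.triplet.obj)
            if pair not in seen_pairs:
                seen_pairs.add(pair)
                kept.append(p)
        ranked = kept
    top_k = {p.triplet for p in ranked[:k]}
    hits = sum(1 for t in ground_truth if t in top_k)
    return hits / len(ground_truth)


# Toy example with made-up labels and scores.
gt = {Triplet("man", "riding", "horse"), Triplet("man", "wearing", "hat")}
preds = [
    ScoredTriplet(Triplet("man", "on", "horse"), 0.9),
    ScoredTriplet(Triplet("man", "riding", "horse"), 0.8),
    ScoredTriplet(Triplet("man", "wearing", "hat"), 0.7),
    ScoredTriplet(Triplet("horse", "on", "grass"), 0.6),
]
print(recall_at_k(gt, preds, k=3, graph_constraint=True))   # 0.5
print(recall_at_k(gt, preds, k=3, graph_constraint=False))  # 1.0
```

In the toy example, the graph constraint discards "riding" because "on" scores higher for the same man-horse pair, so only one of the two ground-truth relations is recovered. Dropping the constraint lets both predicates compete for the top three slots, which is why No Graph Constraint Recall is typically the more permissive of the two measures.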
