TinySQL: A Progressive Text-to-SQL Dataset for Mechanistic Interpretability Research
Abir Harrasse, Philip Quirke, Clement Neo, Dhruv Nathawani, Luke Marks, Amir Abdullah
TL;DR
TinySQL introduces a synthetic text-to-SQL dataset of progressively increasing complexity, together with a multi-model testbed for studying neural SQL generation with mechanistic interpretability (MI) methods. By combining Edge Attribution Patching (EAP), Sparse Autoencoders (SAEs), and the logit lens across three models, BM1 (33M parameters), BM2 (~500M), and BM3 (1B), the work reveals that generation proceeds in two phases (intent formation first, grounding later). It also identifies both sparse, robust circuits and distributed representations that challenge simple circuit extraction. Interpretability signals from EAP and SAEs agree, although larger models exhibit more fragmented mechanisms spread across layers, which highlights the limits of current MI tools. Overall, TinySQL provides a rigorous, controllable platform for comparing MI techniques and advances understanding of how neural models learn structured query generation, with potential implications for building robust real-world database interfaces.
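Edge Attribution Patching avoids running one patched forward pass per edge. Instead, it estimates the effect of patching an edge as the difference between the upstream node's corrupted and clean activations, dotted with the gradient of the metric with respect to the downstream node's input on the clean run. The example is a minimal, dependency-free sketch of that first-order estimate on a hypothetical two-edge toy graph. The node names, readout weights `W`, and activations are all invented for illustration, and the gradients are written out analytically instead of being computed with autograd. It is not the TinySQL authors' implementation.

```python
import math

# Hypothetical toy graph: upstream nodes A and B write vectors that are summed
# into the input of one downstream node C, which computes a scalar metric.
W = [0.5, -1.0, 2.0]  # hypothetical readout weights used by node C


def metric_from_input(x):
    """Metric computed by downstream node C from its summed input x."""
    return sum(w * math.tanh(xi) for w, xi in zip(W, x))


def metric_grad(x):
    """Analytic gradient of the metric with respect to C's input."""
    return [w * (1.0 - math.tanh(xi) ** 2) for w, xi in zip(W, x)]


def eap_score(up_clean, up_corrupt, downstream_input_clean):
    """EAP estimate for one edge: (corrupt - clean) upstream activation,
    dotted with the clean-run gradient at the downstream input
    (a first-order Taylor approximation of the patching effect)."""
    g = metric_grad(downstream_input_clean)
    return sum((c - a) * gi for a, c, gi in zip(up_clean, up_corrupt, g))


def patched_effect(up_clean, up_corrupt, downstream_input_clean):
    """Ground truth: swap only this edge's contribution and rerun node C."""
    patched = [x - a + c for x, a, c in
               zip(downstream_input_clean, up_clean, up_corrupt)]
    return metric_from_input(patched) - metric_from_input(downstream_input_clean)


# Hypothetical clean and corrupted activations of the two upstream nodes.
clean = {"A": [0.2, 0.1, -0.3], "B": [0.05, 0.4, 0.1]}
corrupt = {"A": [0.25, 0.0, -0.2], "B": [0.05, 0.35, 0.1]}
x_clean = [a + b for a, b in zip(clean["A"], clean["B"])]

for node in ("A", "B"):
    est = eap_score(clean[node], corrupt[node], x_clean)
    true = patched_effect(clean[node], corrupt[node], x_clean)
    print(f"edge {node}->C: EAP estimate {est:+.4f}, actual patch {true:+.4f}")
```

The metric is nonlinear (`tanh`), so the estimate only approximates the actual patching effect, and the gap grows with the size of the activation difference. For this reason, circuits selected by attribution scores are often checked afterwards with real patching.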
Abstract
Mechanistic interpretability research faces a gap between analyzing simple circuits in toy tasks and discovering features in large models. To bridge this gap, we propose text-to-SQL generation as an ideal task to study, because it combines the formal structure of toy tasks with real-world complexity. We introduce TinySQL, a synthetic dataset that progresses from basic to advanced SQL operations, and train models ranging from 33M to 1B parameters to establish a comprehensive testbed for interpretability. We apply multiple complementary interpretability techniques, including Edge Attribution Patching and Sparse Autoencoders, to identify the minimal circuits and components that support SQL generation. We compare circuits for different SQL subskills, evaluating their minimality, reliability, and identifiability. Finally, we conduct a layerwise logit lens analysis to reveal how models compose SQL queries across layers: from intent recognition to schema resolution to structured generation. Our work provides a robust framework for probing and comparing interpretability methods in a structured, progressively complex setting.
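The logit lens reads an intermediate layer's residual stream as though it were the final layer. It applies the model's final normalization and unembedding, then checks which tokens rank highest. The example is a minimal pure-Python sketch on toy numbers. The vocabulary, the 3-dimensional unembedding `W_U`, and the per-layer residual vectors are all hypothetical. The final norm is assumed to be an RMSNorm without a learned gain, which differs from LayerNorm-based models such as GPT-2.

```python
import math

# Hypothetical toy vocabulary and unembedding matrix W_U
# (d_model = 3 rows, one column per token).
VOCAB = ["SELECT", "FROM", "WHERE", "name", "orders"]
W_U = [
    [1.0, 0.0, 0.0, 0.5, -0.5],
    [0.0, 1.0, 0.0, -0.5, 0.5],
    [0.0, 0.0, 1.0, 0.5, 0.5],
]


def rms_norm(h, eps=1e-6):
    """Final RMSNorm without a learned gain (a simplifying assumption)."""
    rms = math.sqrt(sum(x * x for x in h) / len(h) + eps)
    return [x / rms for x in h]


def logit_lens(h):
    """Decode one intermediate residual vector into a distribution over VOCAB
    by reusing the final norm and the unembedding."""
    z = rms_norm(h)
    logits = [sum(z[i] * W_U[i][j] for i in range(len(z)))
              for j in range(len(VOCAB))]
    m = max(logits)  # subtract the max for a numerically stable softmax
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical residual stream at one output position, read after each of
# three layers.
residuals = [
    [0.1, 0.9, 0.2],
    [0.0, 1.0, 0.3],
    [-0.6, 0.6, 1.0],
]

trajectory = []
for layer, h in enumerate(residuals):
    probs = logit_lens(h)
    top = max(range(len(VOCAB)), key=probs.__getitem__)
    trajectory.append(VOCAB[top])
    print(f"layer {layer}: top token {VOCAB[top]!r} (p={probs[top]:.2f})")
```

With a real model, the same readout would be applied after every block at each output position. A layerwise analysis can then track the layer at which a schema token, such as a table name, overtakes a generic SQL keyword. The toy vectors here are chosen only to illustrate that kind of shift and are not results from this work.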
