TinySQL: A Progressive Text-to-SQL Dataset for Mechanistic Interpretability Research

Abir Harrasse, Philip Quirke, Clement Neo, Dhruv Nathawani, Luke Marks, Amir Abdullah

TL;DR

TinySQL introduces a synthetic, progressively complex text-to-SQL dataset and a multi-model testbed for probing the mechanistic interpretability (MI) of neural SQL generation. By combining Edge Attribution Patching (EAP), Sparse Autoencoders (SAEs), and LogitLens across the BM1 (33M), BM2 (~500M), and BM3 (1B) models, the work reveals a two-phase generation process (intent formation first, grounding later) and identifies both sparse, robust circuits and distributed representations that challenge simple circuit extraction. Interpretability signals align across the EAP and SAE methods, though larger models exhibit more fragmented, layer-spread mechanisms, which highlights the limits of current MI tools. Overall, TinySQL provides a rigorous, controllable platform for comparing MI techniques and advances understanding of how neural models learn structured query generation, with potential implications for robust real-world database interfaces.

Abstract

Mechanistic interpretability research faces a gap between analyzing simple circuits in toy tasks and discovering features in large models. To bridge this gap, we propose text-to-SQL generation as an ideal task to study, as it combines the formal structure of toy tasks with real-world complexity. We introduce TinySQL, a synthetic dataset, progressing from basic to advanced SQL operations, and train models ranging from 33M to 1B parameters to establish a comprehensive testbed for interpretability. We apply multiple complementary interpretability techniques, including Edge Attribution Patching and Sparse Autoencoders, to identify minimal circuits and components supporting SQL generation. We compare circuits for different SQL subskills, evaluating their minimality, reliability, and identifiability. Finally, we conduct a layerwise logit lens analysis to reveal how models compose SQL queries across layers: from intent recognition to schema resolution to structured generation. Our work provides a robust framework for probing and comparing interpretability methods in a structured, progressively complex setting.
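
Edge Attribution Patching, named in the abstract, is commonly described as a first-order approximation to activation patching: each edge is scored by the difference between its corrupted and clean upstream activation, dotted with the gradient of the task metric at the downstream input. The sketch below illustrates that idea in NumPy on a synthetic linear toy problem. It is not the authors' implementation; the array names, the sign convention (corrupt minus clean), the linear metric, and the `top_k_edges` helper are all assumptions made for illustration.

```python
import numpy as np

def eap_scores(clean_acts, corrupt_acts, grad_at_clean):
    """First-order edge scores: (corrupt - clean) . gradient.

    clean_acts, corrupt_acts: (n_upstream, d_model) upstream node outputs.
    grad_at_clean: (d_model,) gradient of the metric with respect to the
    downstream node's input, evaluated on the clean run.
    """
    return (corrupt_acts - clean_acts) @ grad_at_clean

def top_k_edges(scores, k):
    """Hypothetical helper: indices of the k edges with largest |score|."""
    return np.argsort(-np.abs(scores))[:k]

# Synthetic toy data (assumption): 6 upstream nodes writing into one
# downstream node whose input is the sum of their outputs.
rng = np.random.default_rng(0)
n_upstream, d_model = 6, 8
clean_acts = rng.normal(size=(n_upstream, d_model))
corrupt_acts = rng.normal(size=(n_upstream, d_model))
w = rng.normal(size=d_model)

def metric(node_input):
    # Toy linear task metric, so the first-order estimate is exact here.
    return float(node_input @ w)

grad_at_clean = w  # gradient of a linear metric is its weight vector
scores = eap_scores(clean_acts, corrupt_acts, grad_at_clean)

# Ground truth by real patching: swap one edge at a time and re-evaluate.
clean_input = clean_acts.sum(axis=0)
exact = np.array([
    metric(clean_input - clean_acts[i] + corrupt_acts[i]) - metric(clean_input)
    for i in range(n_upstream)
])

kept = top_k_edges(scores, k=3)
```

Because the toy metric is linear, the attribution scores match the true patching effects exactly. In a real transformer the metric is nonlinear, so the scores are only an approximation, and the gradient would come from a backward pass rather than a fixed vector.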

Paper Structure

This paper contains 60 sections, 1 equation, 29 figures, 20 tables, 1 algorithm.

Figures (29)

  • Figure 1: (a) TinySQL is broken down into 9 subsets of varying complexity along both the SQL query and user query axes. (b) We train and release a comprehensive set of models on each dataset subset. (c) We apply MI techniques across various configurations to understand model behavior and compare results.
  • Figure 2: To extract and interpret text-to-SQL functionality, we use Edge Attribution Patching to identify key connections, Selective Node Retention to create a minimal working circuit, and SAE Feature Selection to interpret node functionality. We also use prompt corruption and activation patching to form hypotheses on how the model implements functions.
  • Figure 3: Emergence of SQL intent (via keywords) precedes table name resolution during generation in BM2-CS3. LogitLens reveals elevated probabilities for SQL keywords at early layers, followed by table tokens at deeper layers.
  • Figure 4: Training and validation loss curves for instruction tuning BM1 (TinyStories-Instruct-2Layers-33M) on CS1_Syn, CS2_Syn, and CS3_Syn.
  • Figure 5: Accuracy curves for instruction tuning BM1 (TinyStories-Instruct-2Layers-33M) on CS1_Syn, CS2_Syn, and CS3_Syn.
  • ...and 24 more figures
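
The layerwise LogitLens reading described in the Figure 3 caption can be sketched as follows: project the residual stream at each layer through a normalization step and the unembedding matrix, then track when a given token's probability first rises. This is a minimal NumPy sketch on hand-built synthetic activations, not the paper's code or data. The parameter-free layer norm, the 3-token vocabulary, the 0.5 threshold, and the `first_layer_above` helper are assumptions for illustration; a real model would use its own final norm and unembedding weights.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Parameter-free layer norm (assumption: no learned scale or bias).
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def logit_lens(hidden_states, W_U):
    """hidden_states: (n_layers, d_model) residual stream at one position.
    W_U: (d_model, vocab) unembedding matrix.
    Returns (n_layers, vocab) next-token probabilities per layer."""
    return softmax(layer_norm(hidden_states) @ W_U)

def first_layer_above(probs, token_id, threshold=0.5):
    """Hypothetical helper: first layer where token_id exceeds threshold."""
    hits = np.nonzero(probs[:, token_id] > threshold)[0]
    return int(hits[0]) if hits.size else None

# Synthetic toy vocabulary (assumption): 0 = SQL keyword, 1 = table name,
# 2 = unrelated token. Columns of W_U are the token directions.
KEYWORD, TABLE, OTHER = 0, 1, 2
W_U = 5.0 * np.array([
    [1.0, 0.0, 1.0],
    [-1.0, 0.0, 1.0],
    [0.0, 1.0, -1.0],
    [0.0, -1.0, -1.0],
])

# Hand-built residual states: nothing at layer 0, the keyword direction at
# layer 1, the table direction at layers 2 and 3.
hidden_states = np.array([
    [0.0, 0.0, 0.0, 0.0],
    [1.0, -1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, -1.0],
    [0.0, 0.0, 1.0, -1.0],
])

probs = logit_lens(hidden_states, W_U)
keyword_layer = first_layer_above(probs, KEYWORD)
table_layer = first_layer_above(probs, TABLE)
```

In this constructed example the keyword token crosses the threshold one layer before the table token, which mirrors the ordering the caption describes. The ordering here is built into the toy inputs and says nothing about the real models.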