Database Research needs an Abstract Relational Query Language
Wolfgang Gatterbauer, Diandre Miguel Sabale
TL;DR
The paper argues that relational querying should move beyond SQL's surface syntax toward a semantics-first Abstract Relational Query Language (ARQL). It introduces Abstract Relational Calculus (ARC), a strict generalization of Tuple Relational Calculus (TRC), in three modalities (a comprehension-style textual notation, an Abstract Language Tree (ALT), and higraph diagrams) that separate relational intent from presentation. By keeping modalities and conventions (e.g., set vs. bag semantics, null handling) orthogonal to the relational core, ARC serves as a Rosetta Stone for relational querying, enabling cross-language comparison, modular reasoning, and NL2SQL workflows without sacrificing semantic fidelity. The paper illustrates ARC with matrix multiplication and the count bug, discusses recursion, joins, null handling, and aggregates, and outlines concrete next steps, including a SQL↔ARC translator and formal coverage results. Overall, the framework aims to support intent-based benchmarking and robust human and machine validation of queries in an era of ML-generated queries.
Abstract
For decades, SQL has been the default language for composing queries, but it is increasingly used as an artifact to be read and verified rather than authored. With Large Language Models (LLMs), queries are increasingly machine-generated, while humans read, validate, and debug them. This shift turns relational query languages into interfaces for back-and-forth communication about intent, which will lead to a rethinking of relational language design, and more broadly, relational interface design. We argue that this rethinking needs support from an Abstract Relational Query Language (ARQL): a semantics-first reference metalanguage that separates query intent from user-facing syntax and makes underlying relational patterns explicit and comparable across user-facing languages. An ARQL separates a query into (i) a relational core (the compositional structure that determines intent), (ii) modalities (alternative representations of that core tailored to different audiences), and (iii) conventions (orthogonal environment-level semantic parameters under which the core is interpreted, e.g., set vs. bag semantics, or treatment of null values). Usability for humans or machines then depends less on choosing a particular language and more on choosing an appropriate modality. Comparing languages becomes a question of which relational patterns they support and what conventions they choose. We introduce Abstract Relational Calculus (ARC), a strict generalization of Tuple Relational Calculus (TRC), as a concrete instance of ARQL. ARC comes in three modalities: (i) a comprehension-style textual notation, (ii) an Abstract Language Tree (ALT) for machine reasoning about meaning, and (iii) a diagrammatic hierarchical graph (higraph) representation for humans. ARC provides the missing vocabulary and acts as a Rosetta Stone for relational querying.
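To make the three-way separation concrete, here is a minimal Python sketch of one relational core rendered in two modalities and evaluated under two conventions. It is an illustration under stated assumptions, not the paper's ARC notation or ALT format: the names `Project`, `Conventions`, `to_text`, `to_tree`, and `evaluate`, the toy `Employee` relation, and the textual syntax are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Project:
    """Toy relational core (hypothetical): project `columns` from relation `source`."""
    source: str
    columns: tuple


@dataclass(frozen=True)
class Conventions:
    """Toy environment-level parameter (hypothetical): bag vs. set semantics."""
    bag_semantics: bool = True


def to_text(core):
    """Modality 1: a comprehension-like string (illustrative only, not ARC syntax)."""
    cols = ", ".join(f"r.{c}" for c in core.columns)
    return f"{{ ({cols}) | r in {core.source} }}"


def to_tree(core):
    """Modality 2: a nested dict standing in for a tree representation."""
    return {
        "op": "project",
        "columns": list(core.columns),
        "input": {"op": "scan", "relation": core.source},
    }


def evaluate(core, db, conv):
    """Interpret the same core under the given conventions.

    Returns a Counter mapping each output tuple to its multiplicity.
    """
    rows = [tuple(row[c] for c in core.columns) for row in db[core.source]]
    if conv.bag_semantics:
        return Counter(rows)       # duplicates keep their multiplicities
    return Counter(set(rows))      # every distinct tuple appears exactly once


# Assumed toy data, invented for this example.
db = {
    "Employee": [
        {"name": "ann", "dept": "sales"},
        {"name": "bob", "dept": "sales"},
        {"name": "cy", "dept": "ops"},
    ]
}
core = Project("Employee", ("dept",))

print(to_text(core))
print(to_tree(core))
print(evaluate(core, db, Conventions(bag_semantics=True)))
print(evaluate(core, db, Conventions(bag_semantics=False)))
```

The core object never changes: the two rendering functions only change how it is shown, and the `Conventions` argument only changes how it is interpreted. This is the orthogonality the abstract describes, reduced to a single projection.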
