Database Research needs an Abstract Relational Query Language

Wolfgang Gatterbauer, Diandre Miguel Sabale

TL;DR

The paper argues that SQL-centric querying should evolve beyond surface syntax toward a semantics-first abstract relational framework (ARQL). It introduces Abstract Relational Calculus (ARC) as a strict generalization of TRC, implemented via three modalities (textual comprehension, Abstract Language Tree, and higraph diagrams) to separate relational intent from presentation. By treating modalities and conventions as orthogonal, ARC serves as a Rosetta Stone for relational querying, enabling cross-language comparison, modular reasoning, and NL2SQL workflows without sacrificing semantic fidelity. The work illustrates ARC with matrix multiplication and the count bug, discusses recursion, joins, null handling, and aggregates, and outlines concrete next steps including a SQL↔ARC translator and formal coverage results. Overall, the framework aims to support intent-based benchmarking and robust machine-human validation of queries in the era of ML-generated queries.

Abstract

For decades, SQL has been the default language for composing queries, but it is increasingly used as an artifact to be read and verified rather than authored. With Large Language Models (LLMs), queries are increasingly machine-generated, while humans read, validate, and debug them. This shift turns relational query languages into interfaces for back-and-forth communication about intent, which will lead to a rethinking of relational language design, and more broadly, relational interface design. We argue that this rethinking needs support from an Abstract Relational Query Language (ARQL): a semantics-first reference metalanguage that separates query intent from user-facing syntax and makes underlying relational patterns explicit and comparable across user-facing languages. An ARQL separates a query into (i) a relational core (the compositional structure that determines intent), (ii) modalities (alternative representations of that core tailored to different audiences), and (iii) conventions (orthogonal environment-level semantic parameters under which the core is interpreted, e.g., set vs. bag semantics, or treatment of null values). Usability for humans or machines then depends less on choosing a particular language and more on choosing an appropriate modality. Comparing languages becomes a question of which relational patterns they support and what conventions they choose. We introduce Abstract Relational Calculus (ARC), a strict generalization of Tuple Relational Calculus (TRC), as a concrete instance of ARQL. ARC comes in three modalities: (i) a comprehension-style textual notation, (ii) an Abstract Language Tree (ALT) for machine reasoning about meaning, and (iii) a diagrammatic hierarchical graph (higraph) representation for humans. ARC provides the missing vocabulary and acts as a Rosetta Stone for relational querying.
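As a rough illustration of how a convention can be factored out of the relational core, the following Python sketch evaluates one projection under set and bag semantics. The relation `R`, the function `project_a`, and the `convention` parameter are hypothetical names chosen for this example. They are not part of ARC or any notation from the paper.

```python
# Hypothetical toy relation R(A, B), stored as a list of rows.
R = [(1, "x"), (1, "y"), (2, "x")]

def project_a(rows, convention="set"):
    """Evaluate the core { r.A | r in R } under a chosen convention (assumed API)."""
    # The relational core: identical under both conventions.
    values = [a for (a, _b) in rows]
    if convention == "set":
        # Set semantics: duplicates are eliminated.
        return sorted(set(values))
    if convention == "bag":
        # Bag semantics: duplicates are kept.
        return sorted(values)
    raise ValueError(f"unknown convention: {convention}")

set_result = project_a(R, "set")
bag_result = project_a(R, "bag")
print(set_result, bag_result)
```

The comprehension that builds `values` plays the role of the core and does not change. Only the interpretation step differs, which is the sense in which a convention is orthogonal to the relational pattern.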

Paper Structure

This paper contains 23 sections, 45 equations, 21 figures.

Figures (21)

  • Figure 1: An Abstract Relational QL (ARQL) abstracts away from syntactic details of a query to a higher-level representation. Just as Intermediate Representations (IRs) enable query optimization, a more abstract representation can support semantic understanding of a query's intent. Both humans and machines can benefit from modalities tailored to their needs. Conventions (not shown) factor out orthogonal design choices that don't affect the relational pattern.
  • Figure 2: (a): Linked Abstract Language Tree (ALT) for the TRC query (\ref{eq:simpleTRC}). The overlaid arrows show the result of the linking step and are conceptual only. (b): Diagrammatic higraph representation of the linked ALT as a variant of Relational Diagrams.
  • Figure 3: (a): Nested ARC from (\ref{eq: lateral join gtrc}) expressed as a lateral join in SQL.
  • Figure 4: The semantics of a simple grouped aggregate query in a "from the inside out" (FIO) pattern represented in SQL (a) and ARC (b) (\ref{ALT:simple grouped by aggregate}). Red overlay arrows indicate scoping, binding, and grouping.
  • Figure 5: The semantics of a simple grouped aggregate query in a "from the outside in" (FOI) pattern represented in SQL with a scalar subquery (a) or lateral join (b), and ARC (c) (\ref{ARC: FOI comprehension}). This relational pattern corresponds to the way Klug [DBLP:journals/jacm/Klug82] (\ref{Klug: simple}), Hella et al. [DBLP:journals/jacm/HellaLNW01] (\ref{eq:libkin formalism:easy}), and Soufflé [souffle, DBLP:conf/cc/ScholzJSW16] (\ref{souffle:head aggregate}) express the query.
  • ...and 16 more figures

Theorems & Definitions (2)

  • Example 1: arithmetic and comparison operators
  • Example 2: unique-set query