Towards a Standard for JSON Document Databases

Elena Botoeva, Julien Corman, Norman Townsend

TL;DR

This work provides a formal foundation for JSON document databases by modelling JSON data via $d$-values, defining ordered and unordered object interpretations, and presenting MQuery, a formal abstraction of MongoDB's aggregation framework. It shows how MQuery can express core relational algebra constructs and more complex operations such as joins, unions, and graph traversals, while enabling algebraic optimisations and a vendor-neutral standard fragment. The authors also contrast their formalism with MongoDB's semantics, proposing unified Boolean semantics, consistent path interpretation, and data-independent evaluation to support reliable reasoning and standardisation. The proposed framework aims to enable interoperability, rigorous optimisation, and a principled basis for industry-wide JSON query standards. Collectively, this work lays the groundwork for a formal, extensible, and optimisable standard for JSON document databases.
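
The TL;DR distinguishes ordered from unordered interpretations of objects. The Python sketch below illustrates that difference under a hypothetical encoding. It does not reproduce the report's definition of $d$-values: the names `obj`, `eq_ordered` and `eq_unordered` are illustrative, object keys are assumed unique, and arrays are assumed to stay ordered under both interpretations.

```python
from typing import Any, Tuple

# Hypothetical encoding (an assumption, not the report's Definition 1):
#   literal -> None, bool, int, float or str
#   array   -> Python list of encoded values (always ordered)
#   object  -> Python tuple of (key, value) pairs, in document order


def obj(*pairs: Tuple[str, Any]) -> tuple:
    """Build an encoded object, keeping its key-value pairs in the given order."""
    return tuple(pairs)


def eq_ordered(a: Any, b: Any) -> bool:
    """Ordered interpretation: objects are equal only if their pairs match in order."""
    if isinstance(a, tuple) and isinstance(b, tuple):
        return len(a) == len(b) and all(
            ka == kb and eq_ordered(va, vb) for (ka, va), (kb, vb) in zip(a, b)
        )
    if isinstance(a, list) and isinstance(b, list):
        return len(a) == len(b) and all(eq_ordered(x, y) for x, y in zip(a, b))
    # Literals (or mismatched kinds): compare type and value, so True != 1.
    return type(a) is type(b) and a == b


def eq_unordered(a: Any, b: Any) -> bool:
    """Unordered interpretation: objects are compared as key -> value maps.

    Assumes keys are unique within each object.
    """
    if isinstance(a, tuple) and isinstance(b, tuple):
        da, db = dict(a), dict(b)
        return da.keys() == db.keys() and all(eq_unordered(da[k], db[k]) for k in da)
    if isinstance(a, list) and isinstance(b, list):
        # Arrays stay ordered under both interpretations.
        return len(a) == len(b) and all(eq_unordered(x, y) for x, y in zip(a, b))
    return type(a) is type(b) and a == b


# Two objects that differ only in key order (hypothetical values).
x = obj(("a", 1), ("b", [1, 2]))
y = obj(("b", [1, 2]), ("a", 1))
print(eq_ordered(x, y), eq_unordered(x, y))  # False True
```

Encoding objects as ordered pair sequences keeps both interpretations available from one representation: the ordered reading inspects the sequence, while the unordered reading forgets the order by building a map.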

Abstract

In this technical report, we present a formalisation of the MongoDB aggregation framework. Our aim is to identify a fragment that could serve as the starting point for an industry-wide standard for querying JSON document databases. We provide a syntax and formal semantics for a selected set of operators. We show how this fragment relates to known relational query languages. We explain how our semantics differs from the current implementation of MongoDB, and justify our choices. Finally, we provide a set of algebraic transformations that can be used for query optimisation.
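
To make the link between aggregation pipelines and relational operators concrete, here is a minimal Python sketch, assuming documents are plain dicts and a pipeline is a left-to-right composition of stages. The stage names `match`, `project` and `unwind` are borrowed from MongoDB, but their semantics here are simplified and hypothetical; they are not the report's MQuery definitions.

```python
from typing import Any, Callable, Dict, List

Doc = Dict[str, Any]
Stage = Callable[[List[Doc]], List[Doc]]


def match(pred: Callable[[Doc], bool]) -> Stage:
    """Selection: keep the documents that satisfy pred (cf. relational sigma)."""
    return lambda docs: [d for d in docs if pred(d)]


def project(*keys: str) -> Stage:
    """Projection: keep only the listed top-level keys that are present (cf. pi)."""
    return lambda docs: [{k: d[k] for k in keys if k in d} for d in docs]


def unwind(key: str) -> Stage:
    """Unnesting: emit one document per element of the array under key.

    Simplification: documents whose value under key is missing or is not
    a list are dropped.
    """
    def run(docs: List[Doc]) -> List[Doc]:
        out: List[Doc] = []
        for d in docs:
            v = d.get(key)
            if isinstance(v, list):
                out.extend({**d, key: elem} for elem in v)
        return out
    return run


def pipeline(*stages: Stage) -> Stage:
    """Apply the stages left to right, like function composition."""
    def run(docs: List[Doc]) -> List[Doc]:
        for stage in stages:
            docs = stage(docs)
        return docs
    return run


# Hypothetical input; member names are placeholders.
bands = [
    {"name": "Queen", "members": ["m1", "m2"]},
    {"name": "ABBA", "members": ["m3"]},
]
q = pipeline(unwind("members"), match(lambda d: d["name"] == "ABBA"), project("members"))
print(q(bands))  # [{'members': 'm3'}]
```

In this sketch the predicate never reads `members`, so running `match` before `unwind` returns the same result while unwinding fewer documents. Algebraic transformation rules for query optimisation are meant to justify rewrites of this kind.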

Paper Structure

This paper contains 56 sections, 45 equations, and 11 figures.

Figures (11)

  • Figure 1: A JSON document about the band Queen.
  • Figure 2: The d-value that corresponds to the JSON document of Figure 1.
  • Figure 3: A JSON document about the band ABBA.
  • Figure 4: Collection $\textup{songs}^I$.
  • Figure 5: Nested relation $\textup{bands}$ about Queen and ABBA.
  • ...and 6 more figures

Theorems & Definitions (25)

  • Example 1
  • Definition 1: d-value
  • Example 2
  • Definition 2: Collection
  • Definition 3: Database instance
  • Definition 4
  • Definition 5: Path
  • Definition 6: Evaluation function
  • Definition 7: Unordered semantics
  • Example 3
  • ...and 15 more
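
Definitions 5 and 6 concern paths and their evaluation, and the TL;DR calls for a consistent path interpretation. As a hedged illustration only, the Python sketch below contrasts two hypothetical readings of a dotted path over plain dicts and lists: a strict one that never enters arrays, and a simplified version of MongoDB's query behaviour, which traverses arrays implicitly (numeric path components are ignored here). Neither function is claimed to be the report's evaluation function.

```python
from typing import Any, List

MISSING = object()  # sentinel: the path does not exist in the value


def eval_path_strict(value: Any, path: str) -> Any:
    """Follow a dotted path through nested objects only; arrays are never entered."""
    for step in path.split("."):
        if isinstance(value, dict) and step in value:
            value = value[step]
        else:
            return MISSING
    return value


def eval_path_traversing(value: Any, path: str) -> List[Any]:
    """Simplified MongoDB-style reading: arrays met along the path are entered
    implicitly, so one path can reach several values."""
    results = [value]
    for step in path.split("."):
        reached = []
        for v in results:
            # Look inside arrays element by element; anything else is a single candidate.
            for c in (v if isinstance(v, list) else [v]):
                if isinstance(c, dict) and step in c:
                    reached.append(c[step])
        results = reached
    return results


# A hypothetical document with an array of embedded objects.
doc = {"band": {"members": [{"name": "m1"}, {"name": "m2"}]}}
print(eval_path_strict(doc, "band.members.name") is MISSING)  # True
print(eval_path_traversing(doc, "band.members.name"))  # ['m1', 'm2']
```

The two readings disagree on the same document, which is the kind of ambiguity a standard has to settle; the report's own choice may differ from both.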