Table of Contents
Fetching ...

JSON Schema Inclusion through Refutational Normalization: Reconciling Efficiency and Completeness

Mohamed-Amine Baazizi, Nour El Houda Ben Ali, Dario Colazzo, Giorgio Ghelli, Stefan Klessinger, Carlo Sartiani, Stefanie Scherzinger

Abstract

JSON Schema is the de facto standard for describing the structure of JSON documents. Reasoning about JSON Schema inclusion - whether every instance satisfying a schema S1 also satisfies a schema S2 -is a key building block for a variety of tasks, including version and API compatibility checks, schema refactoring tools, and large-scale schema corpus analysis. Existing approaches fall into two families: rule-based algorithms that are efficient but incomplete and witness generation-based algorithms that are complete but oftentimes extremely slow. This paper introduces a new approach that reconciles the efficiency of rule-based procedures with the completeness of the witness-generation technique, by enriching the latter with a specialized form of normalization. This refutational normalization paves the way for use-cases that are too hard for current tools. Our experiments with real-world and synthetic schemas show that the refutational normalization greatly advances the state-of-the-art in JSON Schema inclusion checking.

JSON Schema Inclusion through Refutational Normalization: Reconciling Efficiency and Completeness

Abstract

JSON Schema is the de facto standard for describing the structure of JSON documents. Reasoning about JSON Schema inclusion - whether every instance satisfying a schema S1 also satisfies a schema S2 -is a key building block for a variety of tasks, including version and API compatibility checks, schema refactoring tools, and large-scale schema corpus analysis. Existing approaches fall into two families: rule-based algorithms that are efficient but incomplete and witness generation-based algorithms that are complete but oftentimes extremely slow. This paper introduces a new approach that reconciles the efficiency of rule-based procedures with the completeness of the witness-generation technique, by enriching the latter with a specialized form of normalization. This refutational normalization paves the way for use-cases that are too hard for current tools. Our experiments with real-world and synthetic schemas show that the refutational normalization greatly advances the state-of-the-art in JSON Schema inclusion checking.

Paper Structure

This paper contains 32 sections, 28 equations, 4 figures, 1 table, 5 algorithms.

Figures (4)

  • Figure 1: Performance comparison of rule-based (RB), witness-generation (WG), and refutational normalization approaches (RWG). Checking inclusion of schemas containing $\hbox{\tt{\small oneOf}}$ and schemas where $\hbox{\tt{\small oneOf}}$ is replaced by $\hbox{\tt{\small anyOf}}$.
  • Figure 2: Grammar of canonical conjunctions.
  • Figure 3: Failure rates of tools across datasets (lower is better). Tools distinguished by color, failure types by pattern. "$\mathsf{x}$" indicates zero-height bars (i.e., tools without failures)
  • Figure 4: Runtime comparison on intersection of successfully processed inclusion tests. #T indicates the number of inclusion tests in the intersection, #$\subseteq$/#$\not\subseteq$ shows the number of included/non-included tests. The markers on the vertical bars indicate the $5^{th}$, $50^{th}$, $95^{th}$ duration percentiles.

Theorems & Definitions (1)

  • Example 1