On-Demand JSON: A Better Way to Parse Documents?

John Keiser, Daniel Lemire

TL;DR

JSON parsing often incurs memory and performance overhead when a full in-memory DOM is constructed. The On-Demand front-end offers a DOM-like API backed by a lightweight pointer-based iterator that lazily materializes values as they are accessed, enabling selective parsing with minimal memory. Empirical results show substantial speedups over traditional parsers and competitive performance with DOM-based simdjson, along with significantly fewer instructions per input byte. The approach preserves ergonomic data access while enabling high-throughput parsing and invites future extension to other languages, formats, and heterogeneous hardware.

Abstract

JSON is a popular standard for data interchange on the Internet. Ingesting JSON documents can be a performance bottleneck. A popular parsing strategy consists in converting the input text into a tree-based data structure -- sometimes called a Document Object Model or DOM. We designed and implemented a novel JSON parsing interface -- called On-Demand -- that appears to the programmer like a conventional DOM-based approach. However, the underlying implementation is a pointer iterating through the content, only materializing the results (objects, arrays, strings, numbers) lazily. On recent commodity processors, an implementation of our approach provides superior performance in multiple benchmarks. To ensure reproducibility, our work is freely available as open source software. Several systems use On-Demand: e.g., Apache Doris, the Node.js JavaScript runtime, Milvus, and Velox.
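
To make the idea of lazy materialization concrete, the following is a minimal sketch in standard C++ and not the simdjson implementation. The class `LazyObject` and its method `find_number` are hypothetical names introduced only for illustration. The sketch assumes valid input consisting of a single flat object (no nested arrays or objects) whose strings contain no escaped quotes. A cursor moves forward through the text, skips over values that are not requested without interpreting them, and converts a value to a `double` only when its key matches.

```cpp
#include <cstddef>
#include <cstdlib>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical toy class (not part of simdjson): a forward-only cursor over
// a flat JSON object such as {"a": 1, "b": "text", "c": 2.5}.
// Assumptions: valid input, no nesting, no escaped quotes inside strings.
class LazyObject {
public:
  explicit LazyObject(std::string_view json) : json_(json) {}

  // Scan forward from the current position for `key`. Values of other keys
  // are skipped without being converted; only the requested value is parsed.
  std::optional<double> find_number(std::string_view key) {
    constexpr std::size_t npos = std::string_view::npos;
    while (true) {
      // The next key is the text between the next pair of quotes.
      std::size_t key_begin = json_.find('"', pos_);
      if (key_begin == npos) return std::nullopt;
      std::size_t key_end = json_.find('"', key_begin + 1);
      if (key_end == npos) return std::nullopt;
      std::string_view current =
          json_.substr(key_begin + 1, key_end - key_begin - 1);

      // Move to the first non-whitespace character after the colon.
      std::size_t value_begin = json_.find(':', key_end + 1);
      if (value_begin == npos) return std::nullopt;
      value_begin = json_.find_first_not_of(" \t\r\n", value_begin + 1);
      if (value_begin == npos) return std::nullopt;

      // Find the end of the value without interpreting its content.
      std::size_t value_end;
      if (json_[value_begin] == '"') {
        value_end = json_.find('"', value_begin + 1);
        if (value_end == npos) return std::nullopt;
        ++value_end;  // one past the closing quote
      } else {
        value_end = json_.find_first_of(",}", value_begin);
        if (value_end == npos) value_end = json_.size();
      }
      pos_ = value_end;  // the cursor never moves backward

      if (current == key) {
        if (json_[value_begin] == '"') return std::nullopt;  // not a number
        // Materialize the value only now that it has been requested.
        std::string token(json_.substr(value_begin, value_end - value_begin));
        char* end = nullptr;
        double value = std::strtod(token.c_str(), &end);
        if (end == token.c_str()) return std::nullopt;  // e.g., true or null
        return value;
      }
    }
  }

private:
  std::string_view json_;
  std::size_t pos_ = 0;
};
```

With the input `{"id": 7, "name": "x", "score": 2.5, "rank": 3}`, calling `find_number("score")` skips the `id` and `name` values without converting them and parses only `2.5`. Because this toy cursor is forward-only, a later request for `id` on the same object finds nothing; this is a simplification of the sketch and not a statement about the behavior of the On-Demand API.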

Paper Structure

This paper contains 5 sections, 5 figures, and 4 tables.

Figures (5)

  • Figure 1: JSON example with corresponding tree form
  • Figure 2: Example of a streaming approach to parse coordinate data in JSON using JSON for Modern C++
  • Figure 3: On-Demand C++ source code used by the kostya JSON benchmark
  • Figure 4: On-Demand C++ example illustrating how to access key-value elements and handle integers and strings
  • Figure 5: Speed and number of instructions for various benchmarks.