Blaze: Compiling JSON Schema for 10x Faster Validation
Juan Cruz Viotti, Michael J. Mior
TL;DR
Blaze tackles the latency of JSON Schema validation in high-throughput Web APIs by compiling schemas into an efficient instruction language ahead of time and executing the precompiled instructions at runtime. It introduces a permissive DSL (a constraint language) and a compilation pipeline that maps independent and dependent keywords to optimized instructions, with aggressive optimizations such as semi-perfect hashing, unrolling, regex tuning, and instruction reordering. Empirical results show that Blaze passes every correctness test for the 2020-12 dialect and averages around 10x faster validation than the next-best validators across diverse datasets, with some cases improving by multiple orders of magnitude. The work shows that validation overhead can be reduced significantly in practice, and it outlines paths for future work including code generation, deeper static analysis, and better error reporting.
Abstract
JSON Schemas provide useful guardrails for developers of Web APIs to guarantee that the semi-structured JSON input provided by clients matches a predefined structure. This is important both to ensure the correctness of the data received as input and to avoid potential security issues from processing input that is not correctly validated. However, this validation process can be time-consuming and adds overhead to every request. Different keywords in the JSON Schema specification have complex interactions that may increase validation time. Since popular APIs may process thousands of requests per second and schemas change infrequently, we observe that we can resolve some of the complexity ahead of time in order to achieve faster validation. Our JSON Schema validator, Blaze, compiles complex schemas to an efficient representation in seconds to minutes, adding minimal overhead at build time. Blaze incorporates several unique optimizations to reduce the validation time by an average of approximately 10x compared to existing validators on a variety of datasets. In some cases, Blaze achieves a reduction in validation time of multiple orders of magnitude compared to the next fastest validator. We also demonstrate that several popular validators produce incorrect results in some cases, while Blaze maintains strict adherence to the JSON Schema specification.
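To make the compile-once, validate-many idea concrete, the following is a minimal sketch and not Blaze's actual instruction set or API. It assumes a tiny subset of keywords (type, required, properties, minimum), and the names compile_schema, validate, and Instruction are hypothetical. Each keyword is translated once into a flat list of checks, so the per-request work is only running those checks.

```python
from typing import Any, Callable, List

# Hypothetical design: an "instruction" is a precompiled predicate over a JSON value.
Instruction = Callable[[Any], bool]

_TYPES = {"object": dict, "array": list, "string": str, "boolean": bool}


def _is_number(v: Any) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(v, (int, float)) and not isinstance(v, bool)


def _is_integer(v: Any) -> bool:
    # JSON Schema treats 1.0 as an integer, so accept integral floats too.
    return _is_number(v) and (isinstance(v, int) or v.is_integer())


def validate(program: List[Instruction], value: Any) -> bool:
    # Runtime step: execute the precompiled instructions, stopping at the first failure.
    return all(instr(value) for instr in program)


def compile_schema(schema: dict) -> List[Instruction]:
    # Build step: inspect the schema once and emit one instruction per keyword.
    program: List[Instruction] = []

    t = schema.get("type")
    if t == "number":
        program.append(_is_number)
    elif t == "integer":
        program.append(_is_integer)
    elif t in _TYPES:
        py_type = _TYPES[t]
        program.append(lambda v, py_type=py_type: isinstance(v, py_type))

    if "required" in schema:
        required = tuple(schema["required"])
        # "required" only constrains objects; other values pass.
        program.append(
            lambda v, required=required: not isinstance(v, dict)
            or all(k in v for k in required)
        )

    for name, subschema in schema.get("properties", {}).items():
        sub = compile_schema(subschema)  # subschemas are compiled once, too
        program.append(
            lambda v, name=name, sub=sub: not isinstance(v, dict)
            or name not in v
            or validate(sub, v[name])
        )

    if "minimum" in schema:
        minimum = schema["minimum"]
        # "minimum" only constrains numbers; other values pass.
        program.append(lambda v, minimum=minimum: not _is_number(v) or v >= minimum)

    return program


# Compile once at build time, then reuse the program for every request.
program = compile_schema(
    {
        "type": "object",
        "required": ["id"],
        "properties": {"id": {"type": "integer", "minimum": 1}},
    }
)
```

Calling validate(program, instance) for each incoming request then avoids re-reading the schema. A real compiler such as the one described here would go much further, for example by reordering instructions and specializing property lookups, which this sketch does not attempt.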
