SQL4NN: Validation and expressive querying of models as data

Mark Gerarts, Juno Steegmans, Jan Van den Bussche

TL;DR

SQL4NN reframes trained neural networks as intensional data that can be stored and queried inside a relational database, enabling validation, verification, and white-box analyses over both training data and learned models. The authors demonstrate that neural networks can be encoded as Node/Edge relations and evaluated via SQL views, using recursion to handle variable-depth architectures and exploiting the piecewise-linear nature of ReLU activations. They connect practical in-database evaluation and verification to theoretical results showing that first-order logic over the reals with linear constraints can be simulated in SQL for fixed depths. They also showcase white-box tasks such as geometry reconstruction and pruning, including computing breakpoints as $-u.bias / w$. The work provides a proof-of-concept demo using DuckDB and PyTorch on MNIST-scale models, highlighting the potential for integrated, explainable model analytics within database systems.
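To make the Node/Edge encoding concrete, here is a minimal sketch of in-database evaluation of a tiny ReLU network. It rests on several assumptions that are not from the paper: the table and column names (`node`, `edge`, `activation`, `layer`), the toy weights, and the use of Python's built-in `sqlite3` module in place of DuckDB. A Python loop over layers stands in for the recursive SQL view mentioned above, so this illustrates the idea rather than the authors' implementation.

```python
import sqlite3


def build_network(conn):
    """Create a toy network x -> {h1, h2} -> y (hypothetical schema and weights)."""
    conn.executescript("""
        CREATE TABLE node(id TEXT PRIMARY KEY, layer INTEGER, bias REAL);
        CREATE TABLE edge(src TEXT, dst TEXT, weight REAL);
        CREATE TABLE activation(id TEXT PRIMARY KEY, val REAL);
    """)
    conn.executemany("INSERT INTO node VALUES (?, ?, ?)", [
        ("x", 0, 0.0),    # input node
        ("h1", 1, -1.0),  # hidden ReLU neurons
        ("h2", 1, 2.0),
        ("y", 2, 0.5),    # linear output node
    ])
    conn.executemany("INSERT INTO edge VALUES (?, ?, ?)", [
        ("x", "h1", 1.0),
        ("x", "h2", -1.0),
        ("h1", "y", 2.0),
        ("h2", "y", 1.0),
    ])


def evaluate(conn, inputs, output_id="y"):
    """Evaluate the network layer by layer; each layer is one SQL aggregation."""
    conn.execute("DELETE FROM activation")
    conn.executemany("INSERT INTO activation VALUES (?, ?)", list(inputs.items()))
    depth = conn.execute("SELECT MAX(layer) FROM node").fetchone()[0]
    for layer in range(1, depth + 1):
        # Weighted sum of predecessor activations plus bias;
        # ReLU on hidden layers, identity on the final layer.
        conn.execute("""
            INSERT INTO activation
            SELECT n.id,
                   CASE WHEN n.layer < ?
                        THEN MAX(0.0, n.bias + SUM(e.weight * a.val))
                        ELSE n.bias + SUM(e.weight * a.val)
                   END
            FROM node n
            JOIN edge e ON e.dst = n.id
            JOIN activation a ON a.id = e.src
            WHERE n.layer = ?
            GROUP BY n.id, n.layer, n.bias
        """, (depth, layer))
    row = conn.execute("SELECT val FROM activation WHERE id = ?", (output_id,))
    return row.fetchone()[0]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_network(conn)
    print(evaluate(conn, {"x": 3.0}))  # h1 = 2, h2 = 0, so y = 0.5 + 2*2 + 1*0 = 4.5
```

Because the weights live in ordinary tables, the same join-and-aggregate pattern could in principle be combined with queries over training or validation data stored alongside the model.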

Abstract

We consider machine learning models, learned from data, to be an important, intensional, kind of data in themselves. As such, various analysis tasks on models can be thought of as queries over this intensional data, often combined with extensional data such as data for training or validation. We demonstrate that relational database systems and SQL can actually be well suited for many such tasks.

Paper Structure

This paper contains 8 sections, 1 equation, 3 figures.

Figures (3)

  • Figure 1: (a)–(d): Scaling input vector length; number of input vectors to be evaluated; depth of the network; number of hidden neurons in every layer. (e) Evaluating a depth-5 network on multiple input vectors is faster with a recursive SQL view than with a fixed composition of 5 non-recursive views.
  • Figure 2: Breakpoints and slopes. The blue line is the represented function.
  • Figure 3: Saliency map for a $28\times 28$ image of a handwritten digit.
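The breakpoints of Figure 2 can be illustrated with a small query. The sketch below assumes a network with a single scalar input and one hidden ReLU layer, in which a hidden neuron $u$ with incoming weight $w$ switches on or off where $w x + u.bias = 0$, that is, at $x = -u.bias / w$. The schema, the toy weights, and the use of Python's `sqlite3` module are illustrative assumptions, not the paper's actual setup.

```python
import sqlite3


def make_demo_db():
    """Build a hypothetical one-input, two-hidden-neuron network."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE node(id TEXT PRIMARY KEY, bias REAL);
        CREATE TABLE edge(src TEXT, dst TEXT, weight REAL);
        INSERT INTO node VALUES ('x', 0.0), ('h1', -1.0), ('h2', 2.0), ('y', 0.5);
        INSERT INTO edge VALUES ('x', 'h1', 1.0), ('x', 'h2', -1.0),
                                ('h1', 'y', 2.0), ('h2', 'y', 1.0);
    """)
    return conn


def breakpoints(conn, input_id="x"):
    """Return (hidden neuron, breakpoint) pairs, sorted by breakpoint."""
    return conn.execute("""
        SELECT e.dst, -n.bias / e.weight AS breakpoint
        FROM edge e
        JOIN node n ON n.id = e.dst
        WHERE e.src = ? AND e.weight <> 0   -- zero weight: neuron never switches
        ORDER BY breakpoint
    """, (input_id,)).fetchall()


if __name__ == "__main__":
    print(breakpoints(make_demo_db()))  # [('h1', 1.0), ('h2', 2.0)]
```

Between consecutive breakpoints the represented function is linear, so the sorted breakpoints delimit the pieces whose slopes a figure like Figure 2 would show.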

Theorems & Definitions (1)

  • Remark