Database Views as Explanations for Relational Deep Learning

Agapi Rissaki, Ilias Fountalis, Wolfgang Gatterbauer, Benny Kimelfeld

TL;DR

This work addresses the opacity of relational deep learning models by introducing a framework where explanations are SQL-style view definitions over the database, grounded in a soft-determinacy notion that tolerates realistic perturbations. It offers a model-agnostic approach and a GNN-specific instantiation using learnable masks to identify concise, influential database components (columns, joins, and selections) that explain predictions. Empirical evaluation on RelBench demonstrates high-quality explanations with favorable runtime, and case studies show practical diagnostic capabilities such as detecting data leakage and identifying structural signals. The framework thus enables interpretable, database-centric insights for powerful relational predictive models with broad applicability to real-world datasets and tasks.

Abstract

In recent years, there has been significant progress in the development of deep learning models over relational databases, including architectures based on heterogeneous graph neural networks (hetero-GNNs) and heterogeneous graph transformers. In effect, such architectures state how the database records and links (e.g., foreign-key references) translate into a large, complex numerical expression, involving numerous learnable parameters. This complexity makes it hard to explain, in human-understandable terms, how a model uses the available data to arrive at a given prediction. We present a novel framework for explaining machine-learning models over relational databases, where explanations are view definitions that highlight the focused parts of the database that contribute most to the model's prediction. We establish such global abductive explanations by adapting the classic notion of determinacy by Nash, Segoufin, and Vianu (2010). In addition to tuning the tradeoff between determinacy and conciseness, the framework allows controlling the level of granularity by adopting different fragments of view definitions, such as ones highlighting whole columns, foreign keys between tables, relevant groups of tuples, and so on. We investigate the realization of the framework in the case of hetero-GNNs, and develop a model-specific approach via the notion of learnable masks. For comparison, we propose model-agnostic heuristic baselines and show that our approach is both more efficient and achieves better explanation quality in most cases. Our extensive empirical evaluation on the RelBench collection across diverse domains and record-level tasks demonstrates both the usefulness of our explanations and the efficiency of their generation.
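
To make the idea of a view that "soft determines" a prediction concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes a single toy table, a hypothetical black-box `toy_model`, and a perturbation that shuffles every column outside the candidate view. The names `perturb_outside_view` and `estimate_deviation`, and the use of mean absolute difference as the deviation measure, are illustrative assumptions.

```python
import random
from statistics import mean

def perturb_outside_view(rows, view_columns, rng):
    """Shuffle each column NOT in the view independently; view columns stay fixed.
    (Assumed perturbation scheme, chosen only for illustration.)"""
    perturbed = [dict(r) for r in rows]
    for col in rows[0].keys():
        if col in view_columns:
            continue
        values = [r[col] for r in rows]
        rng.shuffle(values)
        for r, v in zip(perturbed, values):
            r[col] = v
    return perturbed

def toy_model(row):
    """Hypothetical black-box model: depends strongly on 'age', weakly on 'noise'."""
    return 2.0 * row["age"] + 0.01 * row["noise"]

def estimate_deviation(rows, view_columns, model, trials=50, seed=0):
    """Average absolute change in predictions when everything outside the view is perturbed."""
    rng = random.Random(seed)
    base = [model(r) for r in rows]
    per_trial = []
    for _ in range(trials):
        pert = perturb_outside_view(rows, view_columns, rng)
        per_trial.append(mean([abs(model(p) - b) for p, b in zip(pert, base)]))
    return mean(per_trial)

# Toy data (made up for this sketch).
rows = [{"id": i, "age": 20 + 3 * i, "noise": (7 * i) % 5} for i in range(10)]

dev_good = estimate_deviation(rows, {"id", "age"}, toy_model)    # view keeps the influential column
dev_bad = estimate_deviation(rows, {"id", "noise"}, toy_model)   # view drops the influential column
print(f"deviation with view {{id, age}}:   {dev_good:.4f}")
print(f"deviation with view {{id, noise}}: {dev_bad:.4f}")
```

In this toy setting, a view that retains `age` yields a small deviation, while a view that omits it does not. This mirrors the tradeoff between determinacy and conciseness described in the abstract.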

Paper Structure

This paper contains 27 sections, 18 equations, 7 figures, 5 tables.

Figures (7)

  • Figure 1: An explanation view defines a subdatabase (or more generally, a derived view). The rest of the database gets randomly perturbed. As long as the view is kept the same, a trained black-box model gives similar predictions over the perturbed database as the original database. We say that the view soft determines the prediction.
  • Figure 2: Example permutations for $\textsc{Projection}$ (upper) and $\textsc{Selection}$ (lower). For $\textsc{Projection}$, only attributes $A$ and $B$ (in orange) are in $\mathsf{Attr}(E)$ of explanation $E$. For $\textsc{Selection}$, the tuples (in orange) $\{(1,0,1,2), (2,1,2,4)\}$ are in $\mathsf{Tups}(E)$. The key attribute $A$ is always retained.
  • Figure 3: Foreign key perturbations for $\textsc{FKJoin}$ (illustrating the FKJoin example). Here, the foreign key $T_1.B$ is perturbed as it is not in $\mathsf{FK}(E)$.
  • Figure 4: RDL model pipeline with hetero-GNNs. The figure shows where different masks are located in the pipeline.
  • Figure 5: Evaluation for $\textsc{Projection}$. For visualization purposes, tasks are separated into "easy" (left) and "hard" (right) ones, with separate scales for the deviation from determinacy ($\mathsf{dev}_\Delta$). The average (AVG) across all tasks is also shown. Below the dataset and task name, we indicate the explanation size $k^*$ and the total number of data attributes.
  • ...and 2 more figures
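
The perturbations described in the captions of Figures 1 and 2 can be sketched in Python as follows. This is an illustrative approximation under stated assumptions, not the paper's procedure: values are perturbed by shuffling within a column, the function names `perturb_projection` and `perturb_selection` are hypothetical, and only the first two tuples of the toy table come from the Figure 2 caption (the other two are made up).

```python
import random

HEADER = ("A", "B", "C", "D")   # A is the key attribute, always retained
ROWS = [
    (1, 0, 1, 2),               # from the Figure 2 caption
    (2, 1, 2, 4),               # from the Figure 2 caption
    (3, 0, 3, 6),               # made up for this sketch
    (4, 1, 4, 8),               # made up for this sketch
]

def perturb_projection(rows, header, kept_attrs, key, rng):
    """Shuffle each column that is neither the key nor in kept_attrs."""
    out = [list(r) for r in rows]
    for j, name in enumerate(header):
        if name == key or name in kept_attrs:
            continue
        col = [r[j] for r in rows]
        rng.shuffle(col)
        for r, v in zip(out, col):
            r[j] = v
    return [tuple(r) for r in out]

def perturb_selection(rows, header, kept_tuples, key, rng):
    """Keep tuples in kept_tuples intact; shuffle non-key values among the other tuples."""
    idx = [i for i, r in enumerate(rows) if r not in kept_tuples]
    out = [list(r) for r in rows]
    for j, name in enumerate(header):
        if name == key:
            continue
        col = [rows[i][j] for i in idx]
        rng.shuffle(col)
        for i, v in zip(idx, col):
            out[i][j] = v
    return [tuple(r) for r in out]

rng = random.Random(0)
proj = perturb_projection(ROWS, HEADER, {"A", "B"}, "A", rng)
sel = perturb_selection(ROWS, HEADER, {(1, 0, 1, 2), (2, 1, 2, 4)}, "A", rng)
print("projection-perturbed:", proj)
print("selection-perturbed: ", sel)
```

For the projection case, attributes `A` and `B` play the role of $\mathsf{Attr}(E)$ and are left unchanged. For the selection case, the two caption tuples play the role of $\mathsf{Tups}(E)$, and only the remaining tuples have their non-key values shuffled.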

Theorems & Definitions (7)

  • Example 1
  • Example 2
  • Definition 1: Deviation from Determinacy
  • Definition 2: Soft determinacy explanations
  • Example 3
  • Example 4
  • Example 5