F-IVM: Analytics over Relational Databases under Updates

Ahmet Kara, Milos Nikolic, Dan Olteanu, Haozhe Zhang

TL;DR

F-IVM is a unified framework for maintaining a wide range of analytics over evolving relational data. It combines higher-order incremental view maintenance, factorized computation, and a ring abstraction. Views map keys to payloads drawn from a ring, are arranged in a view tree derived from a variable order, and are maintained with factorized computation over keys, payloads, and updates. The same machinery covers both classic and ML-centric analytics: group-by aggregates over joins, linear regression via the covariance matrix, Chow-Liu trees via pairwise mutual information, and matrix chain multiplication. Implemented on top of DBToaster, F-IVM can outperform classical first-order and fully recursive higher-order IVM by orders of magnitude in runtime while using less memory. These results suggest practical value for real-time in-database analytics and a foundation for extending to broader ML and graphical-model workloads.

Abstract

This article describes F-IVM, a unified approach for maintaining analytics over changing relational data. We exemplify its versatility in four disciplines: processing queries with group-by aggregates and joins; learning linear regression models using the covariance matrix of the input features; building Chow-Liu trees using pairwise mutual information of the input features; and matrix chain multiplication. F-IVM has three main ingredients: higher-order incremental view maintenance; factorized computation; and ring abstraction. F-IVM reduces the maintenance of a task to that of a hierarchy of simple views. Such views are functions mapping keys, which are tuples of input values, to payloads, which are elements from a ring. F-IVM also supports efficient factorized computation over keys, payloads, and updates. Finally, F-IVM treats uniformly seemingly disparate tasks. In the key space, all tasks require joins and variable marginalization. In the payload space, tasks differ in the definition of the sum and product ring operations. We implemented F-IVM on top of DBToaster and show that it can outperform classical first-order and fully recursive higher-order incremental view maintenance by orders of magnitude while using less memory.
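The split between key space and payload space can be illustrated with a small sketch. It is not F-IVM or DBToaster code: the names `Ring`, `JoinAggregate`, `cov_lift`, and `load` are hypothetical, the query is a toy join of `R(A,B)` and `S(A,C)` on `A`, and the triple-based covariance ring (count, vector of sums, matrix of sums of products) is an assumed construction rather than one quoted from the text above. The maintenance code is identical for both tasks; only the ring and the lifting functions change.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Ring:
    """Hypothetical ring interface: payload type plus its operations."""
    zero: Any
    one: Any
    add: Callable[[Any, Any], Any]
    mul: Callable[[Any, Any], Any]
    neg: Callable[[Any], Any]  # additive inverse, needed for deletes


# Ring of integers: payloads are plain numbers (COUNT/SUM aggregates).
INT_RING = Ring(0, 1, lambda x, y: x + y, lambda x, y: x * y, lambda x: -x)

# Assumed covariance-style ring over M features: payloads are triples
# (count, vector of sums, matrix of sums of products).
M = 2
def _vec(f): return tuple(f(i) for i in range(M))
def _mat(f): return tuple(tuple(f(i, j) for j in range(M)) for i in range(M))

def cov_add(x, y):
    (c1, s1, q1), (c2, s2, q2) = x, y
    return (c1 + c2, _vec(lambda i: s1[i] + s2[i]),
            _mat(lambda i, j: q1[i][j] + q2[i][j]))

def cov_mul(x, y):
    (c1, s1, q1), (c2, s2, q2) = x, y
    return (c1 * c2,
            _vec(lambda i: c2 * s1[i] + c1 * s2[i]),
            _mat(lambda i, j: c2 * q1[i][j] + c1 * q2[i][j]
                 + s1[i] * s2[j] + s2[i] * s1[j]))

def cov_neg(x):
    c, s, q = x
    return (-c, _vec(lambda i: -s[i]), _mat(lambda i, j: -q[i][j]))

def cov_lift(idx):
    """Lift a value of feature number idx into the covariance ring."""
    return lambda v: (1, _vec(lambda i: v if i == idx else 0),
                      _mat(lambda i, j: v * v if i == j == idx else 0))

COV_RING = Ring((0, _vec(lambda i: 0), _mat(lambda i, j: 0)),
                (1, _vec(lambda i: 0), _mat(lambda i, j: 0)),
                cov_add, cov_mul, cov_neg)


class JoinAggregate:
    """Maintains Q = SUM over A,B,C of lift_b(B) * lift_c(C) for R(A,B) JOIN S(A,C).

    View hierarchy: v_r[a] marginalizes B out of R, v_s[a] marginalizes C
    out of S, and q marginalizes A out of their product.
    """

    def __init__(self, ring, lift_b, lift_c):
        self.ring, self.lift_b, self.lift_c = ring, lift_b, lift_c
        self.v_r, self.v_s = {}, {}
        self.q = ring.zero

    def _delta(self, lifted, insert):
        return lifted if insert else self.ring.neg(lifted)

    def update_r(self, a, b, insert=True):
        r, d = self.ring, self._delta(self.lift_b(b), insert)
        self.v_r[a] = r.add(self.v_r.get(a, r.zero), d)
        # The delta of q needs only the sibling view, not a recomputed join.
        self.q = r.add(self.q, r.mul(d, self.v_s.get(a, r.zero)))

    def update_s(self, a, c, insert=True):
        r, d = self.ring, self._delta(self.lift_c(c), insert)
        self.v_s[a] = r.add(self.v_s.get(a, r.zero), d)
        self.q = r.add(self.q, r.mul(self.v_r.get(a, r.zero), d))


def load(m):
    """Feed the same toy database to any maintainer."""
    for a, b in [(1, 2), (1, 3)]:
        m.update_r(a, b)
    for a, c in [(1, 10), (2, 5)]:
        m.update_s(a, c)
    return m


sum_bc = load(JoinAggregate(INT_RING, lambda b: b, lambda c: c))
cov = load(JoinAggregate(COV_RING, cov_lift(0), cov_lift(1)))
print(sum_bc.q)  # SUM(B * C) over the join: 50
print(cov.q)     # (count, sums of B and C, sums of pairwise products)
```

Each single-tuple update touches one entry of a leaf view and one lookup in its sibling, which is the sense in which a task is reduced to a hierarchy of simple views. Deletes are handled by the additive inverse, for example `sum_bc.update_r(1, 2, insert=False)`.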

Paper Structure (35 sections, 3 theorems, 29 equations, 28 figures)

Key Result

Theorem 11

Let $\mathsf{Q}$ be a query and $N$ the size of the database. F-IVM can maintain $\mathsf{Q}$ with $O(N)$ preprocessing, $O(1)$ enumeration delay, and $O(N)$ single-tuple updates in case $\mathsf{Q}$ is free-connex acyclic. F-IVM can maintain $\mathsf{Q}$ with $O(N)$ preprocessing, ...

Figures (28)

  • Figure 1: View tree for the query in Example \ref{ex:sql_sum_aggregate_intro}. The propagation paths for updates to $S$ (right, red) and to $T$ (left, blue).
  • Figure 2: Overview of the F-IVM system.
  • Figure 3: (left) Variable order $\omega$ of the natural join of the relations $\mathsf{R}$, $\mathsf{S}$, and $\mathsf{T}$; (middle) View tree over $\omega$ and $\mathcal{F} = \emptyset$; (right) View definitions.
  • Figure 4: Creating a view tree $\tau(\omega, \mathcal{F})$ for a variable order $\omega$ and a set of free variables $\mathcal{F}$.
  • Figure 5: (left) View tree over the variable order $\omega$ in Figure \ref{fig:example_payloads} and $\mathcal{F} = \{A,C\}$; (right) View definitions.
  • ...and 23 more figures

Theorems & Definitions (40)

  • Definition 1
  • Example 2
  • Example 3
  • Example 4
  • Example 5
  • Definition 6: adapted from Olteanu:FactBounds:2015:TODS
  • Example 7
  • Example 8
  • Example 9
  • Example 10
  • ...and 30 more