Optimizing Context-Enhanced Relational Joins

Viktor Sanca, Manos Chatzakis, Anastasia Ailamaki

TL;DR

Context-rich, multi-modal data challenge traditional relational DBMSs. The paper introduces a context-enhanced relational join and a composable embedding operator that integrate vector embeddings into declarative queries, preserving relational algebra while enabling vector-based similarity processing. Using an extended algebra, cost models, and tensor-based formulations, it shows that holistic optimization across the logical and physical layers, including prefetching embeddings and batching vectors, yields an order-of-magnitude improvement in execution time. The approach enables hybrid vector-relational analytics, with practical implications for string and multi-modal data processing on modern hardware.

Abstract

Collecting data, extracting value, and combining insights from relational and context-rich multi-modal sources in data processing pipelines presents a challenge for traditional relational DBMS. While relational operators allow declarative and optimizable query specification, they are limited to data transformations unsuitable for capturing or analyzing context. On the other hand, representation learning models can map context-rich data into embeddings, allowing machine-automated context processing but requiring imperative data transformation integration with the analytical query. To bridge this dichotomy, we present a context-enhanced relational join and introduce an embedding operator composable with relational operators. This enables hybrid relational and context-rich vector data processing, with algebraic equivalences compatible with relational algebra and corresponding logical and physical optimizations. We investigate model-operator interaction with vector data processing and study the characteristics of the E-join operator. Using an example of string embeddings, we demonstrate enabling hybrid context-enhanced processing on relational join operators with vector embeddings. The importance of holistic optimization, from logical to physical, is demonstrated in an order of magnitude execution time improvement.
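
To make the idea of a join over embeddings concrete, here is a minimal sketch in plain Python. It is not the paper's implementation: the hashed character-trigram `embed` function is a toy stand-in for a real embedding model, and the names `embed`, `e_join`, `DIM`, and `threshold` are hypothetical, chosen only for this illustration. The sketch embeds every input string once before the join loop instead of once per compared pair, which is the kind of saving that prefetching embeddings aims for.

```python
import math
import zlib

DIM = 64  # embedding width; an arbitrary choice for this sketch


def embed(text: str) -> list[float]:
    """Toy stand-in for an embedding model: hashed character trigrams, L2-normalized."""
    vec = [0.0] * DIM
    padded = f"  {text.lower()} "
    for i in range(len(padded) - 2):
        # crc32 gives a deterministic bucket, unlike Python's salted str hash
        vec[zlib.crc32(padded[i:i + 3].encode()) % DIM] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def e_join(left, right, threshold):
    """Nested-loop similarity join: emit (l, r, sim) where cosine similarity >= threshold."""
    left_vecs = [embed(s) for s in left]    # each string is embedded once up front,
    right_vecs = [embed(s) for s in right]  # not once per compared pair
    out = []
    for l, lv in zip(left, left_vecs):
        for r, rv in zip(right, right_vecs):
            # vectors are unit length, so the dot product is the cosine similarity
            sim = sum(a * b for a, b in zip(lv, rv))
            if sim >= threshold:
                out.append((l, r, sim))
    return out
```

Calling `e_join(["data base", "join"], ["database"], 0.5)` returns the pairs whose toy similarity reaches the threshold. With a real model, `embed` would be replaced by a batched model call and the inner loop by a matrix product, but those substitutions are outside what this sketch shows.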

Paper Structure

This paper contains 30 sections, 13 equations, 17 figures, and 2 tables.

Figures (17)

  • Figure 1: Problem: Model-RDBMS data analysis requires user expertise, imperative tasks, and data movement specification.
  • Figure 2: Enabler: models embed context-rich data into common tensor representation, enabling automated processing.
  • Figure 3: Context-enhanced, model-relational analytics.
  • Figure 4: Goal: Hybrid vector-relational operations are declarative transformation primitives amenable to query optimization.
  • Figure 5: Hybrid vector-relational query example, and the join operator which is the focus of the optimizations in this paper.
  • ...and 12 more figures