Optimizing Relational Queries over Array-Valued Data in Columnar Systems

Maroua Zeblah, Etienne Couritas, Sarah Chlyah, Pierre Genevès, Nils Gesbert, Nabil Layaïda

Abstract

Modern analytical workloads increasingly combine relational data with array-valued attributes. While columnar database systems efficiently process such workloads, their ability to optimize queries that interleave relational operators with array manipulations remains limited. This paper introduces A3D-RA, an extended relational algebra supporting array-valued attributes, together with a comprehensive framework for algebraic reasoning and optimization. We formalize its data model and semantics, develop a complete set of equivalence-preserving transformation rules capturing pairwise interactions between relational and array operators, and propose a plan enumeration strategy with an optimality guarantee that remains polynomial in all non-join operators. We design A3D-RA as a modular, backend-independent optimization layer that can be instantiated over existing analytical database systems. Experimental results across three high-performance engines on a real-world workload show consistent performance gains enabled by the proposed algebraic optimization layer.

Paper Structure

This paper contains 75 sections, 33 equations, 14 figures, 2 tables, and 1 algorithm.

Figures (14)

  • Figure 1: Syntax of A3D-RA terms.
  • Figure 2: Semantics of A3D-RA.
  • Figure 3: A3D-Optimizer System Architecture.
  • Figure 4: ClickHouse and Umbra runtime: native vs. A3D-RA.
  • Figure 5: Snowflake runtime: native vs. A3D-RA.
  • ...and 9 more figures

Theorems & Definitions (5)

  • Definition 1: Tuple
  • Definition 2: Relation
  • Definition 3: Semantic Correspondence of Arrays
  • Definition 4: Invertible Predicate
  • Definition 5: Distributive aggregation