Semantics and Multi-Query Optimization Algorithms for the Analyze Operator

Marios Iakovidis, Panos Vassiliadis

TL;DR

This paper introduces the ANALYZE operator, a novel intentional cube-querying operator that provides a 360-degree view of the data. It gives the operator formal query semantics and proves that merging the operator's facilitator cube queries into a smaller number of queries yields exactly the same result.

Abstract

In their hunt for highlights, i.e., interesting patterns in the data, data analysts have to issue groups of related queries and manually combine their results. To the extent that the analyst's goals are based on an intention of what to discover (e.g., contrast a query result with peer ones, verify a pattern over a broader range of the data space, etc.), integrating intentional query operators into analytical engines can make these analytical tasks more efficient. In this paper, we introduce, with well-defined semantics, the ANALYZE operator, a novel intentional cube-querying operator that provides a 360-degree view of the data. We define the semantics of an ANALYZE query as a tuple of five internal facilitator cube queries that (a) report on the specifics of a particular subset of the data space, which is part of the query specification and which we refer to as the original query, (b) contrast the result with results from peer subspaces, or sibling queries, and (c) explore the data space at lower levels of granularity via drill-down queries. We introduce formal query semantics for the operator and theoretically prove that we can obtain exactly the same result by merging the facilitator cube queries into a smaller number of queries. This effectively introduces a multi-query optimization (MQO) strategy for executing an ANALYZE query. We propose three alternative algorithms: (a) a simple execution without optimizations (Min-MQO), (b) a total merging of all the facilitator queries into a single one (Max-MQO), and (c) an intermediate strategy, Mid-MQO, that merges only a subset of the facilitator queries. Our experiments demonstrate that Mid-MQO achieves consistently strong performance across several contexts, that Min-MQO always trails it, and that Max-MQO excels for queries where the siblings are sizable and overlap significantly.
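The central claim above is that facilitator queries can be merged into fewer queries without changing the result. The Python sketch below illustrates that idea on a hypothetical toy fact table. The original query (month = Feb, grouped by region, SUM of sales) and its sibling queries (here assumed to be the other months of the same quarter) are answered once separately and once by a single merged query that filters on all those months and additionally groups by month. The table, the month-to-quarter hierarchy, and the function names are illustrative assumptions; this is not the paper's Min-, Mid-, or Max-MQO algorithm.

```python
from collections import defaultdict

# Hypothetical toy fact table: (month, region, sales).
FACTS = [
    ("Jan", "North", 10), ("Jan", "South", 4),
    ("Feb", "North", 7), ("Feb", "South", 3), ("Feb", "North", 2),
    ("Mar", "South", 6), ("Apr", "North", 9),
]
# Hypothetical month -> quarter hierarchy, used to find sibling members.
MONTH_TO_QUARTER = {"Jan": "Q1", "Feb": "Q1", "Mar": "Q1", "Apr": "Q2"}


def siblings(month):
    """Other months that share the same quarter as `month` (assumed sibling notion)."""
    quarter = MONTH_TO_QUARTER[month]
    return [m for m, q in MONTH_TO_QUARTER.items() if q == quarter and m != month]


def run_query(facts, month):
    """One cube query: WHERE month = :month GROUP BY region, SUM(sales)."""
    out = defaultdict(int)
    for m, region, sales in facts:
        if m == month:
            out[region] += sales
    return dict(out)


def run_merged(facts, months):
    """Merged query: WHERE month IN :months GROUP BY month, region, SUM(sales)."""
    out = defaultdict(lambda: defaultdict(int))
    for m, region, sales in facts:
        if m in months:
            out[m][region] += sales
    # Split the single merged result back into one result per original/sibling query.
    return {m: dict(out[m]) for m in months}


original = "Feb"
months = [original] + siblings(original)
separate = {m: run_query(FACTS, m) for m in months}  # one scan per query
merged = run_merged(FACTS, set(months))              # a single scan for all queries
assert separate == merged
```

The merged query trades several scans for one scan with a finer grouping; whether that pays off depends on how large and overlapping the sibling subspaces are, which matches the trade-off the abstract reports between the merging strategies.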

Paper Structure (37 sections, 2 theorems, 2 equations, 7 figures, 9 tables, 2 algorithms)

Key Result

Theorem 5.1

Assume two queries, $q^b$ and $q^n$. The query $q^b$ is usable for computing (or, simply, usable for) the query $q^n$ if the following conditions hold:
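The conditions of Theorem 5.1 are not listed in this summary. As a rough illustration of the general cube-usability idea only, the Python sketch below assumes a simplified set of conditions: the same measure and a re-aggregatable function (SUM, MIN, MAX); a selection of $q^n$ contained in that of $q^b$; filters of $q^n$ that can be evaluated on the result of $q^b$; and grouping levels of $q^n$ that either appear in $q^b$ or roll up from one of its levels. `Query`, `is_usable`, `answer_from`, and the city-to-region map are hypothetical names, not the paper's formalism.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Query:
    """Hypothetical simplified cube query: IN-filters, grouping levels, one aggregated measure."""
    filters: dict    # attribute -> set of allowed values, combined conjunctively
    group_by: tuple  # grouping attributes of the result
    measure: str = "sales"
    agg: str = "sum"


# Aggregates whose partial results can be combined by re-applying the same function.
REAGGREGATE = {"sum": sum, "min": min, "max": max}


def is_usable(qb, qn, rollup):
    """Simplified test: can qn be computed from qb's result alone?

    `rollup` maps a coarse level to (finer source level, {fine value: coarse value}).
    """
    if (qb.measure, qb.agg) != (qn.measure, qn.agg) or qb.agg not in REAGGREGATE:
        return False
    # qn's selection must be contained in qb's selection.
    for attr, allowed in qb.filters.items():
        if attr not in qn.filters or not qn.filters[attr] <= allowed:
            return False
    # qn's filters must be checkable on qb's result, or already enforced identically by qb.
    for attr, allowed in qn.filters.items():
        if attr not in qb.group_by and allowed != qb.filters.get(attr):
            return False
    # Each grouping level of qn must appear in qb or roll up from a level of qb.
    return all(
        lvl in qb.group_by or (lvl in rollup and rollup[lvl][0] in qb.group_by)
        for lvl in qn.group_by
    )


def answer_from(qb_rows, qb, qn, rollup):
    """Compute qn by filtering and re-aggregating qb's result rows (assumes is_usable)."""
    groups = defaultdict(list)
    for row in qb_rows:
        if all(row[a] in vals for a, vals in qn.filters.items() if a in qb.group_by):
            key = tuple(
                row[lvl] if lvl in qb.group_by else rollup[lvl][1][row[rollup[lvl][0]]]
                for lvl in qn.group_by
            )
            groups[key].append(row[qn.measure])
    combine = REAGGREGATE[qn.agg]
    return {key: combine(vals) for key, vals in groups.items()}


# Hypothetical usage: q^b at (month, city) answers q^n at region level for February only.
CITY_TO_REGION = {"Athens": "Attica", "Piraeus": "Attica", "Patras": "Achaea"}
ROLLUP = {"region": ("city", CITY_TO_REGION)}
qb = Query({"month": {"Jan", "Feb"}}, ("month", "city"))
qn = Query({"month": {"Feb"}}, ("region",))
qb_rows = [
    {"month": "Jan", "city": "Athens", "sales": 10},
    {"month": "Feb", "city": "Athens", "sales": 20},
    {"month": "Feb", "city": "Piraeus", "sales": 5},
    {"month": "Feb", "city": "Patras", "sales": 7},
]
if is_usable(qb, qn, ROLLUP):
    print(answer_from(qb_rows, qb, qn, ROLLUP))  # {('Attica',): 25, ('Achaea',): 7}
```

A query without the month filter would not be answerable from `qb`, because `qb` has already discarded the rows of other months; `is_usable` rejects that case.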

Figures (7)

  • Figure 1: Facilitator queries per algorithm
  • Figure 2: Selectivity Ratio Effect on Total Query Execution Time. Max-MQO did not complete its execution for the 90% selectivity ratio.
  • Figure 3: Number of Atomic Filters Effect. Max-MQO did not complete its execution for 2 atomic filters.
  • Figure 4: Grouper Effect on Total Execution Time
  • Figure 5: Total Execution Time Breakdown
  • ...and 2 more figures

Theorems & Definitions (2)

  • Theorem 5.1: Cube Usability (DBLP:conf/dolap/Vassiliadis23PV22)
  • Theorem 5.2: Multi-Query Usability