Causal Search for Skylines (CSS): Causally-Informed Selective Data De-Correlation

Pratanu Mandal; Abhinav Gorantla; K. Selçuk Candan; Maria Luisa Sapino

Abstract

Skyline queries are popular and effective tools in multi-criteria decision support, as they extract interesting (Pareto-optimal) points that help summarize the available data with respect to a given set of preference attributes. Unfortunately, the efficiency of skyline algorithms depends heavily on the underlying data statistics. In this paper, we argue that the efficiency of skyline algorithms could be significantly boosted if one could erase any attribute correlations that do not agree with the preference criteria, while preserving (or even boosting) correlations that agree with the user-provided criteria. Therefore, we propose a causally-informed selective de-correlation mechanism that enables skyline algorithms to better leverage the pruning opportunities provided by positively-aligned data distributions, without suffering from the misalignments. In particular, we show that, given a causal graph that describes the underlying causal structure of the data, one can identify a subset of the attributes that can be used to selectively de-correlate the preference attributes. Importantly, the proposed causal search for skylines (CSS) approach is agnostic to the underlying candidate enumeration and pruning strategies and can therefore be leveraged to improve any popular skyline discovery algorithm. Experiments on multiple real and synthetic data sets and for different skyline discovery algorithms show that the proposed causally-informed selective de-correlation technique significantly reduces both the number of dominance checks and the overall time needed to locate skyline points.
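To make the terms in the abstract concrete, the following is a minimal sketch of a skyline query and of the general idea of conditioning on an attribute before searching. It is not the CSS algorithm from the paper: the function names (`dominates`, `skyline`, `skyline_by_groups`), the "smaller is better" convention, the naive window-based search, and the toy data are all assumptions made for illustration. The sketch groups tuples by a hypothetical conditioning attribute, finds a local Pareto front in each group, and then merges the local fronts, counting dominance checks along the way.

```python
from collections import defaultdict


def dominates(p, q):
    """True if p is no worse than q on every attribute and strictly
    better on at least one (assumption: smaller values are preferred)."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def skyline(points):
    """Naive window-based skyline search.
    Returns (list of non-dominated points, number of dominance checks)."""
    window = []
    checks = 0
    for p in points:
        dominated = False
        kept = []
        for idx, s in enumerate(window):
            checks += 1
            if dominates(s, p):
                # p is dominated: keep the rest of the window unchanged.
                dominated = True
                kept.extend(window[idx:])
                break
            checks += 1
            if not dominates(p, s):
                kept.append(s)
        window = kept
        if not dominated:
            window.append(p)
    return window, checks


def skyline_by_groups(rows, group_key, prefs):
    """Partition rows by a conditioning attribute, compute a local skyline
    per group, then compute the skyline of the merged local fronts."""
    groups = defaultdict(list)
    for row in rows:
        groups[group_key(row)].append(prefs(row))
    merged = []
    checks = 0
    for members in groups.values():
        front, c = skyline(members)
        merged.extend(front)
        checks += c
    final, c = skyline(merged)
    return final, checks + c


# Toy rows of the form (z, x, y): z is a hypothetical conditioning
# attribute, x and y are the preference attributes.
rows = [(0, 1, 9), (0, 2, 8), (0, 3, 9), (1, 5, 3), (1, 6, 2), (1, 7, 4), (1, 2, 9)]

direct, direct_checks = skyline([(x, y) for _, x, y in rows])
grouped, grouped_checks = skyline_by_groups(
    rows, group_key=lambda r: r[0], prefs=lambda r: (r[1], r[2])
)
print(sorted(direct), direct_checks)
print(sorted(grouped), grouped_checks)
```

Both calls return the same skyline, because every globally non-dominated tuple is also non-dominated within its own group. Whether the grouped variant needs fewer dominance checks depends on the data and on the choice of conditioning attribute, which is the question the paper addresses with causal information.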

Paper Structure (62 sections, 61 equations, 27 figures, 3 tables, 2 algorithms)


Introduction
Challenge: Impact of the Data Distribution on Skyline Discovery Performance
Our Contributions: Causally-Informed Selective Data De-Correlation
Related Works
Preliminaries
Causal Graphs and Skylines
Conditioning and De-Correlation
Causal Search for Skylines (CSS) with Selective De-Correlation
Problem Formulation
Baseline Algorithm - Algorithm #0: Data Driven Conditioning Set Selection
Algorithm #1: Gain-based Negative Path Blocking
Algorithm #2: Leaky Negative Path Blocking
Issue #1: Leaky Blocking
Issue #2: Noisy Causal Information Passing
Leaky Negative Path Blocking
...and 47 more sections

Figures (27)

Figure 1: Running example: skylines for house hunting
Figure 2: Impact of the alignment between data distribution and preference criteria: in (a) the dominance region is determined by a few extreme points, whereas in (b) there are a large number of skyline points on the Pareto front
Figure 3: Impact of clustering on the (a) confounder $Z$: (b) within each cluster, the correlation between $X$ and $Y$ is close to zero, and (c) seeking the skyline among the merged Pareto fronts (orange dots) across clusters is much less expensive, as many tuples have already been pruned
Figure 4: Three basic causal structures
Figure 5: (a,b) The fork attribute $A$ imposes negative correlation on attributes $X$ and $Y$; (c,d) conditioning of the attribute $A$ creates multiple clusters of data points, each cluster lacking any negative correlation; in (e), we highlight two of these resulting clusters in red and green
...and 22 more figures

Theorems & Definitions (11)

Example 1: House hunting
Definition 1: Causal Graph
Example 2: Weakening or Strengthening of Relationships among Preference Attributes
Definition 2: Conditioning
Example 3: Conditioning to Weaken Negative Correlations among the Preference Attributes
Example 4: Leaky Conditioning/Blocking
Definition 3: Tuple dominance ($dom$)
Definition 4: Skyline
Example 5: Causal graph for house hunting
Definition 5: Causal Path
...and 1 more
