Summarized Causal Explanations For Aggregate Views (Full version)

Brit Youngmann, Michael Cafarella, Amir Gilad, Sudeepa Roy

TL;DR

Using background knowledge captured in a causal DAG, CauSumX finds the most effective causal treatments for different groups in an aggregate view. Experiments show that the system generates useful summarized causal explanations compared to prior work and scales well to large, high-dimensional data.

Abstract

SQL queries with group-by and average are frequently used and plotted as bar charts in several data analysis applications. Understanding the reasons behind the results in such an aggregate view may be a highly non-trivial and time-consuming task, especially for large datasets with multiple attributes. Hence, generating automated explanations for aggregate views can allow users to gain better insights into the results while saving time in data analysis. When providing explanations for such views, it is paramount to ensure that they are succinct yet comprehensive, that they reveal different types of insights that hold for different aggregate answers in the view, and, most importantly, that they reflect reality and equip users to make informed data-driven decisions, i.e., the explanations do not only consider correlations but are causal. In this paper, we present CauSumX, a framework for generating summarized causal explanations for the entire aggregate view. Using background knowledge captured in a causal DAG, CauSumX finds the most effective causal treatments for different groups in the view. We formally define the framework and the optimization problem, study its complexity, and devise an efficient algorithm using the Apriori algorithm, LP rounding, and several optimizations. We experimentally show that our system generates useful summarized causal explanations compared to prior work and scales well for large, high-dimensional data.
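
To make the notion of an aggregate view concrete, the following minimal sketch runs a group-by-average SQL query over a tiny in-memory table. The table name `survey`, its columns, and all values are hypothetical illustration data, not taken from the paper; the sketch only shows the kind of view being explained, not how CauSumX computes explanations.

```python
import sqlite3

# Hypothetical toy data (country, education, salary); not from the paper.
ROWS = [
    ("US", "BSc", 100.0),
    ("US", "MSc", 140.0),
    ("DE", "BSc", 60.0),
    ("DE", "MSc", 80.0),
    ("IN", "BSc", 20.0),
]


def aggregate_view(rows):
    """Return the (group, average) pairs of a group-by-average query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE survey (country TEXT, education TEXT, salary REAL)"
    )
    conn.executemany("INSERT INTO survey VALUES (?, ?, ?)", rows)
    # One output row per group: this is the aggregate view that
    # would typically be plotted as a bar chart.
    cursor = conn.execute(
        "SELECT country, AVG(salary) FROM survey "
        "GROUP BY country ORDER BY country"
    )
    view = cursor.fetchall()
    conn.close()
    return view


view = aggregate_view(ROWS)
for country, avg_salary in view:
    print(f"{country}: {avg_salary:.1f}")
```

Each row of `view` corresponds to one bar in the chart; a summarized causal explanation would aim to account for the differences between such bars.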

Paper Structure

This paper contains 20 sections, 2 theorems, 8 equations, 23 figures, 4 tables, 2 algorithms.

Key Result

Proposition 1

It is NP-hard to decide whether the Summarized Causal Explanations problem is feasible (i.e., has any solution satisfying the constraints) for a given $k$ and $\theta$.

Figures (23)

  • Figure 1: A visualization of the Stack Overflow query results.
  • Figure 2: Causal explanation summary by CauSumX.
  • Figure 3: Example causal DAG.
  • Figure 4: Partial treatment-patterns lattice (Example \ref{ex:lattice}).
  • Figure 5: ILP for the optimization problem (line \ref{l:step3} in Algorithm \ref{algo:full_algo}).
  • ...and 18 more figures

Theorems & Definitions (13)

  • Example 1.1
  • Example 1.2
  • Definition 4.1: Pattern
  • Definition 4.2: Explanation Pattern
  • Example 4.1
  • Definition 4.3: Explainability
  • Example 4.2
  • Definition 4.4: Coverage
  • Definition 4.5: Summarized Causal Explanations
  • Example 4.3
  • ...and 3 more