Data-Semantics-Aware Recommendation of Diverse Pivot Tables

Whanhee Cho; Anna Fariha

TL;DR

The paper addresses automatic, diverse pivot-table recommendations for high-dimensional datasets by proposing SAGE, a data-semantics-aware system. It defines a joint utility model combining insightfulness (semantic and statistical signals) and interpretability, and introduces a semantics-based diversity measure to ensure non-redundant pivot tables. A greedy algorithm with aggressive pruning and an LLM-proxy enables scalable generation of a diverse, high-utility pivot-table set under a budget, demonstrated across four real datasets and via a user study. Results show SAGE outperforms baselines in utility and diversity while remaining scalable, adaptable to user feedback, and capable of highlighting nontrivial insights beyond standard spreadsheet recommendations. The work advances pivot-table summarization by integrating data semantics, interpretability, and diversity into a unified, efficient framework with practical implications for data exploration in spreadsheets and beyond.
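The budgeted selection described above can be illustrated with a small sketch. This is not SAGE's actual objective or pruning strategy: it is a generic max-marginal-relevance-style greedy heuristic. The names `greedy_diverse_select`, `utility`, `distance`, and `trade_off` are hypothetical. Each candidate pivot table is reduced to the set of attributes it uses, the utility scores are made-up numbers, and Jaccard distance stands in for the paper's semantics-based diversity measure.

```python
from typing import Callable, FrozenSet, List, Sequence

Candidate = FrozenSet[str]  # assumption: a pivot table is represented by its attribute set


def jaccard_distance(a: Candidate, b: Candidate) -> float:
    """1 - |a ∩ b| / |a ∪ b|; a stand-in for a semantic diversity measure."""
    union = a | b
    return 1.0 - (len(a & b) / len(union)) if union else 0.0


def greedy_diverse_select(
    candidates: Sequence[Candidate],
    utility: Callable[[Candidate], float],
    distance: Callable[[Candidate, Candidate], float],
    k: int,
    trade_off: float = 0.5,  # hypothetical weight between utility and novelty
) -> List[Candidate]:
    """Pick up to k candidates, each time taking the best utility/novelty mix."""
    selected: List[Candidate] = []
    remaining = list(candidates)
    while remaining and len(selected) < k:

        def gain(c: Candidate) -> float:
            # Novelty is the distance to the closest already-selected table.
            novelty = min((distance(c, s) for s in selected), default=1.0)
            return (1 - trade_off) * utility(c) + trade_off * novelty

        best = max(remaining, key=gain)
        selected.append(best)
        remaining.remove(best)
    return selected


# Made-up candidates and utility scores, for illustration only.
a = frozenset({"region", "sales"})
b = frozenset({"region", "profit"})
c = frozenset({"age", "income"})
scores = {a: 0.9, b: 0.85, c: 0.6}

picked = greedy_diverse_select([a, b, c], scores.__getitem__, jaccard_distance, k=2)
top_k_by_utility = sorted(scores, key=scores.__getitem__, reverse=True)[:2]
```

With these toy numbers, plain top-k by utility returns the two overlapping `region` tables, while the greedy variant replaces the second one with the lower-utility but non-overlapping table, which is the kind of redundancy the diversity term is meant to avoid.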

Abstract

Data summarization is essential for discovering insights in large datasets. In spreadsheets, pivot tables offer a convenient way to summarize tabular data by computing aggregates over some attributes, grouped by others. However, identifying attribute combinations that result in useful pivot tables remains a challenge, especially for high-dimensional datasets. We formalize the problem of automatically recommending insightful and interpretable pivot tables, eliminating the tedious manual process. A crucial aspect of recommending a set of pivot tables is to diversify them. Prior work inadequately addresses table diversification, which leads us to consider the problem of pivot-table diversification. We present SAGE, a data-semantics-aware system for recommending k-budgeted diverse pivot tables, overcoming the redundancy that top-k recommendation in prior work produces. SAGE ensures that each pivot table is insightful, interpretable, and adaptive to the user's actions and preferences, while also guaranteeing that the recommended pivot tables differ from one another, offering a diverse recommendation. We make two key technical contributions: (1) a data-semantics-aware model to measure the utility of a single pivot table and the diversity of a set of pivot tables, and (2) a scalable greedy algorithm that efficiently selects a set of diverse, high-utility pivot tables by leveraging data semantics to significantly reduce the combinatorial search space. Our extensive experiments on three real-world datasets show that SAGE outperforms alternative approaches and scales efficiently to high-dimensional datasets. Additionally, we present several case studies to highlight SAGE's qualitative effectiveness over commercial software and Large Language Models (LLMs).
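As a concrete illustration of what a single pivot table computes (an aggregate over one attribute, grouped by others), here is a minimal sketch in plain Python. The function `pivot`, the column names, and the sample rows are hypothetical and are not taken from the paper or its datasets.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Iterable, List, Mapping, Tuple


def pivot(
    rows: Iterable[Mapping[str, Any]],
    row_attr: str,
    col_attr: str,
    value_attr: str,
    agg: Callable[[List[Any]], Any] = sum,
) -> Dict[Tuple[Any, Any], Any]:
    """Group rows by (row_attr, col_attr) and aggregate value_attr per cell."""
    cells: Dict[Tuple[Any, Any], List[Any]] = defaultdict(list)
    for r in rows:
        cells[(r[row_attr], r[col_attr])].append(r[value_attr])
    return {key: agg(values) for key, values in cells.items()}


# Made-up sample data, for illustration only.
rows = [
    {"region": "East", "quarter": "Q1", "sales": 10},
    {"region": "East", "quarter": "Q1", "sales": 5},
    {"region": "East", "quarter": "Q2", "sales": 7},
    {"region": "West", "quarter": "Q1", "sales": 3},
]

table = pivot(rows, row_attr="region", col_attr="quarter", value_attr="sales")
```

Each choice of grouping attributes, aggregated attribute, and aggregate function yields a different pivot table, which is why the number of candidates grows combinatorially with the number of attributes.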
