QUIS: Question-guided Insights Generation for Automated Exploratory Data Analysis

Abhijit Manatkar, Ashlesha Akella, Parthivi Gupta, Krishnasuri Narayanam

TL;DR

QUIS is a fully automated EDA system that operates in two stages: insight generation (ISGen) driven by question generation (QUGen). The QUGen module generates questions iteratively, refining them from earlier iterations to improve coverage without human intervention or manually curated examples.

Abstract

Discovering meaningful insights from a large dataset, known as Exploratory Data Analysis (EDA), is a challenging task that requires thorough exploration and analysis of the data. Automated Data Exploration (ADE) systems use goal-oriented methods with Large Language Models and Reinforcement Learning towards full automation. However, these methods require human involvement to anticipate goals that may limit insight extraction, while fully automated systems demand significant computational resources and retraining for new datasets. We introduce QUIS, a fully automated EDA system that operates in two stages: insight generation (ISGen) driven by question generation (QUGen). The QUGen module generates questions in iterations, refining them from previous iterations to enhance coverage without human intervention or manually curated examples. The ISGen module analyzes data to produce multiple relevant insights in response to each question, requiring no prior training and enabling QUIS to adapt to new datasets.
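The two-stage flow described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the authors' implementation: the toy table, the function names `generate_questions`, `generate_insights`, and `run_pipeline`, the representation of a question as a (measure, breakdown) pair, and the use of a group mean as the only statistic. In particular, the QUGen stand-in enumerates column pairs where the real module would use a language model, and the ISGen stand-in returns a single insight per question where the abstract describes multiple.

```python
from collections import defaultdict
from itertools import product

# Toy table; all column names and values are invented for illustration.
ROWS = [
    {"region": "north", "channel": "web", "sales": 10.0},
    {"region": "north", "channel": "store", "sales": 30.0},
    {"region": "south", "channel": "web", "sales": 50.0},
    {"region": "south", "channel": "store", "sales": 70.0},
]
MEASURES = ["sales"]
BREAKDOWNS = ["region", "channel"]


def generate_questions(asked, batch_size=1):
    """Hypothetical QUGen stand-in: propose questions not asked before.

    Earlier questions are passed in so that each iteration extends
    coverage instead of repeating itself.
    """
    candidates = [q for q in product(MEASURES, BREAKDOWNS) if q not in asked]
    return candidates[:batch_size]


def generate_insights(rows, question):
    """Hypothetical ISGen stand-in: answer a question with one statistic."""
    measure, breakdown = question
    groups = defaultdict(list)
    for row in rows:
        groups[row[breakdown]].append(row[measure])
    means = {key: sum(vals) / len(vals) for key, vals in groups.items()}
    top = max(means, key=means.get)
    # The question travels with the insight, loosely mirroring an insight card.
    return [{"question": question, "top_group": top, "mean": means[top]}]


def run_pipeline(rows, iterations=3):
    """Alternate question generation and insight generation."""
    asked, cards = [], []
    for _ in range(iterations):
        new_questions = generate_questions(asked)
        if not new_questions:  # nothing new to ask, stop early
            break
        for question in new_questions:
            asked.append(question)
            cards.extend(generate_insights(rows, question))
    return cards


cards = run_pipeline(ROWS)
for card in cards:
    print(card)
```

The point of the sketch is only the control flow: each iteration sees the questions already asked, and no step needs training on the dataset.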

Paper Structure

This paper contains 23 sections, 1 equation, 7 figures, 2 tables, 1 algorithm.

Figures (7)

  • Figure 1: The Question Generation (QUGen) module of the QUIS system generates questions refined over iterations using data semantics, while the Insight Generation (ISGen) module generates insights (bottom-right) from those questions via statistical analysis. Each question is encapsulated inside its Insight Card.
  • Figure 2: Example Insight Card
  • Figure 3: Comparison of Average Human Evaluation Scores for QUIS and OnlyStats across 3 datasets.
  • Figure 4: Comparison of Insight score for QUIS and OnlyStats.
  • Figure 5: Total number of unique insight cards generated by QUIS under non-iterative (1 iteration) and iterative (up to 11 iterations) settings.
  • ...and 2 more figures