An LLM-Based Approach for Insight Generation in Data Analysis

Alberto Sánchez Pérez; Alaa Boukhary; Paolo Papotti; Luis Castejón Lozano; Adam Elwood

TL;DR

The paper presents an LLM-driven framework for extracting actionable textual insights from multi-table databases. It combines a Hypothesis Generator that formulates high-level questions, a Query Agent that generates and validates SQL queries to answer them, and a Summarization module that verbalizes the query results as concise insights; the pipeline also applies iterative hallucination mitigation. Insights are evaluated with a hybrid human–LLM scheme: an Elo-based ranking measures insightfulness, and a truth-value-based measure assesses correctness. Experiments show improvements over baselines on both private and public datasets. The approach emphasizes scalability, cost-efficiency, and robust assessment, with potential applications in business analytics, healthcare, and research.
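Elo ranking turns pairwise judgments ("which of these two insights is more insightful?") into a global score per insight. The sketch below shows the standard Elo update, assuming each judgment (from a human or an LLM judge) compares two insights. The K-factor of 32 and the starting rating of 1000 are conventional assumptions, not values taken from the paper, and the function names are hypothetical.

```python
from collections import defaultdict


def expected_score(r_a: float, r_b: float) -> float:
    """Probability that item A beats item B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def elo_rank(comparisons, k=32.0, initial=1000.0):
    """Rank insights from pairwise judgments (hypothetical helper).

    comparisons: iterable of (insight_a, insight_b, score_a), where score_a is
    1.0 if A was judged more insightful, 0.0 if B was, and 0.5 for a tie.
    k and initial are assumed values, not taken from the paper.
    """
    ratings = defaultdict(lambda: initial)
    for a, b, score_a in comparisons:
        e_a = expected_score(ratings[a], ratings[b])
        # delta is computed from the pre-comparison ratings; B loses exactly
        # what A gains, so the total rating mass stays constant.
        delta = k * (score_a - e_a)
        ratings[a] += delta
        ratings[b] -= delta
    # Highest-rated (most insightful) first.
    return sorted(ratings.items(), key=lambda kv: kv[1], reverse=True)


judgments = [("i1", "i2", 1.0), ("i1", "i3", 1.0), ("i2", "i3", 0.5)]
ranking = elo_rank(judgments)
```

Because Elo updates are sequential, the final ratings depend on the order of comparisons; averaging over several shuffled orderings is one common way to reduce that sensitivity.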

Abstract

Generating insightful and actionable information from databases is critical in data analysis. This paper introduces a novel approach that uses Large Language Models (LLMs) to generate textual insights automatically. Given a multi-table database as input, our method leverages LLMs to produce concise, text-based insights that reflect interesting patterns in the tables. Our framework includes a Hypothesis Generator that formulates domain-relevant questions, a Query Agent that answers these questions by generating SQL queries against the database, and a Summarization module that verbalizes the results as insights. The insights are evaluated for both correctness and subjective insightfulness using a hybrid of human judgment and automated metrics. Experimental results on public and enterprise databases show that our approach produces insights judged more insightful than those of other approaches while maintaining correctness.
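The three-stage flow (Hypothesis Generator, Query Agent, Summarization) can be sketched as plain functions over a SQLite database. This is a minimal illustration under stated assumptions, not the authors' implementation: `llm` is a hypothetical callable mapping a prompt string to a completion, the prompts are invented, and the "validate" step is approximated by executing each query and feeding any database error back to the model for another attempt.

```python
import sqlite3
from typing import Callable, Optional

LLM = Callable[[str], str]  # hypothetical: any function mapping a prompt to a completion


def schema_text(conn: sqlite3.Connection) -> str:
    """Collect CREATE TABLE statements so prompts are grounded in the schema."""
    rows = conn.execute("SELECT sql FROM sqlite_master WHERE type = 'table'").fetchall()
    return "\n".join(r[0] for r in rows if r[0])


def generate_hypotheses(llm: LLM, schema: str, n: int = 3) -> list[str]:
    """Hypothesis Generator: ask for high-level, domain-relevant questions."""
    reply = llm(f"Schema:\n{schema}\nPropose {n} analytical questions, one per line.")
    return [q.strip() for q in reply.splitlines() if q.strip()][:n]


def answer_with_sql(llm: LLM, conn: sqlite3.Connection, schema: str,
                    question: str, max_tries: int = 3) -> Optional[tuple[str, list]]:
    """Query Agent: draft SQL, execute it, and retry with the error message on failure."""
    feedback = ""
    for _ in range(max_tries):
        sql = llm(f"Schema:\n{schema}\nQuestion: {question}\n{feedback}Return one SQLite query.")
        try:
            return sql, conn.execute(sql).fetchall()
        except sqlite3.Error as exc:
            feedback = f"The previous query failed with: {exc}\n"
    return None  # drop the question rather than report an unverified answer


def summarize(llm: LLM, question: str, sql: str, rows: list) -> str:
    """Summarization: verbalize the query result as one concise insight."""
    return llm(f"Question: {question}\nSQL: {sql}\nResult: {rows}\n"
               "State the insight in one sentence.")


def generate_insights(llm: LLM, conn: sqlite3.Connection, n: int = 3) -> list[str]:
    """Run the full pipeline: questions -> validated SQL results -> textual insights."""
    schema = schema_text(conn)
    insights = []
    for question in generate_hypotheses(llm, schema, n):
        answer = answer_with_sql(llm, conn, schema, question)
        if answer is not None:
            sql, rows = answer
            insights.append(summarize(llm, question, sql, rows))
    return insights
```

Keeping the database as the only source of numbers, and summarizing only results that actually executed, is one simple way to limit hallucinated figures; the paper's own mitigation loop may differ.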
