An LLM-Based Approach for Insight Generation in Data Analysis
Alberto Sánchez Pérez, Alaa Boukhary, Paolo Papotti, Luis Castejón Lozano, Adam Elwood
TL;DR
The paper presents an LLM-driven framework for extracting actionable textual insights from multi-table databases. It combines a Hypothesis Generator that formulates high-level questions, a Query Agent that generates and validates SQL queries, and a Summarization module that verbalizes the results as concise insights, with iterative hallucination mitigation. Insights are evaluated with a hybrid human–LLM scheme that uses an Elo-based ranking for insightfulness and a truth-value-based measure for correctness, and the approach shows improvements over baselines on both private and public datasets. The approach emphasizes scalability, cost-efficiency, and robust assessment, with potential impact across business analytics, healthcare, and research domains.
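To make the Elo-based ranking of insightfulness concrete, here is a minimal sketch of the standard Elo update applied to pairwise comparisons between insights. The initial rating of 1000, the K-factor of 32, and the function names are illustrative assumptions, not values or identifiers taken from the paper. The judge that decides each comparison (human or LLM) is left outside the sketch.

```python
from typing import Dict, Iterable, List, Tuple


def expected_score(rating_a: float, rating_b: float) -> float:
    """Standard Elo expected score of A against B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(rating_a: float, rating_b: float, score_a: float,
               k: float = 32.0) -> Tuple[float, float]:
    """Return updated ratings; score_a is 1.0 (A wins), 0.5 (tie), or 0.0."""
    exp_a = expected_score(rating_a, rating_b)
    new_a = rating_a + k * (score_a - exp_a)
    new_b = rating_b + k * ((1.0 - score_a) - (1.0 - exp_a))
    return new_a, new_b


def rank_insights(insights: Iterable[str],
                  comparisons: Iterable[Tuple[str, str]],
                  initial: float = 1000.0,  # assumed starting rating
                  k: float = 32.0           # assumed K-factor
                  ) -> List[Tuple[str, float]]:
    """Rank insights from (winner, loser) pairs judged more/less insightful."""
    ratings: Dict[str, float] = {name: initial for name in insights}
    for winner, loser in comparisons:
        ratings[winner], ratings[loser] = update_elo(
            ratings[winner], ratings[loser], 1.0, k
        )
    return sorted(ratings.items(), key=lambda item: item[1], reverse=True)
```

Because each update adds to one rating exactly what it subtracts from the other, the total rating mass stays constant, and only relative differences between insights carry meaning.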
Abstract
Generating insightful and actionable information from databases is critical in data analysis. This paper introduces a novel approach using Large Language Models (LLMs) to automatically generate textual insights. Given a multi-table database as input, our method leverages LLMs to produce concise, text-based insights that reflect interesting patterns in the tables. Our framework includes a Hypothesis Generator to formulate domain-relevant questions, a Query Agent to answer these questions by generating SQL queries against the database, and a Summarization module to verbalize the insights. The insights are evaluated for both correctness and subjective insightfulness using a hybrid model of human judgment and automated metrics. Experimental results on public and enterprise databases demonstrate that our approach generates insights that are rated as more insightful than those of other approaches while maintaining correctness.
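The three components described above can be sketched as a simple loop. This is a hypothetical illustration, not the authors' implementation: the callables `hypothesize`, `to_sql`, and `summarize` stand in for LLM calls and are stubbed with fixed outputs, SQLite stands in for the database, and the retry-on-error step is only an assumed stand-in for query validation.

```python
import sqlite3
from typing import Callable, List, Optional


def generate_insights(
    conn: sqlite3.Connection,
    schema: str,
    hypothesize: Callable[[str], List[str]],             # Hypothesis Generator (stub)
    to_sql: Callable[[str, str, Optional[str]], str],    # Query Agent (stub)
    summarize: Callable[[str, list], str],               # Summarization module (stub)
    max_retries: int = 2,                                # assumed retry budget
) -> List[str]:
    """Turn high-level questions into SQL, run them, and verbalize the results."""
    insights: List[str] = []
    for question in hypothesize(schema):
        error: Optional[str] = None
        for _ in range(max_retries + 1):
            sql = to_sql(question, schema, error)
            try:
                rows = conn.execute(sql).fetchall()
            except sqlite3.Error as exc:
                # Feed the database error back so the agent can repair the query.
                error = str(exc)
                continue
            insights.append(summarize(question, rows))
            break
    return insights


def demo() -> List[str]:
    """Run the loop on a toy table with hard-coded stand-ins for the LLM calls."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [("north", 120.0), ("south", 80.0), ("north", 30.0)],
    )
    schema = "orders(region TEXT, amount REAL)"

    def hypothesize(_schema: str) -> List[str]:
        return ["Which region has the highest total order amount?"]

    def to_sql(_question: str, _schema: str, error: Optional[str]) -> str:
        if error is None:
            # First attempt references a table that does not exist.
            return "SELECT region, SUM(amount) FROM sales GROUP BY region"
        return (
            "SELECT region, SUM(amount) AS total FROM orders "
            "GROUP BY region ORDER BY total DESC LIMIT 1"
        )

    def summarize(_question: str, rows: list) -> str:
        region, total = rows[0]
        return f"{region} leads with a total of {total:.0f}."

    return generate_insights(conn, schema, hypothesize, to_sql, summarize)
```

In the demo, the first generated query fails because the table name is wrong, the error message is passed back to the stubbed Query Agent, and the corrected query yields the single insight. A real system would replace each stub with a prompted model call and would need stronger checks than successful execution.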
