Data Science with LLMs and Interpretable Models

Sebastian Bordt, Ben Lengerich, Harsha Nori, Rich Caruana

TL;DR

This work aims to make data science more transparent by combining large language models (LLMs) with interpretable Generalized Additive Models (GAMs). The method trains a GAM, converts its graphs to JSON text, and uses chain-of-thought prompting to elicit explanations from the LLM, yielding dataset- and model-level summaries and supporting hypothesis generation. In the experiments, GPT-4 reliably performs baseline graph tasks, generates coherent qualitative descriptions, and detects anomalies; its responses are grounded in many cases but remain vulnerable to hallucination under certain prompts. The authors provide an open-source LLM-GAM interface and discuss practical implications for domain experts, along with limitations and directions for future work on grounding and on evaluation with more complex graphs.
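To make the "graphs as JSON" step concrete, here is a minimal sketch of how one piecewise-constant GAM graph could be turned into text. It is an illustration under stated assumptions, not the TalkToEBM implementation: the function name `graph_to_json`, the key names, the interval formatting, and the example numbers are all hypothetical.

```python
import json


def graph_to_json(feature_name, bin_edges, scores, decimals=3):
    """Serialize a piecewise-constant shape function as JSON text.

    Hypothetical helper: `bin_edges` holds the interval boundaries and
    `scores` holds one additive contribution per interval, so there must
    be exactly one more edge than there are scores.
    """
    if len(bin_edges) != len(scores) + 1:
        raise ValueError("expected len(bin_edges) == len(scores) + 1")
    # Map each interval, written as "(low, high)", to its rounded score.
    intervals = {
        f"({bin_edges[i]}, {bin_edges[i + 1]})": round(scores[i], decimals)
        for i in range(len(scores))
    }
    return json.dumps({"feature": feature_name, "scores": intervals})


# Made-up values, for illustration only.
graph_text = graph_to_json("Age", [0.0, 10.0, 40.0, 80.0], [0.8123, -0.1, -0.4567])
print(graph_text)
```

Rounding the scores keeps the text short, which matters when the serialized graph has to fit in a limited context window.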

Abstract

Recent years have seen important advances in the building of interpretable models, machine learning models that are designed to be easily understood by humans. In this work, we show that large language models (LLMs) are remarkably good at working with interpretable models, too. In particular, we show that LLMs can describe, interpret, and debug Generalized Additive Models (GAMs). Combining the flexibility of LLMs with the breadth of statistical patterns accurately described by GAMs enables dataset summarization, question answering, and model critique. LLMs can also improve the interaction between domain experts and interpretable models, and generate hypotheses about the underlying phenomenon. We release https://github.com/interpretml/TalkToEBM as an open-source LLM-GAM interface.

Paper Structure (11 sections, 4 figures, 1 table)

Figures (4)

  • Figure 1: Conceptual overview of our approach. (1) We first train an interpretable model. (2) Next, we convert the different components of the interpretable model to text. (3) Then we provide these components as inputs to an LLM. (4) The user asks the LLM questions about the model or the dataset. The LLM can provide model-level summaries, or answer specific questions about particular aspects of the model. The LLM can also be used to automatically generate hypotheses about the real-world phenomenon that underlies the model and data.
  • Figure 2: The basic building block of our framework is the ability of LLMs to describe and summarize the individual graphs of Generalized Additive Models (GAMs). Considering a GAM one graph at a time allows the LLM to work with interpretable models even on large-scale datasets with hundreds of features while staying within confined context windows.
  • Figure 3: (continues on the next page)
  • Figure 4: The prompt structure to describe graphs that is the basis for the results in this paper. In the depicted example, we ask the model to describe the Age graph of a GAM trained on the Titanic dataset from Kaggle. Note that we provide the LLM with a description of the dataset and the meaning of the values on the y-axis in the graphs. The prompt structure is fairly general and can be easily amended to better fit various specific use cases.
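
The captions for Figures 2 and 4 describe prompting the LLM with one graph at a time, together with a dataset description and the meaning of the y-axis values. The sketch below shows one way such prompts could be assembled. The function names, the prompt wording, and the example data are assumptions made for illustration; they are not the paper's prompt or the TalkToEBM API, and no LLM is called.

```python
import json


def build_graph_prompt(dataset_description, y_axis_description, feature_name, graph):
    """Assemble a text prompt for a single graph (hypothetical wording)."""
    return "\n\n".join([
        f"Dataset description: {dataset_description}",
        f"Meaning of the y-axis values: {y_axis_description}",
        f"Graph of feature '{feature_name}' as JSON: {json.dumps(graph)}",
        "Think step by step, then describe the general pattern of this graph.",
    ])


def prompts_one_graph_at_a_time(dataset_description, y_axis_description, graphs):
    """Build one prompt per feature so each prompt contains only one graph."""
    return {
        name: build_graph_prompt(dataset_description, y_axis_description, name, graph)
        for name, graph in graphs.items()
    }


# Made-up graphs, for illustration only.
example_graphs = {
    "Age": {"(0.0, 10.0)": 0.81, "(10.0, 80.0)": -0.12},
    "Fare": {"(0.0, 50.0)": -0.2, "(50.0, 500.0)": 0.6},
}
prompts = prompts_one_graph_at_a_time(
    "Passengers of the Titanic; the target is survival.",
    "Contribution of the feature to the model's prediction.",
    example_graphs,
)
print(prompts["Age"])
```

Because each prompt carries a single graph, prompt length depends on the size of that graph rather than on the number of features in the model.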