How Do Analysts Understand and Verify AI-Assisted Data Analyses?

Ken Gu, Ruoxi Shang, Tim Althoff, Chenglong Wang, Steven M. Drucker

TL;DR

The paper addresses the challenge of validating AI-assisted data analyses, where LLMs translate natural language prompts into data operations. It introduces a design probe that surfaces natural language explanations, code, and interactive data artifacts, and reports a qualitative study with 22 professional analysts to reveal verification workflows. Key findings show analysts routinely alternate between procedure-oriented and data-oriented verification, using artifacts from both domains to support sensemaking and provenance. The work offers concrete recommendations for analysts and tool designers to improve verification, including clarifying AI assumptions, connecting data and procedure artifacts, and integrating AI guidance into verification workflows.

Abstract

Data analysis is challenging as it requires synthesizing domain knowledge, statistical expertise, and programming skills. Assistants powered by large language models (LLMs), such as ChatGPT, can assist analysts by translating natural language instructions into code. However, AI-assistant responses and analysis code can be misaligned with the analyst's intent or be seemingly correct but lead to incorrect conclusions. Therefore, validating AI assistance is crucial and challenging. Here, we explore how analysts understand and verify the correctness of AI-generated analyses. To observe analysts in diverse verification approaches, we develop a design probe equipped with natural language explanations, code, visualizations, and interactive data tables with common data operations. Through a qualitative user study (n=22) using this probe, we uncover common behaviors within verification workflows and how analysts' programming, analysis, and tool backgrounds reflect these behaviors. Additionally, we provide recommendations for analysts and highlight opportunities for designers to improve future AI-assistant experiences.

Paper Structure (43 sections, 6 figures, 5 tables)

This paper contains 43 sections, 6 figures, 5 tables.

Figures (6)

  • Figure 1: Analysts may now need to understand and verify AI-assisted analyses. In traditional analysis workflows, analysts specify and execute their data operations using tools such as computational notebooks (A) or spreadsheets (B). Engaged in these operations, analysts are familiar with the process and results of their work (e.g., reports, code, tables, and visualizations). However, with AI-assisted analysis, analysts can convey their intentions using natural language (e.g., "How many items purchased within the month of November were returned to the seller?"). The AI assistant handles the task of specifying and performing the data operations. This shift requires analysts, including those who may not be familiar with the underlying execution language the assistant uses, to understand and verify the process and results of the assistant (C). In this paper, we study the workflows analysts with varied backgrounds use to understand and verify AI-assisted data analyses.
  • Figure 2: Probe Interface. The analyst's prompt and the assistant's response are shown in the left panel. The AI's response includes its natural language explanation (A), the code and code comments involved in its calculations (B), and a description of any intermediate data (C). The original data table and intermediate data table(s) are accessible via buttons interleaved in the AI's response (D1 and E1) and when clicked point to their corresponding data pane in the right panel (D2 and E2). These panes can also be opened directly in the right panel. In each pane, analysts can view the raw data table in the Dataset tab with sort and filter functionality (F). Analysts can also view a visualization showing the distribution and basic descriptive statistics of each column in the Summary tab (G).
  • Figure 3: Participants often followed procedure-oriented behaviors.
  • Figure 4: We show examples of interesting end-to-end verification workflows and associated artifacts using our labels in Table \ref{tab:labeldefs}. The labels give a sense of the overall workflow and capture relevant behavioral patterns. For example, in T5, we observed P4 starting out (Start) focusing only on the natural language explanation (Procedure Only) before noticing an issue (Notice) from the explanation. Their behavior then shifts slightly as they include data artifacts (Procedure + Data) before noticing a subsequent issue in the code (Notice). Finally, they check the result data (Data Only) to confirm an error in the AI's analysis (Confirm). Overall, we observed 52 verification workflows in our study (39 of which involved errors) with an average length of 4.40 (std=1.42) labels. Four of these contained two Notice patterns, 30 contained one, and the rest contained none.
  • Figure 5: Both data and procedure artifacts were used to support participants' primary behaviors. For each type of behavior illustrated in Fig. \ref{fig:workflows}, we tally the unique artifacts involved, counting each artifact once per occurrence in a behavior. This distribution shows that participants extensively used data artifacts to support procedure-oriented behaviors and used procedure artifacts to support data-oriented behaviors (Top). The intermediate data, original data, code, and natural language explanation were all pivotal for analysts to notice an issue (Bottom).
  • ...and 1 more figure