More Than "Means to an End": Supporting Reasoning with Transparently Designed AI Data Science Processes

Venkatesh Sivaraman, Patrick Vossler, Adam Perer, Julian Hong, Jean Feng

Abstract

Generative artificial intelligence (AI) tools can now help people perform complex data science tasks regardless of their expertise. While these tools have great potential to help more people work with data, their end-to-end approach does not support users in evaluating alternative approaches and reformulating problems, both critical to solving open-ended tasks in high-stakes domains. In this paper, we reflect on two AI data science systems designed for the medical setting and how they function as tools for thought. We find that success in these systems was driven by constructing AI workflows around intentionally designed intermediate artifacts, such as readable query languages, concept definitions, or input-output examples. Despite opacity in other parts of the AI process, these intermediates helped users reason about important analytical choices, refine their initial questions, and contribute their unique knowledge. We invite the HCI community to consider when and how intermediate artifacts should be designed to promote effective data science thinking.

Paper Structure

This paper contains 8 sections, 3 figures.

Figures (3)

  • Figure 1: While current AI data science tools use complex, opaque processes that are difficult for users to verify and control (a), we propose that AI data science workflows be explicitly designed around intermediate artifacts that can foster data science reasoning and allow users to steer execution (b). How to choose the stages and intermediate artifacts to structure this transparent process is a central question for future research.
  • Figure 2: The HACHI workflow trains a predictive model using concepts identified from clinical notes, yielding intermediate artifacts such as the definitions of the discovered concepts, notes and their labels, and the model's performance.
  • Figure 3: In Tempo, the AI Assistant translates a user's query into a readable, precise query language to extract temporal data from electronic health records. The TempoQL query and its results serve as intermediate artifacts that help the analyst decide if the data extraction was successful.