Tempo: Helping Data Scientists and Domain Experts Collaboratively Specify Predictive Modeling Tasks

Venkatesh Sivaraman, Anika Vaishampayan, Xiaotong Li, Brian R Buck, Ziyong Ma, Richard D Boyce, Adam Perer

TL;DR

Tempo tackles misalignment between decision-makers and predictive models by enabling collaborative specification of temporal tasks through a readable yet precise temporal query language. It combines lightweight temporal aggregations, live query feedback, and interactive subgroup analysis to accelerate ideation, prototyping, and critique in early model development. Through three case studies in web browsing behavior, sepsis care, and home health readmission, Tempo shows how expert involvement can prune infeasible specifications and reveal promising directions, while also highlighting design opportunities and limitations. The work argues for treating problem specification as a distinct, collaborative data science task and outlines a practical, open-source framework to support it, with potential extensions to large-language-model-assisted tooling in the future.

Abstract

Temporal predictive models have the potential to improve decisions in health care, public services, and other domains, yet they often fail to effectively support decision-makers. Prior literature shows that many misalignments between model behavior and decision-makers' expectations stem from issues of model specification, namely how, when, and for whom predictions are made. However, model specifications for predictive tasks are highly technical and difficult for non-data-scientist stakeholders to interpret and critique. To address this challenge we developed Tempo, an interactive system that helps data scientists and domain experts collaboratively iterate on model specifications. Using Tempo's simple yet precise temporal query language, data scientists can quickly prototype specifications with greater transparency about pre-processing choices. Moreover, domain experts can assess performance within data subgroups to validate that models behave as expected. Through three case studies, we demonstrate how Tempo helps multidisciplinary teams quickly prune infeasible specifications and identify more promising directions to explore.

Paper Structure

This paper contains 45 sections and 6 figures.

Figures (6)

  • Figure 1: Model specification and training interfaces in Tempo, shown on an example use case of predicting readmission to a hospital from open-source clinical data with varying cohorts and time horizons. Users can create multiple model prototypes, listed in the Models sidebar (A), and edit how their inputs and outputs are defined in the Specification Editor (B). Once a prototype is trained, the Metrics view displays possible mis-specification alerts (C) and performance metrics (D).
  • Figure 2: Subgroup discovery interface in Tempo. The Subgroups tab (A) allows data scientists and domain experts to automatically mine rule-based subsets of the data that have interesting characteristics. Users can edit and refine subgroup definitions to evaluate the impact of individual variables or try new combinations (B). Search criteria can be flexibly defined within and across models (C); here we show subgroups with positive labels for the 90-day readmission outcome specified in Fig. 1. Selecting a subgroup reveals the Distinguishing Features view, which lists additional variables that differentiate the given subgroup from the overall dataset over time. For instance, in the subgroup of patients with a history of depression and a prior admission between 2 weeks and 1 month ago, patients also tend to have many more past admissions and a history of post-traumatic stress disorder (PTSD).
  • Figure 3: Two aggregations that might be performed on patient trajectories and how they could be implemented in Tempo, SQL, and Pandas. The Tempo queries are considerably more succinct and do not require the user to keep track of multiple joining tables, as SQL and Pandas typically do for temporal aggregations. Also, the differences between the two Tempo examples are small and semantically meaningful because no code optimization is needed to handle large datasets. (These examples all assume data is structured in Tempo's input format, which would resemble a typical database structure for this type of query.)
  • Figure 4: Examples of Query Result tiles (right) displayed when the user executes each of the queries (left).
  • Figure 5: Two proposed model specifications for predicting readmission in home health patients. In both cases, hospital discharge events are used as the timestep definition, input variables are gathered from the prior 180 days, and the target variable of readmission is defined over the following 30 days. Based on domain expert feedback on specification (A), we add a 3-day window in (B) to check for admission to home health or a skilled nursing facility (SNF).
  • ...and 1 more figure
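
The specification described in the Figure 5 caption (discharge events as timesteps, inputs gathered from the prior 180 days, a readmission target over the following 30 days) can be illustrated with a minimal sketch in plain Python. This is not Tempo's query language or its implementation. The event log, the `build_rows` helper, and the `prior_admissions` input variable are all hypothetical and chosen only to show how the lookback and horizon windows relate to each timestep.

```python
from datetime import datetime, timedelta

# Hypothetical event log: (patient_id, time, event_type). Not real data.
EVENTS = [
    ("p1", datetime(2023, 1, 1), "admission"),
    ("p1", datetime(2023, 1, 5), "discharge"),
    ("p1", datetime(2023, 1, 20), "admission"),
    ("p1", datetime(2023, 1, 25), "discharge"),
    ("p2", datetime(2023, 3, 1), "admission"),
    ("p2", datetime(2023, 3, 4), "discharge"),
]


def build_rows(events, lookback_days=180, horizon_days=30):
    """Build one modeling row per discharge event (the timestep definition)."""
    lookback = timedelta(days=lookback_days)
    horizon = timedelta(days=horizon_days)
    rows = []
    for patient, t, kind in events:
        if kind != "discharge":
            continue
        admissions = [
            ts for p, ts, k in events if p == patient and k == "admission"
        ]
        # Input variable: admissions in the lookback window [t - lookback, t).
        prior = sum(1 for ts in admissions if t - lookback <= ts < t)
        # Target variable: any admission in the horizon window (t, t + horizon].
        readmitted = any(t < ts <= t + horizon for ts in admissions)
        rows.append(
            {
                "patient": patient,
                "time": t,
                "prior_admissions": prior,
                "readmitted": readmitted,
            }
        )
    return rows


ROWS = build_rows(EVENTS)
for row in ROWS:
    print(row)
```

Changing `lookback_days` or `horizon_days` changes which events count toward the inputs and the label, which is the kind of specification choice the caption describes domain experts reviewing.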