Risks and Opportunities in Human-Machine Teaming in Operationalizing Machine Learning Target Variables

Mengtian Guo, David Gotz, Yue Wang

TL;DR

This work explores how human-machine teaming can support the operationalization of machine learning target variables by accelerating iterations while preserving human judgment, and highlights the opportunities and risks of this approach.

Abstract

Predictive modeling has the potential to enhance human decision-making. However, many predictive models fail in practice due to problematic problem formulation in cases where the prediction target is an abstract concept or construct and practitioners need to define an appropriate target variable as a proxy to operationalize the construct of interest. The choice of an appropriate proxy target variable is rarely self-evident in practice, requiring both domain knowledge and iterative data modeling. This process is inherently collaborative, involving both domain experts and data scientists. In this work, we explore how human-machine teaming can support this process by accelerating iterations while preserving human judgment. We study the impact of two human-machine teaming strategies on proxy construction: 1) relevance-first: humans leading the process by selecting relevant proxies, and 2) performance-first: machines leading the process by recommending proxies based on predictive performance. Based on a controlled user study of a proxy construction task (N = 20), we show that the performance-first strategy facilitated faster iterations and decision-making, but also biased users towards well-performing proxies that are misaligned with the application goal. Our study highlights the opportunities and risks of human-machine teaming in operationalizing machine learning target variables, yielding insights for future research to explore the opportunities and mitigate the risks.

Paper Structure

This paper contains 28 sections, 1 equation, 6 figures, 2 tables.

Figures (6)

  • Figure 1: Illustration of two collaborative, iterative strategies for machine learning proxy target selection. Domain experts are humans with nuanced understanding of the task domain. Data scientists are humans or machines in charge of data processing, model creation, and evaluation. In (a) Relevance First strategy, the domain expert proposes the next proxy with high relevance but unknown performance, then the data scientist evaluates the performance of predicting the proxy target. In (b) Performance First strategy, the data scientist proposes the next proxy target with high performance but unknown relevance, then the domain expert evaluates the relevance of the proxy.
  • Figure 2: The relationships between key concepts in the proxy target selection problem. The function $g$ uses observed outcomes $\mathcal{U}$ to construct the proxy target $Y$, which is a surrogate of the unobserved target outcome $Y^*$ (Box 1). The machine learning model $f$ uses predictors $\mathcal{X}$ to predict the proxy target $Y$ (Box 2). The problem is to construct $Y$ that is both relevant to $Y^*$ and can be accurately predicted using $\mathcal{X}$.
  • Figure 3: Overview of system interface, including (a) Proxy Detail View containing details of the current proxy target, (b) Candidate Presentation which can take one of the two interface conditions shown in Figure 4, (c) Candidate Detail View presenting the details of a selected candidate from view (b) and its comparison with the current proxy, and (d) Proxy History View showing the iterations on proxies. As the user clicks "Update" in view (c), the proxy's details in view (a) will be updated and new proxy candidates will be generated and presented in view (b).
  • Figure 4: Proxy candidates presentation in the Relevance First condition and the Performance First condition. (a) Relevance First: all observed outcomes and those associated with the current proxy are presented in a pre-defined order based on the variables' labels. (b) Performance First: candidate proxies are ranked based on the resulting model performance.
  • Figure 5: (a) Quality measurements of proxies generated by participants under Performance First and Relevance First conditions. (*) indicates statistically significant differences between the two conditions ($p<0.05$). There is a significant difference in the resulting model performance of the proxies generated under the two conditions (Performance First > Relevance First). There is a significant difference in factor recall, but no significant difference in variable precision, variable recall, or factor precision. (b) Mean ratings given by participants in the post-task questionnaire. Error bars show the standard error. There is a significant difference between Performance First and Relevance First conditions in Q3 Performance, Q4 Overall, Q5 Performance Difference, and Q7 Decision (see the questionnaire table for question details).
  • ...and 1 more figure
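
The relationships described in the Figure 2 and Figure 4(b) captions can be illustrated with a minimal Python sketch. Everything here is an assumption for illustration and is not taken from the paper: the outcome names `u1` to `u3`, the toy data, the choice of logical OR as the construction function $g$, and the use of a fixed risk score with AUC as a stand-in for training and evaluating the model $f$. The sketch ranks candidate proxies by predictive performance, as in the Performance First condition.

```python
def make_proxy(outcomes, selected):
    """Assumed g: OR together the selected binary observed outcomes to form proxy Y."""
    n = len(next(iter(outcomes.values())))
    return [int(any(outcomes[name][i] for name in selected)) for i in range(n)]


def auc(scores, labels):
    """Chance that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        return 0.5  # AUC is undefined for a constant label; treated as chance here
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def rank_candidates(scores, outcomes, current):
    """Score each proxy formed by adding one outcome to `current`; best AUC first."""
    ranked = []
    for name in outcomes:
        if name in current:
            continue
        candidate = current | {name}
        ranked.append((auc(scores, make_proxy(outcomes, candidate)), name))
    return sorted(ranked, reverse=True)


# Hypothetical toy data: one model score per case (stand-in for f applied to X)
# and three observed binary outcomes (stand-in for U).
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
outcomes = {
    "u1": [1, 1, 0, 0, 0, 0, 0, 0],
    "u2": [1, 1, 1, 1, 0, 0, 0, 0],
    "u3": [0, 1, 0, 0, 1, 0, 0, 1],
}

ranking = rank_candidates(scores, outcomes, current={"u1"})
for performance, name in ranking:
    print(f"add {name}: AUC = {performance:.3f}")
```

The ranking reflects only how well each candidate $Y$ can be predicted. Whether a highly ranked candidate is relevant to the unobserved target $Y^*$ is not computed anywhere in this sketch and remains a judgment for the domain expert.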