OpenHEXAI: An Open-Source Framework for Human-Centered Evaluation of Explainable Machine Learning

Jiaqi Ma, Vivian Lai, Yiming Zhang, Chacha Chen, Paul Hamilton, Davor Ljubenkov, Himabindu Lakkaraju, Chenhao Tan

TL;DR

OpenHEXAI is the first large-scale infrastructural effort to facilitate human-centered benchmarks of XAI methods. It simplifies the design and implementation of user studies for XAI methods, allowing researchers and practitioners to focus on the scientific questions.

Abstract

Recently, there has been a surge of explainable AI (XAI) methods driven by the need to understand machine learning model behaviors in high-stakes scenarios. However, properly evaluating the effectiveness of XAI methods inevitably requires the involvement of human subjects, and conducting human-centered benchmarks is challenging in a number of ways: designing and implementing user studies is complex; the numerous choices in the design space of user studies lead to problems of reproducibility; and running user studies can be challenging and even daunting for machine learning researchers. To address these challenges, this paper presents OpenHEXAI, an open-source framework for human-centered evaluation of XAI methods. OpenHEXAI features (1) a collection of diverse benchmark datasets, pre-trained models, and post hoc explanation methods; (2) an easy-to-use web application for user studies; (3) comprehensive evaluation metrics for the effectiveness of post hoc explanation methods in the context of human-AI decision making tasks; (4) best practice recommendations for experiment documentation; and (5) convenient tools for power analysis and cost estimation. OpenHEXAI is the first large-scale infrastructural effort to facilitate human-centered benchmarks of XAI methods. It simplifies the design and implementation of user studies for XAI methods, thus allowing researchers and practitioners to focus on the scientific questions. Additionally, it enhances reproducibility through standardized designs. Based on OpenHEXAI, we further conduct a systematic benchmark of four state-of-the-art post hoc explanation methods and compare their impacts on human-AI decision making tasks in terms of accuracy, fairness, and users' trust in and understanding of the machine learning model.

Paper Structure (36 sections, 3 figures, 8 tables)

Figures (3)

  • Figure 1: This figure illustrates the task page for the RCDV dataset and conditions with the predicted label and explanations. (1) shows a box including explanations for features that require more explanations. (2) shows the profile of a defendant. (3) shows the predicted label. (4) is a description of how the bar chart could be interpreted. Finally, (5) shows the bar chart that orders features by their absolute feature importance scores.
  • Figure 2: This figure illustrates the task page for the control condition that shows data features only (F) on the German Credit dataset. There are two main components on this page: the feature explanations box and the profile table. The feature explanations box gives more information on features that might be difficult to understand from a short description. The profile table shows the information for the profile whose outcome the user is asked to predict.
  • Figure 3: This figure illustrates the task page for the control condition that shows data features and the model prediction (FP) on the RCDV dataset. In addition to the feature explanations box and the profile table shown in Figure 2, there is an AI prediction on top of the profile table.
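
The Figure 1 caption mentions a bar chart that orders features by their absolute feature importance scores. As a minimal sketch of that ordering step only (not the OpenHEXAI implementation), the snippet below sorts a dictionary of signed scores by magnitude. The function name, the feature names, and the score values are all hypothetical and chosen purely for illustration.

```python
def order_by_abs_importance(scores):
    """Return (feature, score) pairs sorted by descending absolute score.

    The sign of each score is preserved so that a chart can still
    distinguish positive from negative contributions.
    """
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)


# Hypothetical importance scores; these are NOT values from the paper.
example_scores = {
    "age": -0.42,
    "prior_arrests": 0.61,
    "offense_type": 0.08,
    "gender": -0.15,
}

ordered = order_by_abs_importance(example_scores)

# Print the features in the order a bar chart would list them.
for name, score in ordered:
    print(f"{name:>14} {score:+.2f}")
```

Sorting on the absolute value while keeping the signed score places the most influential features first regardless of the direction of their effect.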