The State of the Art in Enhancing Trust in Machine Learning Models with the Use of Visualizations

A. Chatzimparmpas, R. Martins, I. Jusufi, K. Kucher, Fabrice Rossi, A. Kerren

TL;DR

This work addresses the challenge of establishing trust in ML systems by surveying visualization techniques that enhance understanding and trust across data, algorithms, and outcomes. It introduces a fine-grained, multi-level taxonomy (TL1–TL5) linking data, processing, learning methods, concrete models, and evaluation to visualization techniques, complemented by empirical analyses and a public TrustMLVis browser. Through topic modeling, correlation analyses, and data-set investigations across 200 papers, the STAR reveals trends, gaps, and opportunities for visualization to improve trust, including uncertainty awareness, fairness, and in-situ model comparisons. The work provides a practical roadmap for researchers and practitioners to design trust-enhancing visualizations and to prioritize underexplored areas, with the TrustMLVis browser enabling ongoing, community-driven exploration and extension.

Abstract

Machine learning (ML) models are nowadays used in complex applications in various domains, such as medicine, bioinformatics, and other sciences. Due to their black box nature, however, it may sometimes be hard to understand and trust the results they provide. This has increased the demand for reliable visualization tools related to enhancing trust in ML models, which has become a prominent topic of research in the visualization community over the past decades. To provide an overview and present the frontiers of current research on the topic, we present a State-of-the-Art Report (STAR) on enhancing trust in ML models with the use of interactive visualization. We define and describe the background of the topic, introduce a categorization for visualization techniques that aim to accomplish this goal, and discuss insights and opportunities for future research directions. Among our contributions is a categorization of trust against different facets of interactive ML, expanded and improved from previous research. Our results are investigated from different analytical perspectives: (a) providing a statistical overview, (b) summarizing key findings, (c) performing topic analyses, and (d) exploring the data sets used in the individual papers, all with the support of an interactive web-based survey browser. We intend this survey to be beneficial for visualization researchers whose interests involve making ML models more trustworthy, as well as researchers and practitioners from other disciplines in their search for effective visualization techniques suitable for solving their tasks with confidence and conveying meaning to their data.

Paper Structure (47 sections, 8 figures, 5 tables)

This paper contains 47 sections, 8 figures, 5 tables.

Figures (8)

  • Figure 1: The overview of our STAR with regard to the methodology, main results, and corresponding sections of the manuscript. Color coding is used for grouping related activities and results (purple for the background information and key concepts, blue for the literature search, green for the paper categorization, orange for the data analyses, and yellow for the manuscript); italic font is used for intermediate activities; and bold font is used for the items discussed explicitly in this STAR. The marks Ⓢ1–Ⓢ8 refer to supplementary materials.
  • Figure 2: A typical ML pipeline (depicted in red), assisted by visualization (in purple). Issues of trust permeate the complete shown pipeline, and we locate and categorize these issues in several trust levels (TLs). The various categories proposed in this work are represented in green. The yellow "cloud" represents the knowledge created by the different target groups while they pursue their goals by using visualizations to explore the pipeline, the data and/or the ML models. Finally, at the very top, we encode the real-world applications with an ellipsoid.
  • Figure 3: Histogram of the set of collected techniques/tools (200 in total) with regard to the publication year. (∗) Please note that the data for 2020 is incomplete since the data collection for this survey was completed in January 2020. (†) For 2007, we did not perform a complete search; the single publication was found within the related work section of another already-included paper.
  • Figure 4: Co-authorship network visualization with the eight largest connected components (①–⑧) highlighted in different colors. The node size represents the in-degree centrality of each author. The labels are filtered based on the in-degree value in order to reduce clutter.
  • Figure 5: Visual exploration of new interesting topics derived from the 200 papers. (a) Papers' embedding generated with the t-SNE algorithm and based on the corresponding topics. The black outlines were manually drawn on top, and the tags act as short versions of the 10 full topic titles. (b) Bar chart of topics with each topic's significance (scaled from 0 to 1). (c) Horizontal bar chart of top terms with the highest relevance for all the topics. Here, topics are encoded with color and number (in parentheses); a single term can be found in several topics.
  • ...and 3 more figures