Exploratory Visual Analysis for Increasing Data Readiness in Artificial Intelligence Projects

Mattias Tiger; Daniel Jakobsson; Anders Ynnerman; Fredrik Heintz; Daniel Jönsson

TL;DR

This work addresses raising data readiness for AI through integrated visualization, extending the data readiness concept to time-varying and text data and formalizing a mapping from data-readiness questions to simple visual analyses. It introduces an extended A-B-C band framework with seven $A$-aspects to connect data, task, and solution considerations, and presents minimalist visualization guidelines that support data profiling and stakeholder communication. The approach is demonstrated via multi-year case studies, showing how visual analysis uncovers data issues, aids decisions on data collection, and informs model adaptation and deployment readiness. The results highlight practical benefits for data-centric AI workflows and point to future work on extending the guidelines to classification tasks and more integrated visualization environments.

Abstract

We present experiences and lessons learned from increasing the data readiness of heterogeneous data for artificial intelligence projects using visual analysis methods. Increasing the data readiness level involves understanding both the data and the context in which it is used, challenges that are well suited to visual analysis. For this purpose, we contribute a mapping between data readiness aspects and visual analysis techniques suitable for different data types. We use this mapping to increase data readiness levels in use cases involving time-varying data, including numerical, categorical, and text data. In addition to the mapping, we extend the data readiness concept to better take aspects of the task and solution into account and to explicitly address distribution shifts during data collection. We report on our experiences in using the presented visual analysis techniques to aid future artificial intelligence projects in raising the data readiness level.

Paper Structure (14 sections, 10 figures)

Introduction
Related Work
Background
Methodology
Data Readiness -- Extended
Visualization for data readiness
Case studies
Band C
Text analysis
Distribution shift
Band A - Feature Perspective
Band A - Solution Perspective
Discussion
Conclusion

Figures (10)

Figure 1: Illustration of the overall workflow for model and data in AI projects that apply the data readiness concepts.
Figure 2: Adaptation of the three data readiness bands (Lawrence, 2017), aimed at providing a structured way of analyzing and communicating data quality. The right arrows illustrate the main process flow, while the left arrows stress that iterations might be necessary as new knowledge is acquired.
Figure 3: An overview of text that preserves its semantics can be obtained by projecting the outputs of late layers of a language model into 2D space. Inspecting the text content of clusters and outliers can aid in detecting flawed text as well as text collection errors. Coloring by target variable can further reveal whether the text is helpful for solving the task.
Figure 4: Visualize distributions to detect flaws. Use them to communicate and reason about the validity of modes, value ranges, and categories.
Figure 5: Visualize data distributions over collection time to detect paradigm shifts. Use them to communicate and reason about the validity of sudden changes, trends, unexpected patterns and missing data.
...and 5 more figures
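
The idea behind Figure 5, inspecting a variable's distribution over collection time to spot sudden changes, can be illustrated with a small numerical stand-in. The sketch below is not the authors' method: the helper names `median_by_period` and `flag_shifts`, the use of the median as the summary statistic, the threshold, and the toy records are all assumptions made for illustration. In practice, the per-period distributions would be plotted and judged visually together with domain experts.

```python
from collections import defaultdict
from statistics import median


def median_by_period(records):
    """Group (period, value) pairs by period and return {period: median value}."""
    groups = defaultdict(list)
    for period, value in records:
        groups[period].append(value)
    # Sort by period label so consecutive entries are consecutive in time.
    return {period: median(values) for period, values in sorted(groups.items())}


def flag_shifts(medians, threshold):
    """Return periods whose median differs from the previous period's by more than threshold."""
    periods = list(medians)
    return [
        cur
        for prev, cur in zip(periods, periods[1:])
        if abs(medians[cur] - medians[prev]) > threshold
    ]


# Hypothetical toy data: (collection year, measured value).
records = [
    ("2020", 1.0), ("2020", 1.2), ("2020", 0.9),
    ("2021", 1.1), ("2021", 1.0), ("2021", 1.3),
    ("2022", 5.0), ("2022", 5.2), ("2022", 4.9),
]

medians = median_by_period(records)
shifts = flag_shifts(medians, threshold=2.0)
print(medians)
print(shifts)
```

A flagged period is only a prompt for inspection: whether the change reflects a valid paradigm shift, a collection error, or missing data still has to be reasoned about with the people who know the data.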
