To Explore What Isn't There -- Glyph-based Visualization for Analysis of Missing Values

Sara Johansson Fernstad, Jimmy Johansson

TL;DR

This work addresses the visualization of missing data by introducing the Missingness Glyph (MissiG), a glyph-based method that encodes the three key missingness patterns: Amount Missing ($AM$), Joint Missingness ($JM$), and Conditional Missingness ($CM$). MissiG supports standalone linear and radial layouts and can be added as an enhancement to Heatmap or Parallel Coordinates, facilitating exploration of missingness structure and relationships with recorded values. Two usability studies demonstrate that MissiG often outperforms or matches baseline methods in identifying AM and JM patterns, while CM patterns show mixed results; overall, MissiG is well liked and commonly preferred for missingness analysis. The approach offers practical value for data quality assessment, imputation planning, and data preprocessing decisions and points to future work in IoT/sensor data and broader visualization integration.
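To make the three patterns named above concrete, the following is a minimal sketch in plain Python. The definitions used here are working assumptions for illustration, not the paper's formal definitions: AM is taken as the count of missing entries per variable, JM as the number of items missing in both variables of a pair, and CM as the recorded values of one variable split by whether a selected variable is missing. The toy table and all function names are hypothetical.

```python
from itertools import combinations

# Toy table (made up for illustration): None marks a missing value.
data = {
    "x1": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "x2": [None, 1.5, None, 2.5, 3.5, None],
    "x3": [None, 0.2, 0.4, None, 0.8, None],
}


def amount_missing(table):
    """AM (assumed definition): count of missing entries per variable."""
    return {name: sum(v is None for v in col) for name, col in table.items()}


def joint_missingness(table):
    """JM (assumed definition): items missing in both variables of each pair."""
    return {
        (a, b): sum(x is None and y is None for x, y in zip(table[a], table[b]))
        for a, b in combinations(table, 2)
    }


def conditional_missingness(table, selected, other):
    """CM (assumed definition): recorded values of `other`, split by
    whether `selected` is missing for the same item."""
    missing, recorded = [], []
    for s, o in zip(table[selected], table[other]):
        if o is None:
            continue  # only recorded values of `other` are compared
        (missing if s is None else recorded).append(o)
    return {"selected_missing": missing, "selected_recorded": recorded}


am = amount_missing(data)
jm = joint_missingness(data)
cm = conditional_missingness(data, "x3", "x1")
```

In this toy table, `x2` and `x3` are each missing in three items and jointly missing in two, while the values of `x1` for items where `x3` is missing can be compared against those where `x3` is recorded. A visualization such as MissiG presents this kind of information graphically rather than as raw counts.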

Abstract

This paper contributes a novel visualization method, the Missingness Glyph, for analysis and exploration of missing values in data. Missing values are a common challenge in most data-generating domains and may cause a range of analysis issues. Missingness in data may indicate potential problems in data collection and pre-processing, or highlight important data characteristics. While the development and improvement of statistical methods for dealing with missing data is a research area in its own right, mainly focussing on replacing missing values with estimated values, considerably less attention has been paid to visualization of missing values. Nonetheless, visualization and explorative analysis have great potential to support understanding of missingness in data, and to enable novel insights into patterns of missingness in a way that statistical methods cannot. The Missingness Glyph supports identification of relevant missingness patterns in data, and is evaluated and compared to two other visualization methods in the context of these patterns. The results are promising and confirm that the Missingness Glyph in several cases performs better than the alternative visualization methods.

Paper Structure

This paper contains 23 sections, 18 figures, 4 tables.

Figures (18)

  • Figure 1: The basic structure of MissiG for three or four variables. Variable C is selected in \ref{Fig:MissVisC} and \ref{Fig:MissVisD}.
  • Figure 2: MissiG with linear layout for a synthetic data set with 6 variables, where $x1$ has no missing values and the remaining variables have 10% -- 30% missing.
  • Figure 3: MissiG with radial layout for the same data as in Fig. 2, with $x3$ selected. Highlighted blocks and bands emphasize JM, and red histograms represent CM with missing values in $x3$.
  • Figure 4: PC and Heatmap enhanced with MissiG glyphs. Variable $x5$ is selected in both figures.
  • Figure 5: Visualization of the Kamyr Digester data set with 22 variables and 301 items. BF-CMratio (third variable from the left) is selected and highlighted.
  • ...and 13 more figures