
Accessible Color Sequences for Data Visualization

Matthew A. Petroff

TL;DR

This work addresses the need for color sequences in data visualization that are both aesthetically pleasing and accessible to individuals with color-vision deficiencies. It combines a crowdsourced aesthetic-preference framework with rigorous perceptual constraints in CAM02-UCS and deficiency simulations to generate six-, eight-, and ten-color sequences, including considerations for grayscale readability and color naming. A conjoined neural-network approach trained on pairwise survey data yields scores used to select near-optimal sequences, which are further refined by a sequence-accessibility metric that accounts for perceptual and lightness distances. The resulting color sequences, shown to outperform many common defaults in accessibility, provide robust defaults for plotting libraries while remaining interpretable and describable in verbal and written descriptions.
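The pairwise survey data can be turned into a scoring model by training one shared ("conjoined") scorer on both members of each pair. Training penalizes the scorer whenever the preferred member does not score higher. The sketch below makes several assumptions:

- It uses a logistic (Bradley–Terry-style) loss on the score difference.
- It replaces the paper's neural network with a linear scorer.
- That scorer uses two hypothetical features: mean CIELAB lightness and mean chroma.

It illustrates the training signal only. It is not the paper's model or feature set.

```python
import math

def features(color_set):
    """Hypothetical features of a color set given as (L*, a*, b*) tuples."""
    n = len(color_set)
    mean_lightness = sum(c[0] for c in color_set) / n
    mean_chroma = sum(math.hypot(c[1], c[2]) for c in color_set) / n
    return [mean_lightness / 100.0, mean_chroma / 100.0]

def score(weights, color_set):
    """Shared ("conjoined") scorer: the same weights score both members of a pair."""
    return sum(w * x for w, x in zip(weights, features(color_set)))

def train_pairwise(pairs, epochs=200, lr=0.5):
    """Fit weights from (preferred_set, rejected_set) pairs by gradient descent
    on the logistic loss -log(sigmoid(score(preferred) - score(rejected)))."""
    weights = [0.0, 0.0]
    for _ in range(epochs):
        for preferred, rejected in pairs:
            fp, fr = features(preferred), features(rejected)
            diff = score(weights, preferred) - score(weights, rejected)
            p = 1.0 / (1.0 + math.exp(-diff))  # modeled probability of the observed choice
            # The loss gradient is -(1 - p) * (fp - fr), so step in the opposite direction.
            for i in range(len(weights)):
                weights[i] += lr * (1.0 - p) * (fp[i] - fr[i])
    return weights
```

After training, a higher `score` for an unseen set means the model predicts that set is more likely to be chosen in a pairwise comparison.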

Abstract

Color sequences, ordered sets of colors for data visualization, that balance aesthetics with accessibility considerations are presented. In order to model aesthetic preference, data were collected with an online survey, and the results were used to train a machine-learning model. To ensure accessibility, this model was combined with minimum-perceptual-distance constraints, including for simulated color-vision deficiencies, as well as with minimum-lightness-distance constraints for grayscale printing, maximum-lightness constraints for maintaining contrast with a white background, and scores from a color-saliency model for ease of use of the colors in verbal and written descriptions. Optimal color sequences containing six, eight, and ten colors were generated using the data-driven aesthetic-preference model and accessibility constraints. Due to the balance of aesthetics and accessibility considerations, the resulting color sequences can serve as reasonable defaults in data-plotting codes, e.g., for use in scatter plots and line plots.
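The accessibility constraints listed above amount to pass/fail checks over every pair of colors in a sequence. The sketch below is a simplified stand-in, not the paper's method, and rests on these assumptions:

- Perceptual distance is Euclidean distance in CIELAB rather than CAM02-UCS.
- The color-vision-deficiency simulations and the color-saliency scores are omitted.
- The perceptual-distance and lightness-distance thresholds are hypothetical.
- The maximum lightness of 84.6 comes from the scatter-plot response-time result in Figure 4, assuming the $L^*$ there is CIELAB lightness.

```python
import itertools
import math

def _srgb_to_linear(channel):
    """Undo the sRGB transfer curve for one 0-255 channel."""
    c = channel / 255.0
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

def srgb_hex_to_lab(hex_color):
    """Convert '#rrggbb' to CIELAB (D65 white), returning (L*, a*, b*)."""
    h = hex_color.lstrip("#")
    r, g, b = (_srgb_to_linear(int(h[i:i + 2], 16)) for i in (0, 2, 4))
    # Linear sRGB -> CIE XYZ (D65)
    x = 0.4124564 * r + 0.3575761 * g + 0.1804375 * b
    y = 0.2126729 * r + 0.7151522 * g + 0.0721750 * b
    z = 0.0193339 * r + 0.1191920 * g + 0.9503041 * b
    xn, yn, zn = 0.95047, 1.0, 1.08883  # D65 reference white

    def f(t):
        delta = 6.0 / 29.0
        return t ** (1.0 / 3.0) if t > delta ** 3 else t / (3.0 * delta ** 2) + 4.0 / 29.0

    fx, fy, fz = f(x / xn), f(y / yn), f(z / zn)
    return (116.0 * fy - 16.0, 500.0 * (fx - fy), 200.0 * (fy - fz))

def check_sequence(hex_colors, min_delta_e=15.0, min_delta_l=5.0, max_l=84.6):
    """Report which simplified accessibility constraints a color sequence satisfies.

    min_delta_e and min_delta_l are hypothetical thresholds; max_l follows Figure 4.
    """
    labs = [srgb_hex_to_lab(c) for c in hex_colors]
    pairs = list(itertools.combinations(labs, 2))
    return {
        # Distinguishability: smallest Euclidean CIELAB distance between any two colors
        "perceptual": min(math.dist(p, q) for p, q in pairs) >= min_delta_e,
        # Grayscale printing: smallest lightness gap between any two colors
        "grayscale": min(abs(p[0] - q[0]) for p, q in pairs) >= min_delta_l,
        # Contrast with a white background: no color lighter than max_l
        "contrast": max(lab[0] for lab in labs) <= max_l,
    }
```

A sequence is accepted only if every entry is `True`. A faithful version would swap in a CAM02-UCS conversion and repeat the perceptual check on each simulated color-vision deficiency.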

Paper Structure

This paper contains 19 sections, 4 equations, 4 figures, and 4 tables.

Figures (4)

  • Figure 1: Color survey sets interface. The survey respondent was presented with two separate color sets and asked to choose the more pleasing set; after this step, four possible orderings were presented, and the respondent was asked to choose the most pleasing ordering. The plot rendering was randomly changed between a line plot and a scatter plot, and different line thicknesses and marker sizes were used. The rendering was not shown on smaller screens and was not shown when picking orderings.
  • Figure 2: Artificial-neural-network architecture overview. Shading denotes nodes with trainable parameters, and the numbers next to nodes denote their output dimensions. The 1D separable convolutions use a kernel of size 5, and the inputs are zero-padded before the kernel is applied such that the output shape is identical to the input shape. For the set model, the input colors were ordered along one of the three CAM02-UCS axes, and a separate copy of the model was independently trained for each ordering; the outputs of the three copies were then averaged. For the sequence model, the input colors were ordered per the sequence ordering.
  • Figure 3: Best and worst color sets. The ten highest-scored and ten lowest-scored color sets with six, eight, and ten colors, per the metric described in the text, are shown, starting with the set with the best score. Also shown, for each set length, is the color set with the maximum minimum perceptual distance among the 10k randomly generated sets. Each color set is ordered by hue angle. Ordering data were only collected for the "better" set in each survey comparison, so the ordering model is less constrained for the "worst" sets and is therefore not used in this comparison.
  • Figure 4: Effect of scatter-plot marker lightness on response time. The average response time in milliseconds is shown for markers of various $L^*$, with error bars denoting the standard error. The horizontal gray band shows the overall mean, with a 1-$\sigma$ confidence interval. The data show that the use of colors with $L^*>84.6$ should be avoided to maintain scatter-plot readability.
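The 1D separable convolutions in Figure 2 can be illustrated in pure Python. A depthwise-separable convolution works in two steps:

1. It applies one kernel per channel. Here the kernel has size 5 and is zero-padded so that the number of positions is unchanged.
2. It mixes channels with a 1x1 pointwise convolution. When the pointwise step keeps the channel count, the whole shape is preserved.

The sketch also shows how the set model's three copies can be averaged, with each copy fed the colors sorted along a different axis. The toy model (one separable layer followed by a global mean) and all weights are hypothetical. The actual network in Figure 2 has additional layers and trained parameters.

```python
def separable_conv1d_same(x, depthwise, pointwise):
    """Depthwise-separable 1D convolution with zero 'same' padding.

    x:         n positions, each a list of C channel values
    depthwise: C kernels of odd length k (one per channel)
    pointwise: C_out rows of C weights (a 1x1 convolution)
    As in deep-learning frameworks, this computes cross-correlation (kernel not flipped).
    """
    n, c = len(x), len(x[0])
    k = len(depthwise[0])
    pad = k // 2
    dw = [[0.0] * c for _ in range(n)]
    for i in range(n):
        for ch in range(c):
            acc = 0.0
            for j in range(k):
                src = i + j - pad
                if 0 <= src < n:  # out-of-range positions are zero padding
                    acc += depthwise[ch][j] * x[src][ch]
            dw[i][ch] = acc
    # Pointwise step: a weighted mix of channels at each position.
    return [[sum(w * v for w, v in zip(row, dw[i])) for row in pointwise] for i in range(n)]


def make_toy_model(depthwise, pointwise):
    """Hypothetical stand-in for one model copy: one separable layer, then a global mean."""
    def model(seq):
        out = separable_conv1d_same([list(col) for col in seq], depthwise, pointwise)
        return sum(sum(row) for row in out) / (len(out) * len(out[0]))
    return model


def set_score(colors, models):
    """Average three model copies, each fed the colors sorted along one of the three axes."""
    scores = [model(sorted(colors, key=lambda col, a=axis: col[a]))
              for axis, model in enumerate(models)]
    return sum(scores) / len(scores)
```

Because each copy sees the colors in a canonical order, the averaged set score does not depend on the order in which the set was supplied. This holds as long as no two colors tie on an axis.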
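Figure 3 reports, for each set length, the random color set with the largest minimum pairwise perceptual distance. It also orders each set by hue angle. The sketch below covers both steps, with these assumptions:

- Colors are already expressed as (J', a', b') coordinates in a perceptually uniform space such as CAM02-UCS.
- The candidate generator is hypothetical. It samples uniformly from a box with no gamut check, and stands in for however the 10k random sets were actually drawn.

```python
import itertools
import math
import random

def min_pairwise_distance(colors):
    """Smallest Euclidean distance between any two colors in the set."""
    return min(math.dist(p, q) for p, q in itertools.combinations(colors, 2))

def hue_angle(color):
    """Hue angle in degrees, in [0, 360), from the (a', b') components."""
    return math.degrees(math.atan2(color[2], color[1])) % 360.0

def order_by_hue(colors):
    return sorted(colors, key=hue_angle)

def best_maxmin_set(candidate_sets):
    """The candidate whose closest pair of colors is farthest apart."""
    return max(candidate_sets, key=min_pairwise_distance)

def random_candidates(n_sets, n_colors, seed=0):
    """Hypothetical generator: uniform samples in a box of J'a'b' space (no gamut check)."""
    rng = random.Random(seed)
    return [
        [(rng.uniform(20, 85), rng.uniform(-40, 40), rng.uniform(-40, 40)) for _ in range(n_colors)]
        for _ in range(n_sets)
    ]
```

For example, `order_by_hue(best_maxmin_set(random_candidates(10_000, 6)))` would give a hue-ordered max-min reference set analogous to the one shown in Figure 3.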
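The per-lightness averages and error bars in Figure 4 come from a simple grouping computation. Trials are binned by marker $L^*$. Each bin then reports its mean response time and its standard error, which is the sample standard deviation divided by the square root of the number of trials. The trial format and the 10-unit bin width below are assumptions for illustration; the source does not state how the lightness values were binned.

```python
import math
import statistics
from collections import defaultdict

def summarize_by_lightness(trials, bin_width=10.0):
    """trials: iterable of (L_star, response_ms).

    Returns {bin_start: (mean_ms, standard_error_ms, n_trials)} in ascending bin order.
    """
    bins = defaultdict(list)
    for l_star, rt in trials:
        bins[math.floor(l_star / bin_width) * bin_width].append(rt)
    summary = {}
    for start, times in sorted(bins.items()):
        mean = statistics.fmean(times)
        # Standard error of the mean; undefined for a single trial.
        se = statistics.stdev(times) / math.sqrt(len(times)) if len(times) > 1 else float("nan")
        summary[start] = (mean, se, len(times))
    return summary
```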