De-cluttering Scatterplots with Integral Images
Hennes Rave, Vladimir Molchanov, Lars Linsen
TL;DR
This work tackles overplotting in scatterplots by introducing a data-driven, global domain deformation that yields a density-equalized, near-uniform distribution of samples while preserving local neighborhood relations. The method builds and iteratively applies a deformation map derived from integral-image representations of a rasterized density, with a corrective term ensuring identity behavior under uniform density. A GPU-accelerated pipeline computes the required integral images and deformation efficiently, enabling interactive visual analysis of large datasets. The authors also explore visual encodings of the deformation (grid, density background, contours) and validate the approach through numerical benchmarks and a user study, demonstrating improved task performance over traditional opacity-based clutter reduction in many scenarios. The technique offers a scalable, deterministic alternative to existing clutter-reduction methods and opens avenues for applications like local lenses and contiguous cartograms.
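As a rough intuition for density equalization, the one-dimensional case can be sketched with a cumulative histogram. This is only an analogy under stated assumptions, not the authors' 2D deformation: the function name `equalize_1d`, the bin count, and the restriction to samples in [0, 1] are all hypothetical choices made for illustration. The running sum plays the role of a 1D integral image, the resulting map is monotone (so the ordering of neighbors is kept), and it reduces to the identity when the density is already uniform.

```python
import numpy as np

def equalize_1d(x, bins=64):
    """Map samples in [0, 1] through the normalized cumulative histogram.

    Illustrative 1D analogy only; the 2D method summarized above uses a
    different construction.
    """
    # Rasterize the samples into a 1D density (counts per bin).
    hist, edges = np.histogram(x, bins=bins, range=(0.0, 1.0))
    # 1D counterpart of an integral image: running sum of the density,
    # normalized so that the map sends [0, 1] onto [0, 1].
    cum = np.concatenate(([0.0], np.cumsum(hist))) / hist.sum()
    # Piecewise-linear monotone map: dense bins are stretched,
    # sparse bins are compressed.
    return np.interp(x, edges, cum)

# Hypothetical usage: a clustered sample is spread out over [0, 1].
rng = np.random.default_rng(0)
clustered = rng.beta(2, 8, size=1000)
spread = equalize_1d(clustered)
```

Because the map is built from the cumulative density alone, it is deterministic: the same input always produces the same output.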
Abstract
Scatterplots provide a visual representation of bivariate data (or 2D embeddings of multivariate data) that allows for effective analyses of data dependencies, clusters, trends, and outliers. Unfortunately, classical scatterplots suffer from scalability issues, since growing data sizes eventually lead to overplotting and visual clutter on a screen with a fixed resolution, which hinders the data analysis process. We propose an algorithm that compensates for irregular sample distributions by a smooth transformation of the scatterplot's visual domain. Our algorithm evaluates the scatterplot's density distribution to compute a regularization mapping based on integral images of the rasterized density function. The mapping preserves the samples' neighborhood relations. A few regularization iterations suffice to achieve a nearly uniform sample distribution that efficiently uses the available screen space. We further propose approaches to visually convey the transformation that was applied to the scatterplot and compare them in a user study. We present a novel parallel algorithm for fast GPU-based integral-image computation, which allows for integrating our de-cluttering approach into interactive visual data analysis systems.
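For readers unfamiliar with the data structure, the following minimal sketch shows what an integral image (summed-area table) of a rasterized density looks like. It is a sequential NumPy illustration, not the parallel GPU algorithm presented in the paper, and it stops short of the regularization mapping itself. The helper names `rasterize_density`, `integral_image`, and `box_sum`, the unit-square domain, and the grid resolution are assumptions made here.

```python
import numpy as np

def rasterize_density(points, resolution=256):
    """Count samples per pixel for 2D points in the unit square."""
    hist, _, _ = np.histogram2d(
        points[:, 0], points[:, 1],
        bins=resolution, range=[[0.0, 1.0], [0.0, 1.0]],
    )
    return hist

def integral_image(density):
    """Summed-area table, padded with a leading row and column of zeros.

    sat[i, j] holds the sum of density[:i, :j].
    """
    sat = np.zeros((density.shape[0] + 1, density.shape[1] + 1))
    sat[1:, 1:] = density.cumsum(axis=0).cumsum(axis=1)
    return sat

def box_sum(sat, i0, j0, i1, j1):
    """Sum of density[i0:i1, j0:j1] using four table lookups."""
    return sat[i1, j1] - sat[i0, j1] - sat[i1, j0] + sat[i0, j0]

# Hypothetical usage: count the samples inside a rectangular pixel region.
rng = np.random.default_rng(0)
pts = rng.random((10_000, 2)) ** 2   # samples clustered toward the origin
density = rasterize_density(pts, resolution=64)
sat = integral_image(density)
count = box_sum(sat, 0, 0, 16, 16)   # samples in the lower-left 16x16 pixels
```

Once the table is built, the number of samples in any axis-aligned rectangle of pixels is available in constant time, which is what makes integral images attractive for density-driven computations.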
