Gatherplot: A Non-Overlapping Scatterplot
Deokgun Park, Sung-Hee Kim, Niklas Elmqvist
TL;DR
The paper addresses overplotting in scatterplots, especially with categorical or duplicated values, by introducing the Gather Transformation that partitions axes into segments and maps data points within those segments to pack marks without aggregation. This yields gatherplots, a 2D gathering representation with layout modes (Absolute, Normalized, Streamgraph) and a local GatherLens interaction for targeted control, all implemented in a D3/Angular prototype. A crowdsourced study demonstrates that gatherplots improve accuracy and user confidence over jittered scatterplots, with mode choices aligned to specific tasks. The approach preserves object identity, supports continuous variables through binning, and offers practical advantages for multidimensional exploration, with future work extending gathering principles to parallel coordinates and additional interactions.
Abstract
Scatterplots are a common tool for exploring multidimensional datasets, especially in the form of scatterplot matrices (SPLOMs). However, scatterplots suffer from overplotting when categorical variables are mapped to one or both axes, or when the same continuous variable is used for both axes. Previous methods such as histograms or violin plots use aggregation, which makes brushing and linking difficult. To address this, we propose gatherplots, an extension of scatterplots that manages the overplotting problem. Gatherplots are a form of unit visualization, which avoids aggregation and maintains the identity of individual objects to ease visual perception. In gatherplots, all visual marks that map to the same position coalesce into a packed entity, making it easier to see an overview of the data groupings. The size and aspect ratio of marks can also be changed dynamically to make it easier to compare the composition of different groups. For the case of a categorical variable plotted against another categorical variable, we propose a heuristic to decide bin sizes for optimal space usage. To validate our work, we conducted a crowdsourced user study, which shows that gatherplots enable people to assess data distribution more quickly and more correctly than jittered scatterplots do.
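The packing idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `gather_layout`, the `cell_size` parameter, and the row-major square grid are assumptions chosen for illustration, and the paper's own bin-size heuristic is not reproduced here. The sketch groups records that share the same pair of categorical values and gives each record its own slot inside that pair's cell, so no two marks overlap and no record is aggregated away.

```python
import math
from collections import defaultdict


def gather_layout(points, x_key, y_key, cell_size=1.0):
    """Return (record, x, y) triples with one distinct slot per record.

    Hypothetical helper: records sharing the same (x, y) categories are
    packed row by row into a square grid inside that category cell.
    """
    if not points:
        return []

    # Group records by the pair of categorical values they map to.
    groups = defaultdict(list)
    for p in points:
        groups[(p[x_key], p[y_key])].append(p)

    x_cats = sorted({k[0] for k in groups})
    y_cats = sorted({k[1] for k in groups})

    # Assumption: one shared grid resolution, sized for the largest group,
    # so every mark has the same size in every cell.
    largest = max(len(g) for g in groups.values())
    cols = math.ceil(math.sqrt(largest))
    step = cell_size / cols

    placed = []
    for (xc, yc), members in groups.items():
        # Lower-left corner of this category cell.
        x0 = x_cats.index(xc) * cell_size
        y0 = y_cats.index(yc) * cell_size
        for i, p in enumerate(members):
            row, col = divmod(i, cols)
            # Centre each mark in its grid slot.
            placed.append((p, x0 + (col + 0.5) * step, y0 + (row + 0.5) * step))
    return placed


if __name__ == "__main__":
    passengers = (
        [{"sex": "F", "class": "1st"}] * 5
        + [{"sex": "M", "class": "3rd"}] * 3
    )
    for record, x, y in gather_layout(passengers, "sex", "class"):
        print(record, round(x, 3), round(y, 3))
```

Because the grid resolution is shared across cells, the filled area of each cell is proportional to its group size, which makes groups roughly comparable at a glance. Stretching each group to fill its own cell instead would be a different design choice with different trade-offs.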
