Jupyter Scatter: Interactive Exploration of Large-Scale Datasets
Fritz Lekschas, Trevor Manz
TL;DR
The paper addresses the challenge of interactively exploring large bivariate datasets in notebook environments. It introduces Jupyter Scatter, a WebGL-based, interlinked scatterplot widget that renders millions of points, supports two-way zoom and selection, and can synchronize multiple plots. It is tightly integrated with Pandas and Matplotlib and uses a cross-platform, anywidget-based frontend. Key contributions include scalable GPU-accelerated rendering, density-aware opacity, rich interaction features (legends, tooltips, selections), and a composition API for comparing embeddings and datasets. This work makes scalable, interactive data exploration and dashboarding practical in notebooks and IDEs, reducing setup effort and making it easier to build domain-specific visualization apps.
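To illustrate what synchronizing selections across multiple plots involves, here is a minimal pure-Python sketch of a shared selection model. It assumes that every linked view shows the same rows (for example, two embeddings of one DataFrame), so a selection can be expressed as a set of row indices. `LinkedSelection` and `View` are hypothetical names chosen for illustration. They are not Jupyter Scatter's API, and the sketch leaves out rendering entirely.

```python
from typing import Callable, Iterable, List, Set


class LinkedSelection:
    """Hypothetical shared selection model: all subscribed views see the same row indices."""

    def __init__(self) -> None:
        self._selected: Set[int] = set()
        self._listeners: List[Callable[[Set[int]], None]] = []

    def subscribe(self, listener: Callable[[Set[int]], None]) -> None:
        self._listeners.append(listener)

    def select(self, indices: Iterable[int]) -> None:
        # Replace the current selection and broadcast it to every linked view.
        self._selected = set(indices)
        for listener in self._listeners:
            listener(self._selected)

    @property
    def selected(self) -> Set[int]:
        return set(self._selected)


class View:
    """Hypothetical stand-in for one scatterplot of the shared rows."""

    def __init__(self, name: str, link: LinkedSelection) -> None:
        self.name = name
        self.highlighted: Set[int] = set()
        self._link = link
        link.subscribe(self._on_selection)

    def _on_selection(self, indices: Set[int]) -> None:
        # Highlight whatever rows were selected in any linked view.
        self.highlighted = set(indices)

    def lasso(self, indices: Iterable[int]) -> None:
        # A user selection in this view is pushed to the shared model.
        self._link.select(indices)


# Usage: a lasso in one embedding highlights the same rows in the other.
link = LinkedSelection()
umap_view = View("UMAP", link)
pca_view = View("PCA", link)
umap_view.lasso([3, 7, 42])
print(sorted(pca_view.highlighted))  # [3, 7, 42]

# Selecting programmatically (e.g., from Python code) updates every view too.
link.select([1, 2])
```

Routing every selection through one model, instead of letting views notify each other directly, keeps the views consistent no matter how many are linked and lets code and user interactions share a single path.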
Abstract
Jupyter Scatter is a scalable, interactive, and interlinked scatterplot widget for exploring datasets in Jupyter Notebook/Lab, Colab, and VS Code. Its goal is to simplify the visual exploration, analysis, and comparison of large-scale bivariate datasets. Jupyter Scatter can render up to twenty million points, supports fast point selections, integrates with Pandas DataFrames and Matplotlib, uses perceptually effective default settings, and offers a user-friendly API.
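One example of a perceptually motivated default is the density-aware opacity mentioned in the TL;DR: with millions of points, fully opaque markers overplot into a solid blob, so opacity should fall as on-screen density rises. The sketch below uses a simple heuristic chosen for illustration and is not Jupyter Scatter's actual formula. It assumes points are spread uniformly, estimates how often each pixel is overdrawn, and sets opacity to the reciprocal of that overdraw, clamped to a range. The function name `density_opacity` and all of its parameters are hypothetical.

```python
def density_opacity(
    n_points: int,
    canvas_width: int,
    canvas_height: int,
    point_size: float,
    zoom: float = 1.0,
    min_opacity: float = 0.01,
    max_opacity: float = 1.0,
) -> float:
    """Hypothetical heuristic: opacity ~ 1 / (average overdraw per pixel), clamped."""
    # Assume a uniform distribution: zooming in by a factor z shows ~1/z^2 of the points.
    visible_points = n_points / (zoom ** 2)
    # Total pixel area covered by square markers, divided by the canvas area.
    covered_area = visible_points * point_size ** 2
    overdraw = covered_area / (canvas_width * canvas_height)
    if overdraw <= 0:
        return max_opacity
    # With opacity 1/k and about k overlapping points, average-density regions
    # approach, but do not reach, full saturation, leaving denser clusters visible.
    return max(min_opacity, min(max_opacity, 1.0 / overdraw))


# Usage on an 800x600 canvas with 3-pixel points.
print(density_opacity(1_000, 800, 600, 3))            # sparse data: fully opaque
print(density_opacity(20_000_000, 800, 600, 3))       # clamped to min_opacity
print(density_opacity(20_000_000, 800, 600, 3, 10))   # zoomed in: more opaque
```

The lower clamp keeps individual points from becoming invisible at very high densities, and the zoom term makes points more opaque as the user zooms into a region and fewer of them remain on screen.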
