
Jupyter Scatter: Interactive Exploration of Large-Scale Datasets

Fritz Lekschas, Trevor Manz

TL;DR

The paper addresses the challenge of interactively exploring large bivariate datasets in notebook environments. It introduces Jupyter Scatter, a WebGL-based, interlinked scatterplot widget that renders millions of points, supports two-way zoom and selections, and can synchronize multiple plots. The widget integrates tightly with Pandas and Matplotlib and uses a cross-platform frontend built on anywidget. Key contributions include scalable GPU-accelerated rendering, density-aware opacity, rich interaction features (legends, tooltips, selections), and a composition API for comparing embeddings and datasets. This work enables practical, scalable, and interactive data exploration and dashboarding in notebooks and IDEs, reducing setup effort and enabling domain-specific visualization apps.
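
Density-aware opacity can be illustrated with a simple overplotting heuristic. The sketch below is hypothetical: the function name `density_opacity`, its parameters, and the formula are illustrative assumptions, not Jupyter Scatter's actual implementation. It estimates how many points cover each pixel on average and lowers a shared opacity accordingly, so dense views do not saturate into a solid blob while sparse views stay fully opaque.

```python
def density_opacity(
    n_points_in_view: int,
    point_size_px: float,
    canvas_width_px: int,
    canvas_height_px: int,
    min_opacity: float = 0.01,
    max_opacity: float = 1.0,
) -> float:
    """Hypothetical density-aware opacity: one alpha value shared by all points.

    Assumes each point covers point_size_px**2 pixels and points are spread
    uniformly over the canvas. `overlap` is then the average number of points
    drawn on top of each pixel; using alpha = 1 / overlap keeps the summed
    alpha per pixel near 1 regardless of how many points are visible.
    """
    if n_points_in_view <= 0:
        return max_opacity
    canvas_area = canvas_width_px * canvas_height_px
    covered_area = n_points_in_view * point_size_px ** 2
    overlap = covered_area / canvas_area
    if overlap <= 1.0:
        return max_opacity  # little overplotting: draw points fully opaque
    # Clamp so very dense views never become fully invisible
    return max(min_opacity, min(max_opacity, 1.0 / overlap))


# Zooming out from 1,000 to 10 million visible points on an 800x600 canvas
for n in (1_000, 120_000, 10_000_000):
    print(n, density_opacity(n, point_size_px=4, canvas_width_px=800, canvas_height_px=600))
# → 1000 1.0 / 120000 0.25 / 10000000 0.01
```

Because the opacity depends on the number of points in view, a renderer using a rule like this would recompute it after every zoom or pan.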

Abstract

Jupyter Scatter is a scalable, interactive, and interlinked scatterplot widget for exploring datasets in Jupyter Notebook/Lab, Colab, and VS Code. Its goal is to simplify the visual exploration, analysis, and comparison of large-scale bivariate datasets. Jupyter Scatter can render up to twenty million points, supports fast point selections, integrates with Pandas DataFrames and Matplotlib, uses perceptually effective default settings, and offers a user-friendly API.
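
Lasso selection reduces to testing which points fall inside a user-drawn polygon. The following CPU sketch uses the standard even-odd ray-casting test with a bounding-box prefilter; the helper names `point_in_polygon` and `lasso_select` are hypothetical, and this is not a claim about how Jupyter Scatter implements its selections internally.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def point_in_polygon(x: float, y: float, polygon: Sequence[Point]) -> bool:
    """Even-odd rule: cast a ray toward +x and count edge crossings."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]  # wrap around to close the lasso
        # Only edges that straddle the horizontal line through y can cross the ray
        if (y1 > y) != (y2 > y):
            # x-coordinate where this edge meets that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def lasso_select(points: Sequence[Point], polygon: Sequence[Point]) -> List[int]:
    """Return the indices of all points inside the lasso polygon."""
    xs = [px for px, _ in polygon]
    ys = [py for _, py in polygon]
    min_x, max_x, min_y, max_y = min(xs), max(xs), min(ys), max(ys)
    return [
        i
        for i, (x, y) in enumerate(points)
        # Cheap bounding-box check first; it typically rejects most points
        if min_x <= x <= max_x and min_y <= y <= max_y
        and point_in_polygon(x, y, polygon)
    ]


triangle = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
points = [(2.0, 2.0), (6.0, 6.0), (15.0, 1.0), (1.0, 8.0)]
print(lasso_select(points, triangle))  # → [0, 3]
```

For millions of points, a real implementation would likely add a spatial index or run the test on the GPU; the pure-Python loop here only shows the geometry.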

Paper Structure (5 sections, 3 figures)

Figures (3)

  • Figure 1: Examples of Jupyter Scatter. Top row, left to right: a 10M-point scatterplot of the Roessler attractor; a connected scatterplot of the market capitalization of the top ten S&P 500 companies over the last five years, according to YCharts; five linked embedding plots of epigenomic data (dekker2023spatial) connected to the HiGlass genome browser (kerpedjiev2018higlass). Bottom row, left to right: a single-cell embedding plot of tumor data (mair2022extricating) that was clustered and annotated with FAUST (greene2022data); several linked embedding plots of chromatin state datasets (spracklin2023diverse); an embedding plot of news headlines (misra2022news) linked to a widget that displays the selected articles.
  • Figure 2: GeoNames Dataset of Cities Around the World.
  • Figure 3: Fashion MNIST Embeddings. Left: Integration of Jupyter Scatter with an image widget through synchronized point selections. Right: Four scatterplots with synchronized point selection.
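
Synchronized selection across linked plots, as in Figure 3 (right), can be modeled with an observer pattern. In this hypothetical sketch, `LinkedSelection` and `PlotView` are illustrative names, not part of Jupyter Scatter's API, and all views are assumed to show the same rows in the same order so that a selection can be shared as a list of row indices.

```python
from typing import Callable, List, Sequence


class LinkedSelection:
    """Hypothetical shared selection state that notifies every linked view."""

    def __init__(self) -> None:
        self._listeners: List[Callable[[List[int]], None]] = []
        self._selection: List[int] = []
        self._broadcasting = False

    def subscribe(self, listener: Callable[[List[int]], None]) -> None:
        self._listeners.append(listener)
        listener(list(self._selection))  # bring the new view up to date

    def select(self, indices: Sequence[int]) -> None:
        # Guard against re-entrant calls, e.g. a view whose update handler
        # echoes the selection back while we are still notifying views
        if self._broadcasting:
            return
        self._selection = sorted(set(indices))
        self._broadcasting = True
        try:
            for listener in self._listeners:
                listener(list(self._selection))
        finally:
            self._broadcasting = False

    @property
    def selection(self) -> List[int]:
        return list(self._selection)


class PlotView:
    """Stand-in for one scatterplot; it only records which rows to highlight."""

    def __init__(self, name: str, link: LinkedSelection) -> None:
        self.name = name
        self.link = link
        self.highlighted: List[int] = []
        link.subscribe(self._on_selection)

    def _on_selection(self, indices: List[int]) -> None:
        self.highlighted = indices

    def lasso(self, indices: Sequence[int]) -> None:
        # A selection made in this view is pushed to every linked view
        self.link.select(indices)


link = LinkedSelection()
views = [PlotView(f"embedding_{i}", link) for i in range(4)]
views[2].lasso([42, 7, 3, 7])  # user lassos points in the third plot
print([v.highlighted for v in views])
# → [[3, 7, 42], [3, 7, 42], [3, 7, 42], [3, 7, 42]]
```

Keeping the selection in one shared object, rather than copying it between pairs of plots, means adding a fifth view only requires one more subscription.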