PyGWalker: On-the-fly Assistant for Exploratory Visual Data Analysis
Yue Yu, Leixian Shen, Fei Long, Huamin Qu, Hao Chen
TL;DR
PyGWalker addresses the disconnect between programmatic analysis in computational notebooks and exploratory visual data analysis by providing an on-the-fly, GUI-assisted workflow. It introduces two components. Graphic-Link is a declarative representation that ties together data, analytic intent, and visualization. Compute-Link decouples view-data computation and supports three computation environments: in-browser JavaScript, Python (via DuckDB), and external databases. Charts are rendered with Vega-Lite. Together, these enable a seamless exploration workflow across environments and reproducible sharing through JSON specifications and HTML exports, as demonstrated in a notebook-based usage scenario. The tool has seen substantial community uptake (612k PyPI downloads and 10.5k GitHub stars as of June 2024) and is being integrated into research, education, and application development, which highlights its practical impact.
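The Compute-Link idea, in which one declarative view description can be evaluated by interchangeable computation backends, can be sketched as follows. This is a minimal illustration under stated assumptions, not PyGWalker's implementation: `ViewSpec`, `InMemoryBackend`, `SQLBackend`, and `to_vega_lite` are hypothetical names, the spec format is not the real Graphic-Link schema, and Python's standard-library `sqlite3` module stands in for DuckDB or an external database.

```python
from __future__ import annotations

import json
import sqlite3
from collections import defaultdict
from dataclasses import dataclass
from typing import Protocol

@dataclass(frozen=True)
class ViewSpec:
    """Hypothetical declarative view: shelf assignments plus an aggregation.
    Loosely inspired by Graphic-Link; NOT PyGWalker's actual schema."""
    dimension: str          # field on the x shelf (grouping key)
    measure: str            # field on the y shelf
    aggregate: str = "sum"  # this sketch supports only "sum" and "mean"
    mark: str = "bar"

    def __post_init__(self) -> None:
        if self.aggregate not in ("sum", "mean"):
            raise ValueError(f"unsupported aggregate: {self.aggregate}")

class ComputeBackend(Protocol):
    def compute(self, spec: ViewSpec) -> list[dict]: ...

class InMemoryBackend:
    """Aggregates in plain Python (a stand-in for in-browser computation)."""
    def __init__(self, rows: list[dict]) -> None:
        self.rows = rows

    def compute(self, spec: ViewSpec) -> list[dict]:
        groups: dict = defaultdict(list)
        for row in self.rows:
            groups[row[spec.dimension]].append(row[spec.measure])
        result = []
        for key in sorted(groups):  # sort keys to match the SQL ORDER BY
            vals = groups[key]
            value = sum(vals) if spec.aggregate == "sum" else sum(vals) / len(vals)
            result.append({spec.dimension: key, spec.measure: value})
        return result

class SQLBackend:
    """Pushes aggregation down to a SQL engine (sqlite3 stands in for
    DuckDB or an external database in this sketch)."""
    _SQL_AGG = {"sum": "SUM", "mean": "AVG"}

    def __init__(self, conn: sqlite3.Connection, table: str) -> None:
        self.conn = conn
        self.table = table

    def compute(self, spec: ViewSpec) -> list[dict]:
        agg = self._SQL_AGG[spec.aggregate]
        # Identifiers are interpolated for brevity; real code must validate them.
        query = (
            f'SELECT "{spec.dimension}", {agg}("{spec.measure}") '
            f'FROM "{self.table}" GROUP BY "{spec.dimension}" '
            f'ORDER BY "{spec.dimension}"'
        )
        return [
            {spec.dimension: d, spec.measure: v}
            for d, v in self.conn.execute(query).fetchall()
        ]

def to_vega_lite(spec: ViewSpec, values: list[dict]) -> dict:
    """Wrap precomputed view data in a Vega-Lite spec (JSON-serializable)."""
    return {
        "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
        "data": {"values": values},
        "mark": spec.mark,
        "encoding": {
            "x": {"field": spec.dimension, "type": "nominal"},
            "y": {"field": spec.measure, "type": "quantitative"},
        },
    }

# Usage: the same spec evaluated by two different backends.
rows = [
    {"region": "east", "sales": 10},
    {"region": "west", "sales": 5},
    {"region": "east", "sales": 7},
]
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, sales INTEGER)")
conn.executemany("INSERT INTO sales VALUES (:region, :sales)", rows)

spec = ViewSpec(dimension="region", measure="sales")
backends: list[ComputeBackend] = [InMemoryBackend(rows), SQLBackend(conn, "sales")]
results = [b.compute(spec) for b in backends]
assert results[0] == results[1]  # same view data regardless of backend
print(json.dumps(to_vega_lite(spec, results[0]), indent=2))
```

Because both backends consume the same spec and return identically shaped rows, the front end could switch computation environments to suit the data size without changing the chart definition; only where the aggregation runs changes.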
Abstract
Exploratory visual data analysis tools enable data analysts to explore data insights efficiently and intuitively throughout the entire analysis cycle. However, the gap between common programmatic analysis (e.g., within computational notebooks) and exploratory visual analysis leads to a disjointed and inefficient data analysis experience. To bridge this gap, we developed PyGWalker, a Python library that offers on-the-fly assistance for exploratory visual data analysis. It features a lightweight and intuitive GUI with a shelf-builder modality. Its loosely coupled architecture supports multiple computational environments to accommodate varying data sizes. Since its release in February 2023, PyGWalker has gained considerable attention, with 612k downloads on PyPI and over 10.5k stars on GitHub as of June 2024. This reflects its value to the data science and visualization community, as researchers and developers are integrating it into their own applications and studies.
