Optimizing Dataflow Systems for Scalable Interactive Visualization
Junran Yang, Hyekang Kevin Joo, Sai Yerramreddy, Dominik Moritz, Leilani Battle
TL;DR
VegaPlus scales interactive visualizations to large datasets by tightly integrating the Vega visualization language with a back-end DBMS. It introduces a cross-stack optimizer that rewrites Vega dataflow transforms into SQL and selects efficient execution plans with a pairwise learning-to-rank approach, balancing client-side rendering against server-side processing. The system uses VegaDBMSTransform nodes, batching, and caching to minimize round trips, and validates end-to-end performance gains over standard Vega with a benchmark suite of seven templates. The results indicate that VegaPlus provides substantial speedups on large datasets, remains versatile across diverse dashboard designs, and selects plans that account for user interactions. In practice, analysts can design visualizations without bespoke optimization while keeping the expressive Vega workflow, and the approach may extend to other dataflow-based visualization languages.
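To make the idea of rewriting dataflow transforms into SQL concrete, the following is a minimal sketch and not the VegaPlus implementation. It assumes a simplified, Vega-like transform list in which filter predicates are already valid SQL expressions (real Vega uses its own expression language). The function name `transforms_to_sql`, the operator mapping `SQL_OPS`, and the `flights` table are hypothetical names chosen for illustration.

```python
from typing import Dict, List

# Assumed mapping from Vega-style aggregate op names to SQL functions.
SQL_OPS = {"mean": "AVG", "sum": "SUM", "count": "COUNT", "min": "MIN", "max": "MAX"}


def transforms_to_sql(table: str, transforms: List[Dict]) -> str:
    """Fold a linear chain of simplified transforms into one nested SQL query."""
    query = f"SELECT * FROM {table}"
    for i, t in enumerate(transforms):
        # Each step wraps the query built so far as a named subquery.
        sub = f"({query}) AS t{i}"
        if t["type"] == "filter":
            # Assumption: the predicate is already a SQL expression.
            query = f"SELECT * FROM {sub} WHERE {t['expr']}"
        elif t["type"] == "aggregate":
            aggs = ", ".join(
                f"{SQL_OPS[op]}({field}) AS {alias}"
                for op, field, alias in zip(t["ops"], t["fields"], t["as"])
            )
            groupby = ", ".join(t.get("groupby", []))
            if groupby:
                query = f"SELECT {groupby}, {aggs} FROM {sub} GROUP BY {groupby}"
            else:
                query = f"SELECT {aggs} FROM {sub}"
        else:
            raise ValueError(f"unsupported transform: {t['type']}")
    return query


# Hypothetical dashboard fragment: keep delayed flights, then average per carrier.
spec = [
    {"type": "filter", "expr": "delay > 0"},
    {"type": "aggregate", "groupby": ["carrier"], "ops": ["mean"],
     "fields": ["delay"], "as": ["avg_delay"]},
]
sql = transforms_to_sql("flights", spec)
```

Nesting one subquery per transform keeps the translation mechanical and leaves flattening to the DBMS query planner. A production rewriter would also need to translate expressions and handle transforms that have no SQL equivalent.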
Abstract
Supporting the interactive exploration of large datasets is a popular and challenging use case for data management systems. Traditionally, the interface and the back-end system are built and optimized separately, and interface design and system optimization require different skill sets that are difficult for one person to master. To enable analysts to focus on visualization design, we contribute VegaPlus, a system that automatically optimizes interactive dashboards to support large datasets. To achieve this, VegaPlus leverages two core ideas. First, we introduce an optimizer that can reason about execution plans in Vega, a back-end DBMS, or a mix of both environments. Second, the optimizer considers how user interactions may alter execution plan performance, and can partially or fully rewrite the plans when needed. Through a series of benchmark experiments on seven different dashboard designs, our results show that VegaPlus provides superior performance and versatility compared to standard dashboard optimization techniques.
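The notion of an optimizer that reasons about plans in Vega, in the DBMS, or in a mix of both can be illustrated with a small sketch. It is not the VegaPlus optimizer: it assumes a linear transform chain, enumerates plans by choosing a split point between server and client, and picks a winner through pairwise comparisons. The hand-set cost weights in `prefers` stand in for a learned pairwise ranking model, and the selectivity values and all function names are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Tuple

Plan = Tuple[Tuple[str, ...], Tuple[str, ...]]  # (server_ops, client_ops)


def enumerate_plans(ops: List[str]) -> List[Plan]:
    """One plan per split point: the first k ops run in the DBMS, the rest in the client."""
    return [(tuple(ops[:k]), tuple(ops[k:])) for k in range(len(ops) + 1)]


def plan_features(plan: Plan, n_rows: int, selectivity: Dict[str, float]) -> Tuple[float, float]:
    """Hypothetical features: rows shipped to the client, and number of server-side ops."""
    server_ops, _ = plan
    shipped = float(n_rows)
    for op in server_ops:
        shipped *= selectivity[op]  # assumed fraction of rows surviving each op
    return (shipped, float(len(server_ops)))


def prefers(a: Tuple[float, float], b: Tuple[float, float],
            weights: Tuple[float, float] = (1.0, 100.0)) -> bool:
    """Stand-in pairwise comparator: True if plan a is predicted cheaper than plan b.
    A learned model would replace these hand-set weights."""
    return sum(w * (fa - fb) for w, fa, fb in zip(weights, a, b)) < 0


def select_plan(plans: List[Plan], feats: List[Tuple[float, float]]) -> Plan:
    """Compare every pair of plans and return the one with the most pairwise wins."""
    wins = [0] * len(plans)
    for i, j in combinations(range(len(plans)), 2):
        if prefers(feats[i], feats[j]):
            wins[i] += 1
        else:
            wins[j] += 1
    return plans[max(range(len(plans)), key=wins.__getitem__)]


def choose(ops: List[str], n_rows: int, selectivity: Dict[str, float]) -> Plan:
    plans = enumerate_plans(ops)
    feats = [plan_features(p, n_rows, selectivity) for p in plans]
    return select_plan(plans, feats)


# Assumed selectivities, for illustration only.
SELECTIVITY = {"filter": 0.5, "aggregate": 0.001}
big = choose(["filter", "aggregate"], 1_000_000, SELECTIVITY)
small = choose(["filter", "aggregate"], 100, SELECTIVITY)
```

Under these assumed weights, the large input favors pushing both operators to the DBMS, while the small input stays entirely on the client because the per-operator server overhead outweighs the transfer savings.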
