VIDEX: A Disaggregated and Extensible Virtual Index for the Cloud and AI Era
Rong Kang, Shuai Wang, Tieying Zhang, Xianghong Xu, Linhui Xu, Zhimin Liang, Lei Zhang, Rui Shi, Jianjun Chen
TL;DR
VIDEX tackles cloud-native constraints on database optimization and AI-model integration by introducing a disaggregated, three-layer architecture that separates production data, a virtual-index optimizer, and an AI-enabled statistic server. By exposing standardized interfaces for AI-based cardinality and NDV estimation and enabling data-less optimization via the VIDEX-Optimizer, VIDEX achieves production-like query plans while preserving data privacy. Experimental results on representative workloads show high fidelity to production plans (average q-error $<1.1$) and scalable deployment across thousands of MySQL instances, with an open-source release to spur broader adoption and extension to other DBMSs like PostgreSQL. The approach enables flexible, GPU-accelerated AI optimization and on-demand model updates, representing a practical path for AI-driven optimization in cloud-native database systems.
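For readers unfamiliar with the metric, the following is a minimal sketch of how q-error is commonly computed for an estimate against a true value, namely as the larger of the two ratios, so that a perfect estimate scores 1.0. The function names, the clamping floor `eps`, and the sample numbers are illustrative assumptions; they are not taken from VIDEX, and the paper's exact definition may differ.

```python
from typing import Iterable, Tuple


def q_error(estimated: float, actual: float, eps: float = 1.0) -> float:
    """Commonly used q-error: max(est/actual, actual/est).

    Both values are clamped to at least `eps` (an assumption made here)
    so that zero-row estimates do not cause a division by zero.
    """
    est = max(estimated, eps)
    act = max(actual, eps)
    return max(est / act, act / est)


def mean_q_error(pairs: Iterable[Tuple[float, float]]) -> float:
    """Average q-error over (estimated, actual) pairs."""
    errors = [q_error(e, a) for e, a in pairs]
    if not errors:
        raise ValueError("at least one (estimated, actual) pair is required")
    return sum(errors) / len(errors)


# Hypothetical (estimated, actual) row counts, for illustration only.
sample_pairs = [(100, 100), (90, 100), (220, 200)]
average = mean_q_error(sample_pairs)
```

Because the metric is symmetric, an estimate that is half the true value and one that is double it both score 2.0; an average below 1.1 therefore means estimates are, on average, within about 10% of the reference in ratio terms.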
Abstract
Virtual indexes, also known as hypothetical indexes, play a crucial role in database query optimization. However, with the rapid advancement of cloud computing and AI-driven models for database optimization, traditional virtual index approaches face significant challenges. Cloud-native environments often prohibit running query optimization directly on production databases because of stability requirements and data privacy concerns. Moreover, while AI models show promising progress, integrating them with database systems raises challenges in system complexity, inference acceleration, and model hot updates. In this paper, we present VIDEX, a three-layer disaggregated architecture that decouples database instances, the virtual index optimizer, and algorithm services, and provides standardized interfaces for AI model integration. Users can configure VIDEX either by collecting production statistics or by loading them from a prepared file; this setup allows highly accurate what-if analyses based on virtual indexes, achieving query plans that are identical to those of the production instance. Additionally, users can freely integrate new AI-driven algorithms into VIDEX. VIDEX has been successfully deployed at ByteDance, where it serves thousands of MySQL instances and millions of SQL queries daily for index optimization tasks.
