Automated Database Indexing using Model-free Reinforcement Learning
Gabriel Paludo Licks, Felipe Meneguzzi
TL;DR
This work tackles automated, workload-adaptive database indexing with a model-free reinforcement learning framework (SmartIX). A Deep Q-Network agent operates in a database environment whose state is the concatenation of the current index configuration and recent index usage. Each action either flips one index (creating it if absent, dropping it if present) or does nothing, and the reward is shaped to favor indexes that benefit the workload and to penalize unnecessary ones. Empirical results on the TPC-H benchmark show that SmartIX reaches near-optimal index configurations with smaller storage footprints, outperforms baselines including genetic algorithms and other RL approaches, and transfers effectively to larger databases. The approach enables dynamic, query-driven index management suitable for cloud and production settings. Noted limitations are composite indexes and write-heavy workloads, which also define the directions for future work.
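The state and action encoding summarized above can be illustrated with a minimal Python sketch. It assumes one binary flag per indexable column for both the configuration and the usage vector. The function names (make_state, apply_action, shaped_reward) and the +1/-1 reward values are hypothetical placeholders chosen for illustration, not the reward function or code used by SmartIX.

```python
from typing import List


def make_state(config: List[int], usage: List[int]) -> List[int]:
    """Concatenate the index configuration with recent index usage.

    Both vectors hold one 0/1 flag per indexable column (an assumption).
    """
    if len(config) != len(usage):
        raise ValueError("config and usage must have the same length")
    return list(config) + list(usage)


def apply_action(config: List[int], action: int) -> List[int]:
    """Actions 0..n-1 flip the index on that column; action n does nothing."""
    n = len(config)
    if not 0 <= action <= n:
        raise ValueError(f"action must be in [0, {n}]")
    new_config = list(config)
    if action < n:
        # Flip: create the index if absent, drop it if present.
        new_config[action] = 1 - new_config[action]
    return new_config


def shaped_reward(config: List[int], usage: List[int]) -> int:
    """Hypothetical shaping: +1 per existing index that was used,
    -1 per existing index that was not used. Absent indexes score 0."""
    return sum((1 if used else -1) for present, used in zip(config, usage) if present)


config = [1, 0, 0]   # only the first column is indexed
usage = [1, 0, 0]    # the workload recently used only that index
state = make_state(config, usage)
after_flip = apply_action(config, 1)   # create an index on the second column
after_noop = apply_action(config, 3)   # the extra action leaves the config unchanged
```

With n indexable columns this gives a state of length 2n and n + 1 discrete actions, which is the kind of fixed-size input and output a Deep Q-Network needs. In a real environment the usage vector and the reward would be measured from the database after running the workload rather than computed from flags.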
Abstract
Configuring databases for efficient querying is a complex task, often carried out by a database administrator. Solving the problem of building indexes that truly optimize database access requires a substantial amount of database and domain knowledge, the lack of which often results in wasted space and memory for irrelevant indexes, possibly jeopardizing database performance for querying and certainly degrading performance for updating. We develop an architecture to solve the problem of automatically indexing a database by using reinforcement learning to optimize queries by indexing data throughout the lifetime of a database. In our experimental evaluation, our architecture shows superior performance compared to related work on reinforcement learning and genetic algorithms, maintaining near-optimal index configurations and efficiently scaling to large databases.
