Learning Approximation Sets for Exploratory Queries

Susan B. Davidson; Tova Milo; Kathy Razmadze; Gal Zeevi

TL;DR

The paper tackles the challenge of fast, accurate exploration in large databases by formulating Approximate Non-Aggregates Query Processing (ANAQP), which is proven to be $NP$-complete. It introduces ASQP-RL, a reinforcement-learning framework that builds an approximation set of tuples via offline preprocessing and an actor-critic policy trained with Proximal Policy Optimization, tailored to a tabular data domain and unknown workloads. The approach uses a problem-specific score to guide learning, supports drift detection and fine-tuning, and includes a lighter variant for faster setup. Empirical results on two benchmarks show that ASQP-RL achieves up to 30% higher accuracy and 10–35x speedups over baselines, with competitive performance on aggregate queries, underscoring the potential of RL for data management in exploratory querying. This work points to practical, scalable improvements for data exploration tasks and suggests directions for further refinement, such as richer dataset statistics and more explicit diversification guarantees.

Abstract

In data exploration, executing complex non-aggregate queries over large databases can be time-consuming. Our paper introduces a novel approach to address this challenge, focusing on finding an optimized subset of data, referred to as the approximation set, for query execution. The goal is to maximize query result quality while minimizing execution time. We formalize this problem as Approximate Non-Aggregates Query Processing (ANAQP) and establish its NP-completeness. To tackle this, we propose an approximate solution using an advanced Reinforcement Learning architecture, termed ASQP-RL. This approach overcomes challenges related to the large action space and the need for generalization beyond a known query workload. Experimental results on two benchmarks show that ASQP-RL significantly outperforms the baselines both in terms of accuracy (30% better) and efficiency (10-35X). This research provides valuable insights into the potential of RL techniques for future advancements in data management tasks.
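To make the notion of an approximation set concrete, the sketch below scores a candidate subset by how much of each query's full answer it reproduces. It is only an illustration under simplifying assumptions: queries are modeled as plain row predicates (selections only, no joins), and the `coverage` and `workload_score` functions, the `Row` and `Query` types, and the toy data are hypothetical names introduced here, not the problem-specific score that ASQP-RL uses.

```python
from typing import Callable, Iterable, List, Set, Tuple

Row = Tuple  # hypothetical row type: a plain tuple of column values
Query = Callable[[Row], bool]  # assumption: a query is just a row predicate


def run_query(query: Query, rows: Iterable[Row]) -> Set[Row]:
    """Return the set of rows that satisfy the predicate."""
    return {row for row in rows if query(row)}


def coverage(query: Query, database: List[Row], approx_set: List[Row]) -> float:
    """Fraction of the full answer that the approximation set reproduces."""
    full = run_query(query, database)
    if not full:
        return 1.0  # nothing to recover, so the subset loses nothing
    approx = run_query(query, approx_set)
    return len(approx & full) / len(full)


def workload_score(
    queries: List[Query], database: List[Row], approx_set: List[Row]
) -> float:
    """Average coverage over a workload of queries."""
    return sum(coverage(q, database, approx_set) for q in queries) / len(queries)


# Toy data: (id, category, value)
database = [(1, "a", 10), (2, "b", 20), (3, "a", 30), (4, "c", 40)]
approx_set = [database[0], database[3]]  # a candidate subset of two tuples
queries = [
    lambda r: r[1] == "a",  # the subset recovers one of two matching rows
    lambda r: r[2] >= 40,   # the subset recovers the only matching row
]
score = workload_score(queries, database, approx_set)
print(score)
```

Under this toy measure, choosing the subset that maximizes the score for a fixed size budget is the kind of combinatorial search that motivates a learned selection policy rather than exhaustive enumeration.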

Paper Structure (47 sections, 6 equations, 4 figures, 3 algorithms)

Introduction
Challenges
Our solution
Related work
OLAP
Approximate Query Processing (AQP) and Generative Models
Data Summarization and Sampling
Data Reduction, Sketches and others
Problem Definition and Complexity
ANAQP Problem Definition
Problem Hardness
Unknown Query Workloads
Approximation using RL
Overview of ASQP-RL
Data and Query Pre-processing
...and 32 more sections

Figures (4)

Figure 1: ASQP-RL architecture.
Figure 2: Quality and Running time
Figure 3: Reinforcement Learning Ablation Study
Figure :
