Efficient Query Repair for Aggregate Constraints

Shatha Algarni, Boris Glavic, Seokki Lee, Adriane Chapman

TL;DR

The paper addresses repairing a user query so that its result satisfies complex aggregate constraints expressed as arithmetic combinations of aggregates, including non-monotone constraints like fairness measures. It introduces two pruning-based repair frameworks, algff and algrp, that use kd-tree clustering and interval arithmetic to reuse aggregations and bound constraint results, enabling efficient top-k repairs. The authors formalize the problem, prove the algorithms correct, and demonstrate substantial runtime gains over brute-force baselines and prior work across diverse datasets and constraints. The work makes it possible to enforce sophisticated constraints on query results (e.g., SPD fairness) without drastically altering user intent, with practical impact for fair and compliant data retrieval.

Abstract

In many real-world scenarios, query results must satisfy domain-specific constraints. For instance, a minimum percentage of interview candidates selected based on their qualifications should be female. These requirements can be expressed as constraints over an arithmetic combination of aggregates evaluated on the result of the query. In this work, we study how to repair a query to fulfill such constraints by modifying the filter predicates of the query. We introduce a novel query repair technique that leverages bounds on sets of candidate solutions and interval arithmetic to efficiently prune the search space. We demonstrate experimentally that our technique significantly outperforms baselines that consider a single candidate at a time.
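The pruning idea in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the `Interval` class, the `classify` helper, and the numbers are hypothetical and chosen only for illustration. The sketch assumes that, for a whole set of candidate repairs, lower and upper bounds on each aggregate are already known, and that the constraint has the form `ratio >= threshold`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Closed interval [lo, hi] bounding an aggregate over a set of candidates."""
    lo: float
    hi: float

    def __sub__(self, other: "Interval") -> "Interval":
        # Smallest difference pairs our lo with their hi, and vice versa.
        return Interval(self.lo - other.hi, self.hi - other.lo)

    def __truediv__(self, other: "Interval") -> "Interval":
        # Assumption of this sketch: the denominator is strictly positive.
        if other.lo <= 0:
            raise ValueError("sketch only handles strictly positive denominators")
        quotients = [self.lo / other.lo, self.lo / other.hi,
                     self.hi / other.lo, self.hi / other.hi]
        return Interval(min(quotients), max(quotients))


def classify(bound: Interval, threshold: float) -> str:
    """Classify a set of candidates against the constraint `value >= threshold`."""
    if bound.lo >= threshold:
        return "all"      # every candidate in the set satisfies the constraint
    if bound.hi < threshold:
        return "none"     # no candidate can satisfy it, so the set is pruned
    return "unknown"      # bounds are inconclusive, the set must be split further


# Hypothetical bounds for one set of candidate repairs.
female_selected = Interval(3, 6)
total_selected = Interval(10, 12)
ratio = female_selected / total_selected

print(ratio)
print(classify(ratio, 0.7), classify(ratio, 0.2), classify(ratio, 0.4))
```

A single bound computation decides the fate of every candidate in the set at once, which is where the savings over checking one candidate at a time would come from. The bounds are sound but not necessarily tight, since the numerator and denominator are treated as independent. A constraint over a difference of two such ratios could reuse `__sub__` in the same way.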

Paper Structure

This paper contains 24 sections, 6 theorems, 13 equations, 9 figures, 6 tables, 3 algorithms.

Key Result

theorem 1

Given an instance $(Q, D, \omega, k)$ of the aggregate constraint repair problem, and (alg:range_filtering) compute the solution for this problem instance.

Figures (9)

  • Figure 1: Overview of query repair with aggregate constraints using range-based pruning.
  • Figure 2: Runtime, , and for , , and brute force over the dataset.
  • Figure 3: Runtime, , and for and over the and datasets using the queries from \ref{tab:queries}.
  • Figure 4: Runtime, , and for and over the and datasets, varying .
  • Figure 5: Runtime, , and for and over the and datasets, varying bucket size $\mathcal{S}$.
  • ...and 4 more figures

Theorems & Definitions (9)

  • Example 1: Fairness Motivating Example
  • Example 2
  • Example 3: Company Product Management
  • theorem 1: Correctness of and
  • lemma 1: \ref{alg:filter_fully} Returns Covering Clustersets
  • lemma 2: Aggregate Results on Fully Covering Clustersets
  • lemma 3: Partially Covering Cluster Sets
  • lemma 4: Sound Bounds on Aggregation Results
  • lemma 5: Universal and Existential Constraint Checking is Sound