ListK: Semantic ORDER BY and LIMIT K with Listwise Prompting

Jason Shin, Jiwon Chang, Fatemeh Nargesian

Abstract

Semantic operators abstract large language model (LLM) calls in SQL clauses. They are gaining traction as an easy way to analyze semi-structured, unstructured, and multimodal datasets. While many recent works optimize various semantic operators, existing methods for semantic ORDER BY (full sort) and LIMIT K (top-K) remain lackluster. Our ListK framework improves the latency of semantic ORDER BY ... LIMIT K at virtually no cost to accuracy. Motivated by recent advances in fine-tuned listwise rankers, we study several sorting algorithms that best combine partial listwise rankings. These include: 1) a deterministic listwise tournament (LTTopK), 2) Las Vegas and embarrassingly parallel listwise multi-pivot quickselect/sort (LMPQSelect, LMPQSort), and 3) a basic Monte Carlo listwise tournament filter (LTFilter). Of these, listwise multi-pivot quickselect/sort are studied here for the first time. The full framework provides a query optimizer that combines the above physical operators, based on the target recall, to minimize latency. We provide theoretical analysis that makes parameters easy to tune and supplies cost estimates for query optimizers. ListK empirically dominates the Pareto frontier, halving latency at virtually no cost to recall and NDCG compared to prior art.
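To make the idea of a listwise multi-pivot quickselect concrete, the following is a minimal sketch and not the paper's LMPQSelect. It assumes a hypothetical `rank` callable that returns at most `list_size` items in best-first order (simulated here by `perfect_ranker`), distinct hashable items, and a ranker that orders the pivots consistently across calls. The partitioning scheme, the function names, and the parameters `list_size` and `num_pivots` are illustrative assumptions.

```python
import random
from typing import Callable, List, Optional, Sequence

Ranker = Callable[[Sequence[int]], List[int]]


def perfect_ranker(items: Sequence[int]) -> List[int]:
    """Stand-in for an LLM listwise ranker: best-first, larger is better."""
    return sorted(items, reverse=True)


def listwise_quickselect(items: Sequence[int], k: int, rank: Ranker,
                         list_size: int = 5, num_pivots: int = 2,
                         rng: Optional[random.Random] = None) -> List[int]:
    """Return the top-k items (in no particular order) via listwise calls only."""
    if not 0 < num_pivots < list_size:
        raise ValueError("need 0 < num_pivots < list_size")
    if rng is None:
        rng = random.Random(0)
    items = list(items)
    if k <= 0:
        return []
    if len(items) <= k:
        return items
    if len(items) <= list_size:
        return rank(items)[:k]  # small enough for a single ranker call

    pivots = rng.sample(items, num_pivots)
    pivot_set = set(pivots)
    rest = [x for x in items if x not in pivot_set]

    # buckets[i] holds items ranked above the i-th best pivot (and below the
    # (i-1)-th best); the last bucket holds items ranked below every pivot.
    buckets: List[List[int]] = [[] for _ in range(num_pivots + 1)]
    ordered_pivots: List[int] = []
    chunk = list_size - num_pivots
    for start in range(0, len(rest), chunk):
        # Each call ranks one chunk together with all pivots; the calls are
        # independent of one another, so they could be issued in parallel.
        ranked = rank(rest[start:start + chunk] + pivots)
        bucket_index = 0
        for x in ranked:
            if x in pivot_set:
                bucket_index += 1
            else:
                buckets[bucket_index].append(x)
        ordered_pivots = [x for x in ranked if x in pivot_set]

    # Walk buckets best-first; recurse only into the bucket that straddles k.
    top: List[int] = []
    for i, bucket in enumerate(buckets):
        need = k - len(top)
        if need == 0:
            break
        if len(bucket) > need:
            top.extend(listwise_quickselect(bucket, need, rank,
                                            list_size, num_pivots, rng))
            break
        top.extend(bucket)
        if i < num_pivots and len(top) < k:
            top.append(ordered_pivots[i])  # pivot i sits just below bucket i
    return top


corpus = list(range(100))
random.Random(1).shuffle(corpus)
top10 = listwise_quickselect(corpus, 10, perfect_ranker)
```

With a perfect ranker the result is always the exact top-K, and only the number of ranker calls depends on the random pivot choice, which is the sense in which a method of this kind is "Las Vegas".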

Paper Structure

This paper contains 88 sections, 6 theorems, 12 equations, 8 figures, 2 tables, 5 algorithms.

Key Result

Theorem 1

Constrain $P = P'$. Then for sufficiently large $N$:

Figures (8)

  • Figure 1: A visualization of our overall strategy. (Left) An example of how listwise prompting is used to extract rankings from an LLM. (Right) Four physical plans supported by ListK: LMPQSelect/Sort and LTTopK with optional LTFilter.
  • Figure 2: A visualization of the four algorithms we study applied to the same toy problem. Across all algorithms, corpus size $N = 6$, list size $L = 3$, and $K = 2$. Numbers in a set $\{a, b\}$ are unsorted whereas numbers in a tuple $(a, b)$ are. The $<$ symbol indicates that an explicit ordering has been established via a listwise ranker.
  • Figure 3: Ideal Monte Carlo simulation of LMPQSelect and LMPQSort compared to the asymptotic theoretical estimate. Cost is measured as the number of listwise ranker calls; rankers are assumed to be perfect. Each parameter configuration is run for 5000 trials. For quickselect, we test varying values of $K$, where the ratio $\psi = K/N$ ranges from 0.001 to 0.5. Note that the $y$-axis is truncated at very large values.
  • Figure 4: End-to-end comparison of our methods (LTFilter+LMPQ, Tournament Top-K, LMPQ) with baseline methods on SciFact. All methods were tested on 25 test queries of the SciFact dataset with $N=5,183$ and $K=10$ for the selection and selection+sorting problems, w.r.t. mean accuracy (recall, NDCG) and latency. Our methods dominate the Pareto frontier with the exception of LOTUSTopK with Qwen3-8B on the extremely expensive end of the spectrum. Note the log scale on the $x$-axis. For LTFilter, $S$ = 1, 5, 10, and 15 were tested with a darker shade indicating a larger $S$ value.
  • Figure 5: (LTFilter panel) The impact of survivor count per bin ($S$) and $K$ on the recall of LTFilter; ground truth labels determined by LLM-as-judge ($1 \le K \le 50$). (List-size latency and recall panels) The impact of list size ($L$) on the latency and recall of LMPQSelect. The latency panel's best fit is a linear model with a reciprocal ($1/L$) term; the recall panel's best fit is a quadratic model. All experiments were run on 25 queries of the SciFact dataset with $N=5,184$ with $P=P'=1$ against human labels ($R=1$) and LLM-as-judge labels ($R=10$).
  • ...and 3 more figures
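
The toy setting of Figure 2 ($N = 6$, $L = 3$, $K = 2$) can be traced with a small sketch of a listwise tournament top-K. This is an illustrative reading and not necessarily the paper's LTTopK: it assumes each round splits the survivors into groups of at most $L$ items, keeps the best $K$ of each group, and stops with one final ranking once at most $L$ items remain. The names `listwise_tournament_topk` and `perfect_ranker` are hypothetical, the ranker is simulated by sorting numbers, and the call count it reports may differ from the one shown in the figure.

```python
from typing import Callable, List, Sequence, Tuple

Ranker = Callable[[Sequence[int]], List[int]]


def perfect_ranker(items: Sequence[int]) -> List[int]:
    """Stand-in for an LLM listwise ranker: best-first, larger is better."""
    return sorted(items, reverse=True)


def listwise_tournament_topk(items: Sequence[int], k: int, rank: Ranker,
                             list_size: int) -> Tuple[List[int], int]:
    """Return (top-k in ranked order, number of ranker calls made)."""
    if not 0 < k < list_size:
        raise ValueError("need 0 < k < list_size so every round shrinks")
    survivors = list(items)
    calls = 0
    while len(survivors) > list_size:
        next_round: List[int] = []
        for start in range(0, len(survivors), list_size):
            group = survivors[start:start + list_size]
            if len(group) > k:
                # Any member of the global top-k is also in its group's top-k.
                group = rank(group)[:k]
                calls += 1
            next_round.extend(group)  # groups of <= k items pass through
        survivors = next_round
    calls += 1  # final ranking of the remaining <= list_size items
    return rank(survivors)[:k], calls


result, num_calls = listwise_tournament_topk([3, 1, 4, 6, 2, 5], k=2,
                                             rank=perfect_ranker, list_size=3)
print(result, num_calls)  # prints [6, 5] 4
```

Because no randomness is involved, the number of ranker calls depends only on $N$, $L$, and $K$, which makes the cost of a scheme like this easy to predict ahead of time.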

Theorems & Definitions (8)

  • Remark 1: Cost of listwise tournament top-K
  • Theorem 1: Expected cost of listwise quicksort
  • Corollary 1.1
  • Theorem 2: Expected complexity of listwise quickselect
  • Remark 2: Expected recall of listwise tournament filter
  • Proposition 2.1
  • Proposition 2.2
  • Proposition 2.3