Personalized Top-k Set Queries Over Predicted Scores
Sohrab Namazi Nia, Subhodeep Ghosh, Senjuti Basu Roy, Sihem Amer-Yahia
TL;DR
The paper tackles personalized top-$k$ set queries when scoring functions are user-defined and partially computed by expensive external oracles. It introduces a four-task framework that computes score bounds, builds a probabilistic model of the candidate winners, selects the next oracle query by maximizing entropy reduction, and processes oracle responses to tighten the bounds. By decomposing the scoring function into constructs whose partial scores the oracle can predict, and by using max-convolution to compute likelihoods, the approach returns exact top-$k$ results while reducing oracle calls by an order of magnitude relative to baselines. Experiments on multimodal datasets (Hotels, Movies, Yelp) demonstrate both accuracy and scalability, with probabilistic variants offering favorable performance trade-offs. The work advances efficient, oracle-based top-$k$ retrieval in multimodal, personalized settings and opens avenues for extensions to multiple questions, other response types, and rank-aware queries.
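To make the bound-computation step concrete, the following minimal sketch shows how partially known scores could yield lower and upper bounds on each candidate set, and how those bounds rule out sets that can no longer win. It is an illustration under simplifying assumptions, not the paper's algorithm: the scoring function is assumed to be a plain sum of per-item scores in [0, 1], only the single best set is sought, and the names `score_bounds`, `prune`, and the hotel ids are hypothetical.

```python
from itertools import combinations

def score_bounds(candidate, known, lo=0.0, hi=1.0):
    """Return (lower, upper) bounds on an additive set score.

    Each item's score is either already known (answered by the oracle)
    or only assumed to lie in the interval [lo, hi].
    """
    lower = sum(known.get(item, lo) for item in candidate)
    upper = sum(known.get(item, hi) for item in candidate)
    return lower, upper

def prune(candidates, known):
    """Keep only the candidate sets that could still be the best set."""
    bounds = {c: score_bounds(c, known) for c in candidates}
    best_lower = max(lb for lb, _ in bounds.values())
    # A set whose upper bound is below the best lower bound cannot win.
    return {c: b for c, b in bounds.items() if b[1] >= best_lower}

# Hypothetical data: four hotels, candidate sets of size 2,
# three scores already returned by the oracle, one still unknown.
items = ["h1", "h2", "h3", "h4"]
candidates = list(combinations(items, 2))
known = {"h1": 0.9, "h2": 0.8, "h3": 0.1}
alive = prune(candidates, known)
print(sorted(alive))
```

In this toy instance every surviving set other than `("h1", "h2")` contains `h4`, the only item whose score is still unknown, so a single further oracle call would settle the answer.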
Abstract
This work studies the applicability of expensive external oracles, such as large language models, in answering top-k queries over predicted scores. Such scores are produced by user-defined functions to answer personalized queries over multi-modal data. We propose a generic computational framework that handles arbitrary set-based scoring functions, as long as the functions can be decomposed into constructs, each of which is sent to an oracle (in our case an LLM) to predict partial scores. At a given point in time, the framework assumes a set of responses and their partial predicted scores, and it maintains a collection of possible sets that are likely to be the true top-k. Since calling oracles is costly, our framework judiciously identifies the next construct, i.e., the next best question to ask the oracle, so as to maximize the likelihood of identifying the true top-k. We present a principled probabilistic model that quantifies that likelihood. We study efficiency opportunities in designing algorithms. We run an evaluation with three large-scale datasets, scoring functions, and baselines. Experiments indicate the efficacy of our framework, as it achieves an order of magnitude improvement over baselines in the number of LLM calls required while ensuring result accuracy. Scalability experiments further indicate that our framework could be used in large-scale applications.
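The idea of choosing the next best question can be illustrated with a small sketch that picks the unanswered construct whose answer is expected to leave the least uncertainty about the winning set. This is a toy model under stated assumptions, not the paper's probabilistic model: answers are assumed to come from a three-value scale with a uniform prior, set scores are assumed additive, winner probabilities are computed by brute-force enumeration rather than max-convolution, ties are split evenly, and all function and construct names are hypothetical.

```python
import math
from itertools import product

VALUES = (0.0, 0.5, 1.0)  # assumed discrete answer scale, uniform prior

def winner_distribution(candidates, known, unknown):
    """P(candidate has the highest score), enumerating all unknown answers."""
    wins = {c: 0.0 for c in candidates}
    worlds = list(product(VALUES, repeat=len(unknown)))
    for world in worlds:
        scores = {**known, **dict(zip(unknown, world))}
        totals = {c: sum(scores[q] for q in c) for c in candidates}
        top = max(totals.values())
        tied = [c for c, t in totals.items() if t == top]
        for c in tied:  # split the probability mass evenly among ties
            wins[c] += 1.0 / (len(tied) * len(worlds))
    return wins

def entropy(dist):
    """Shannon entropy, in bits, of a distribution given as a dict."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def next_question(candidates, known, unknown):
    """Pick the unknown construct whose answer minimizes expected entropy."""
    def expected_entropy(q):
        rest = [u for u in unknown if u != q]
        return sum(
            entropy(winner_distribution(candidates, {**known, q: v}, rest))
            for v in VALUES
        ) / len(VALUES)
    return min(unknown, key=expected_entropy)

# Two candidate sets share construct "x"; only "y" can separate them.
candidates = [("x", "y"), ("x", "z")]
choice = next_question(candidates, known={"z": 0.5}, unknown=["x", "y"])
print(choice)
```

Because `x` contributes equally to both candidate sets, asking about it cannot change which set wins, so the sketch selects `y`. Minimizing the expected posterior entropy is equivalent to maximizing the expected entropy reduction, since the prior entropy is the same for every question.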
