Diversity of Answers to Conjunctive Queries
Timo Camillo Merkl, Reinhard Pichler, Sebastian Skritek
TL;DR
The paper studies Diverse-$\mathcal{Q}$: selecting a diverse set of $k$ answers to a CQ (or an extension thereof) such that a diversity measure $\delta$, built from pairwise Hamming distances via a polynomial-time aggregator $f$, reaches at least a threshold $d$. It provides a detailed parameterized complexity landscape across query classes, with the diverse-set size $k$ as the parameter: for acyclic CQs (ACQ), Diverse-$\mathcal{ACQ}$ is in XP for combined complexity and FPT for data complexity, and a W[1]-hardness result holds for ws-monotone measures. Extensions to unions of CQs (UCQ/UACQ) and CQs with negation (CQ$^\neg$) show a mix of tractability and hardness: Diverse-wsm-UACQ is NP-hard even for $k=2$, while Diverse-sum-ACQ remains FPT in query complexity and efficient in the data-complexity setting. The work also outlines algorithmic strategies based on Yannakakis-style dynamic programming over join trees, and discusses structural width measures (treewidth, hypertree width) and their impact on tractability, as well as future directions such as smw-bounded queries, beta-acyclicity, and approximation approaches. Overall, the results map the computational boundaries of producing diverse subsets of CQ answers, informing both exact and heuristic diversification approaches in practice.
Abstract
Enumeration problems aim at outputting, without repetition, the set of solutions to a given problem instance. However, outputting the entire solution set may be prohibitively expensive if it is too large. In this case, outputting a small, sufficiently diverse subset of the solutions would be preferable. This leads to the Diverse-version of the original enumeration problem, where the goal is to achieve a certain level d of diversity by selecting k solutions. In this paper, we look at the Diverse-version of the query answering problem for Conjunctive Queries and extensions thereof. That is, we study the problem of deciding whether it is possible to achieve a certain level d of diversity by selecting k answers to the given query and, in the positive case, of actually computing such k answers.
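
To make the decision problem concrete, the following is a minimal brute-force sketch in Python, assuming the query answers have already been materialized as equal-length tuples. It is not the paper's algorithm (which works on join trees and avoids enumerating all answers); it only illustrates what "achieving diversity d with k answers" means when diversity aggregates pairwise Hamming distances. The names `hamming`, `diversity`, and `find_diverse_subset` are hypothetical, and using Python's built-in `sum` or `min` as the aggregator is an illustrative choice.

```python
from itertools import combinations
from typing import Callable, Optional, Sequence, Tuple

Answer = Tuple[object, ...]


def hamming(a: Answer, b: Answer) -> int:
    """Number of positions in which two answer tuples differ."""
    if len(a) != len(b):
        raise ValueError("answers must have the same arity")
    return sum(x != y for x, y in zip(a, b))


def diversity(subset: Sequence[Answer],
              aggregate: Callable[[Sequence[int]], int]) -> int:
    """Aggregate all pairwise Hamming distances of the subset."""
    return aggregate([hamming(a, b) for a, b in combinations(subset, 2)])


def find_diverse_subset(answers: Sequence[Answer], k: int, d: int,
                        aggregate: Callable[[Sequence[int]], int] = sum
                        ) -> Optional[Tuple[Answer, ...]]:
    """Return the first k-subset with diversity >= d, or None if none exists.

    Exhaustive search over all k-subsets, so the running time grows like
    |answers|^k; this is an illustration only, not an efficient method.
    """
    if k < 2:
        raise ValueError("k must be at least 2 so that pairs exist")
    unique = list(dict.fromkeys(answers))  # drop duplicates, keep input order
    for subset in combinations(unique, k):
        if diversity(subset, aggregate) >= d:
            return subset
    return None
```

Calling `find_diverse_subset(answers, k, d)` uses the sum of pairwise distances, while passing `min` as the last argument instead requires every pair in the subset to be at distance at least d.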
