Preference Queries over Taxonomic Domains
Paolo Ciaccia, Davide Martinenghi, Riccardo Torlone
TL;DR
The paper tackles retrieving the best data items when user preferences are expressed over taxonomic domains, possibly at different levels of granularity and possibly in conflict with one another. It models data as t-relations, i.e., relations whose domains are organized in taxonomies, and expresses preferences as first-order formulas. It then rewrites these formulas with two operators: Transitive Closure $\textsf{T}$, which enforces transitivity and thereby guarantees sound results, and Specificity-based Refinement $\textsf{S}$, which resolves conflicts in favor of the more specific preferences. It proves that $\textsf{T}$ and $\textsf{S}$ do not commute and that no sequence of the two operators achieves both full transitivity and freedom from conflicts. It does, however, identify two sequences, $\textsf{T}\textsf{S}\textsf{T}$ and $\textsf{S}\textsf{T}\textsf{S}\textsf{T}$, that ensure transitivity while minimizing residual conflicts, together with a heuristic technique for computing the best results under either sequence. Experiments on synthetic and real-world datasets show low rewriting overhead, substantial pruning of the candidate set, and significant speedups from the proposed heuristic, supporting the practical feasibility of the approach. The work advances query processing over taxonomic domains with a principled treatment of specificity and transitivity in preferences and a scalable strategy for computing the best results.
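To make the roles of the two operators concrete, here is a toy sketch on explicitly enumerated pairs. It is not the paper's formula-rewriting machinery. The taxonomy, the encoding of statements as (better, worse) node pairs, and the `overrides` specificity rule are hypothetical simplifications introduced only for illustration. The sketch grounds two conflicting statements, "Italian is preferred to French" and the more specific "crepe is preferred to pasta", onto leaf values, applies a toy refinement, and then applies a transitive closure.

```python
from itertools import product

# Hypothetical toy taxonomy (child -> parent); "Italian" and "French" are roots.
PARENT = {"pizza": "Italian", "pasta": "Italian", "crepe": "French"}
LEAVES = ["pizza", "pasta", "crepe"]  # values that actually occur in the data

# Hypothetical statements (better, worse) over taxonomy nodes, possibly generic.
STATEMENTS = [("Italian", "French"), ("crepe", "pasta")]


def is_below(u, x):
    """True if u equals x or is a descendant of x in the taxonomy."""
    while u != x and u in PARENT:
        u = PARENT[u]
    return u == x


def ground(statements, leaves):
    """Map each pair of distinct leaf values to the statements that imply it."""
    support = {}
    for (x, y), (u, v) in product(statements, product(leaves, repeat=2)):
        if u != v and is_below(u, x) and is_below(v, y):
            support.setdefault((u, v), set()).add((x, y))
    return support


def overrides(r, s):
    """Toy specificity rule: r (for v over u) beats s (for u over v) when r's
    nodes lie strictly below s's nodes read in the opposite direction."""
    flipped = (s[1], s[0])
    return r != flipped and is_below(r[0], flipped[0]) and is_below(r[1], flipped[1])


def refine(support):
    """Toy S: drop a pair if some reverse statement overrides all its support."""
    kept = {}
    for (u, v), stmts in support.items():
        rivals = support.get((v, u), set())
        if not any(all(overrides(r, s) for s in stmts) for r in rivals):
            kept[(u, v)] = stmts
    return kept


def closure(pairs):
    """T: transitive closure of a finite set of pairs, by naive fixpoint."""
    result = set(pairs)
    while True:
        new = {(a, d) for (a, b) in result for (c, d) in result if b == c} - result
        if not new:
            return result
        result |= new


print(sorted(closure(refine(ground(STATEMENTS, LEAVES)))))
# [('crepe', 'pasta'), ('pizza', 'crepe'), ('pizza', 'pasta')]
```

In this toy, the generic statement implies pasta over crepe, which clashes with the more specific crepe over pasta. The toy refinement keeps the specific pair, and the closure then adds the missing pair pizza over pasta. A clash between two equally specific statements would survive `refine`, loosely mirroring the residual conflicts discussed above.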
Abstract
When composing multiple preferences that characterize the most suitable results for a user, several issues may arise. Indeed, preferences can be partially contradictory, may not match the level of detail of the actual data, and may even lack natural properties such as transitivity. In this paper we formally investigate the problem of retrieving the best results that comply with multiple preferences expressed in a logic-based language. Data are stored in relational tables with taxonomic domains, which allow preferences to be specified also over values that are more generic than those in the database. In this framework, we introduce two operators that rewrite preferences to enforce two important properties: transitivity, which guarantees the soundness of the result, and specificity, which resolves all conflicts among preferences. Although, as we show, these two properties cannot be fully achieved together, we use our operators to identify the only two alternatives that ensure transitivity and minimize the residual conflicts. Building on this finding, we devise a technique, based on an original heuristic, for selecting the best results according to either of the two alternatives. Finally, we show through a number of experiments over both synthetic and real-world datasets the effectiveness and practical feasibility of the overall approach.
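Once the rewritten preference relation is transitive, the best results are the tuples that no other tuple is preferred to. The naive quadratic scan below illustrates only that definition. It is not the authors' heuristic technique, whose details are not given here. The relation `P`, the row layout, and the sample data are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Row = Tuple[str, str]  # (restaurant id, cuisine) in a hypothetical t-relation

# Hypothetical transitive, conflict-free preference relation on cuisine values.
P = {("pizza", "crepe"), ("crepe", "pasta"), ("pizza", "pasta")}


def prefers(s: Row, t: Row) -> bool:
    """True if row s is strictly preferred to row t on the cuisine attribute."""
    return (s[1], t[1]) in P


def best(rows: Sequence[Row], better: Callable[[Row, Row], bool]) -> List[Row]:
    """Keep the rows that no other row is preferred to (naive O(n^2) scan)."""
    return [t for i, t in enumerate(rows)
            if not any(better(s, t) for j, s in enumerate(rows) if j != i)]


DATA = [("r1", "pizza"), ("r2", "pasta"), ("r3", "crepe"), ("r4", "pizza")]
print(best(DATA, prefers))  # [('r1', 'pizza'), ('r4', 'pizza')]
```

Rows with equal cuisine values are incomparable under `P`, so both pizza rows are returned. A practical evaluator would aim to avoid comparing every pair of rows, which is the kind of cost the paper's heuristic is meant to reduce.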
