Sequential Diversification with Provable Guarantees

Honglian Wang, Sijing Tu, Aristides Gionis

TL;DR

This work formalizes sequential diversity, a ranking-aware diversification framework that incorporates user behavior through continuation probabilities $p_i$ and a diversity measure $\mathcal{D}(\cdot)$. It introduces two sequential diversity objectives, $\mathcal{S}_{+}$ (sum diversity) and $\mathcal{S}_{c}$ (coverage diversity), and studies their maximization via Max-SSD and Max-SCD. The authors prove NP-hardness of Max-SSD, connect the problem to an ordered Hamiltonian-path formulation, and present constant-factor approximation algorithms, including the Best-$\tau$ items and Greedy-matching methods, with guarantees under uniform and non-uniform continuation probabilities. Empirical results on seven public datasets show competitive performance of the proposed methods against strong baselines, with clear trade-offs between accuracy, coverage, and user engagement metrics such as Exp-DCG and Exp-Serendipity.
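To show how continuation probabilities and a diversity measure might combine, the sketch below assumes a simple cascade model that this summary does not spell out. The user scans the ranking from the top. On reaching item $x$, they accept it and continue with probability $p_x$; otherwise they stop. Sequential diversity is then the expected value of $\mathcal{D}$ over the accepted prefix. The names `sequential_diversity`, `sum_diversity`, and `coverage_diversity`, the per-item topic sets, and the toy numbers are all hypothetical and chosen only for illustration.

```python
from itertools import combinations
from typing import Callable, Dict, FrozenSet, Hashable, Sequence, Set

Item = Hashable


def sequential_diversity(
    order: Sequence[Item],
    p: Dict[Item, float],
    diversity: Callable[[Sequence[Item]], float],
) -> float:
    """Expected diversity of the items a user accepts before stopping.

    Assumed cascade model (not taken from the paper): the user scans `order`
    from the top; on reaching item x they accept it and continue with
    probability p[x], otherwise they stop.
    """
    total = 0.0
    accept_all = 1.0  # probability of accepting every item up to position i
    for i, item in enumerate(order):
        accept_all *= p[item]
        # order[: i + 1] is exactly the accepted prefix if the next item
        # (when there is one) is rejected.
        stop_after = 1.0 - p[order[i + 1]] if i + 1 < len(order) else 1.0
        total += accept_all * stop_after * diversity(order[: i + 1])
    return total


def sum_diversity(items: Sequence[Item], dist: Dict[FrozenSet[Item], float]) -> float:
    """Sum of pairwise distances; `dist` is keyed by unordered pairs."""
    return sum(dist[frozenset(pair)] for pair in combinations(items, 2))


def coverage_diversity(items: Sequence[Item], topics: Dict[Item, Set[str]]) -> int:
    """Number of distinct (hypothetical) topics covered by the items."""
    covered: Set[str] = set()
    for item in items:
        covered |= topics[item]
    return len(covered)


# Toy instance with hypothetical numbers.
p = {"a": 0.9, "b": 0.5, "c": 0.8}
dist = {frozenset("ab"): 1.0, frozenset("ac"): 2.0, frozenset("bc"): 3.0}
topics = {"a": {"x"}, "b": {"y"}, "c": {"x", "z"}}

for order in (("a", "b", "c"), ("c", "a", "b")):
    s_sum = sequential_diversity(order, p, lambda s: sum_diversity(s, dist))
    s_cov = sequential_diversity(order, p, lambda s: coverage_diversity(s, topics))
    print(order, round(s_sum, 2), round(s_cov, 2))
```

Running it prints `('a', 'b', 'c') 2.25 1.71` and `('c', 'a', 'b') 2.88 1.96`. The same three items score differently depending on their order, which is the ranking-awareness that a set-based measure lacks.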

Abstract

Diversification is a useful tool for exploring large collections of information items. It has been used to reduce redundancy and cover multiple perspectives in information-search settings. Diversification finds applications in many different domains, including presenting search results of information-retrieval systems and selecting suggestions for recommender systems. Interestingly, existing measures of diversity are defined over sets of items rather than over sequences of items. This design choice contrasts with commonly used relevance measures, which are explicitly defined over sequences of items and take their ranking into account. Sequential measures matter because information items are almost always presented in sequence, and during information exploration users tend to prioritize higher-ranked items. In this paper, we study the problem of maximizing sequential diversity, a new measure of diversity that accounts for the ranking of the items and incorporates item relevance and user behavior. The overarching framework can be instantiated with different diversity measures; here we consider sum diversity and coverage diversity. The problem was recently proposed by Coppolillo et al. [coppolillo2024relevance], who introduced empirical methods that work well in practice. Our paper is a theoretical treatment of the problem: we establish its hardness and present algorithms with constant-factor approximation guarantees for both diversity measures we consider. Experimentally, we demonstrate that our methods are competitive against strong baselines.
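The contrast between set-based and sequential measures, and the maximization problem itself, can be illustrated on a tiny instance. The sketch assumes a cascade user model that the summary does not state: the user accepts item $x$ and continues with probability $p_x$, and otherwise stops. It also assumes that Max-SSD asks for an ordered list of $k$ items from a candidate pool. Exhaustive search over all ordered selections is only feasible for toy sizes and is not one of the paper's algorithms. All names and numbers are hypothetical.

```python
from itertools import combinations, permutations
from typing import Dict, FrozenSet, Hashable, Sequence, Tuple

Item = Hashable


def set_sum_diversity(items: Sequence[Item], dist: Dict[FrozenSet[Item], float]) -> float:
    """Order-independent sum of pairwise distances (a set-based measure)."""
    return sum(dist[frozenset(pair)] for pair in combinations(items, 2))


def sequential_sum_diversity(order, p, dist) -> float:
    """Expected set_sum_diversity of the accepted prefix (assumed cascade model)."""
    total, accept_all = 0.0, 1.0
    for i, item in enumerate(order):
        accept_all *= p[item]  # P(user accepts order[0..i])
        stop_after = 1.0 - p[order[i + 1]] if i + 1 < len(order) else 1.0
        total += accept_all * stop_after * set_sum_diversity(order[: i + 1], dist)
    return total


def brute_force_max_ssd(items, k, p, dist) -> Tuple[Tuple[Item, ...], float]:
    """Evaluate every ordered selection of k items: n!/(n-k)! candidates."""
    best = max(permutations(items, k), key=lambda o: sequential_sum_diversity(o, p, dist))
    return best, sequential_sum_diversity(best, p, dist)


# Hypothetical instance: item "d" has a low continuation probability.
p = {"a": 0.9, "b": 0.5, "c": 0.8, "d": 0.2}
dist = {frozenset(pair): 1.0 for pair in ("ad", "bd", "cd")}
dist.update({frozenset("ab"): 1.0, frozenset("ac"): 2.0, frozenset("bc"): 3.0})

best, value = brute_force_max_ssd(list(p), 3, p, dist)
print(best, round(value, 2))
# The set-based score is identical for every ordering of the chosen items.
print({round(set_sum_diversity(o, dist), 2) for o in permutations(best)})
```

This should print `('a', 'c', 'b') 2.88` and `{6.0}`. The sequential objective prefers placing the distant, likely-accepted pair $\{a, c\}$ first, while the set-based measure cannot tell any of the six orderings apart.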

Paper Structure

This paper contains 31 sections, 29 theorems, 23 equations, 10 tables, and 3 algorithms.

Key Result

Lemma 1

The sequential sum diversity objective in Equation (eq:osdo) can be reformulated as

Theorems & Definitions (35)

  • Definition 1: Sequential diversity ($\mathcal{S}$)
  • Definition 2: Sequential sum diversity ($\mathcal{S}_{+}$)
  • Definition 3: Sequential coverage diversity ($\mathcal{S}_{c}$)
  • Lemma 1
  • Example 1
  • Theorem 1
  • Definition 4: Ordered submodularity [kleinberg2022ordered]
  • Definition 5: Ordered Hamiltonian path ($\mathcal{H}$)
  • Theorem 2
  • Corollary 1
  • ...and 25 more
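Under an assumed cascade model, in which the user accepts item $x$ and continues with probability $p_x$ and otherwise stops, sequential sum diversity can be rewritten pairwise by linearity of expectation. The pair at positions $i<j$ contributes $d(\sigma_i,\sigma_j)$ exactly when the user accepts the first $j$ items, so $\mathcal{S}_{+}(\sigma) = \sum_{i<j} d(\sigma_i,\sigma_j) \prod_{\ell \le j} p_{\sigma_\ell}$. This identity follows from the assumed model alone and is not presented as the statement of Lemma 1. The sketch checks it numerically on random instances; `expected_form`, `pairwise_form`, and the distance table are hypothetical names.

```python
import random
from itertools import combinations
from math import isclose, prod

# Assumed cascade model (not taken from the paper): accept item x and
# continue with probability p[x], otherwise stop.


def expected_form(order, p, dist):
    """E[sum diversity of the accepted prefix], summing over stopping points."""
    total, accept_all = 0.0, 1.0
    for i, item in enumerate(order):
        accept_all *= p[item]
        stop_after = 1.0 - p[order[i + 1]] if i + 1 < len(order) else 1.0
        div = sum(dist[frozenset(pair)] for pair in combinations(order[: i + 1], 2))
        total += accept_all * stop_after * div
    return total


def pairwise_form(order, p, dist):
    """Sum over i < j of d(order[i], order[j]) * P(user accepts order[0..j])."""
    total = 0.0
    for j in range(1, len(order)):
        reach = prod(p[x] for x in order[: j + 1])
        total += reach * sum(dist[frozenset((order[i], order[j]))] for i in range(j))
    return total


rng = random.Random(0)
items = list(range(6))
for _ in range(100):
    p = {x: rng.random() for x in items}
    dist = {frozenset(pair): rng.random() for pair in combinations(items, 2)}
    order = rng.sample(items, len(items))
    assert isclose(expected_form(order, p, dist), pairwise_form(order, p, dist),
                   rel_tol=1e-9, abs_tol=1e-12)
print("identity holds on 100 random instances")
```

In the pairwise form, each pair's weight depends only on the later of its two positions. This makes the ordering question explicit without enumerating stopping points.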