Max-Min Diversification with Asymmetric Distances
Iiro Kumpulainen, Florian Adriaens, Nikolaj Tatti
TL;DR
This paper extends Max-Min Diversification to asymmetric distances, defining AMMD on a complete digraph whose directed distances satisfy the triangle inequality. It introduces a combinatorial $\frac{1}{6k}$-approximation framework (BAC, BCR, and BCF) built on a ball-covering clustering and the maximum antichain structure, with running times comparable to matrix multiplication and several practical speedups. The authors connect AMMD to the maximum antichain problem, use clustering to guarantee extractable independent sets, and provide both theoretical guarantees and empirical validation on real-world and synthetic datasets, including a FlightsDelay-based study. The work demonstrates that nontrivial approximation guarantees are attainable even in asymmetric settings and that the resulting algorithms are effective in practice, while highlighting open questions about tighter hardness results and potential constant-factor AMMD algorithms.
Abstract
One of the most well-known and simplest models for diversity maximization is the Max-Min Diversification (MMD) model, which has been extensively studied in the data mining and database literature. In this paper, we initiate the study of the Asymmetric Max-Min Diversification (AMMD) problem. The input is a positive integer $k$ and a complete digraph over $n$ vertices, together with a nonnegative distance function over the edges obeying the directed triangle inequality. The objective is to select a set of $k$ vertices that maximizes the smallest pairwise distance between them. AMMD reduces to the well-studied MMD problem when the distances are symmetric, and has natural applications to query result diversification, web search, and facility location problems. Although the MMD problem admits a simple $\frac{1}{2}$-approximation by greedily selecting the next-furthest point, this strategy fails for AMMD, and it remained unclear how to design good approximation algorithms for AMMD. We propose a combinatorial $\frac{1}{6k}$-approximation algorithm for AMMD by leveraging connections with the Maximum Antichain problem. We discuss several ways of speeding up the algorithm and compare its performance against heuristic baselines on real-life and synthetic datasets.
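To make the AMMD objective concrete, the following is a minimal Python sketch, not the paper's algorithm. It assumes that the "smallest pairwise distance" is taken over all ordered pairs of distinct selected vertices, which is our reading of the abstract. The function names (`ammd_objective`, `satisfies_directed_triangle`, `brute_force_ammd`) and the toy distance matrix are hypothetical and chosen only for illustration. The exhaustive search is exponential in $k$ and is shown only to clarify what an optimal solution means on a tiny instance.

```python
from itertools import combinations, permutations


def ammd_objective(dist, subset):
    """Smallest directed distance d(u, v) over ordered pairs of distinct
    vertices in `subset` (assumed reading of the AMMD objective)."""
    return min(dist[u][v] for u, v in permutations(subset, 2))


def satisfies_directed_triangle(dist):
    """Check d(u, w) <= d(u, v) + d(v, w) for all vertex triples."""
    n = len(dist)
    return all(
        dist[u][w] <= dist[u][v] + dist[v][w]
        for u in range(n)
        for v in range(n)
        for w in range(n)
    )


def brute_force_ammd(dist, k):
    """Exhaustively find a k-subset maximizing the AMMD objective.
    Illustration only: this enumerates all C(n, k) subsets."""
    n = len(dist)
    return max(combinations(range(n), k),
               key=lambda s: ammd_objective(dist, s))


# Hypothetical toy instance: dist[u][v] is the directed distance from u to v.
# Off-diagonal entries lie in {1, 2}, so the directed triangle inequality holds.
dist = [
    [0, 2, 1, 2],
    [2, 0, 2, 1],
    [2, 1, 0, 2],
    [1, 2, 2, 0],
]

best = brute_force_ammd(dist, 2)
print(best, ammd_objective(dist, best))
```

Note that the distances are asymmetric: for example, `dist[0][2]` is 1 while `dist[2][0]` is 2, so the pair `(0, 2)` scores only 1 under this objective even though one direction is long.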
