Active Seriation: Efficient Ordering Recovery with Statistical Guarantees

James Cheshire, Yann Issartel

Abstract

Active seriation aims at recovering an unknown ordering of $n$ items by adaptively querying pairwise similarities. The observations are noisy measurements of entries of an underlying $n \times n$ permuted Robinson matrix, whose permutation encodes the latent ordering. The framework allows the algorithm to start with partial information on the latent ordering, including seriation from scratch as a special case. We propose an active seriation algorithm that provably recovers the latent ordering with high probability. Under a uniform separation condition on the similarity matrix, optimal performance guarantees are established, both in terms of the probability of error and the number of observations required for successful recovery.
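
To make the observation model in the abstract concrete, the following is a minimal sketch, not the paper's algorithm. It assumes the standard definition of a Robinson matrix (symmetric, with entries that never increase when moving away from the diagonal) and Gaussian noise of standard deviation `sigma`. The helper names (`toeplitz_robinson`, `is_robinson`, `permute`, `noisy_query`) and the particular matrix are hypothetical and chosen only for illustration.

```python
import numpy as np


def toeplitz_robinson(n, delta):
    """Illustrative Robinson matrix: R[i, j] = delta * (n - |i - j|)."""
    idx = np.arange(n)
    return delta * (n - np.abs(idx[:, None] - idx[None, :]))


def is_robinson(R, tol=1e-12):
    """True if R is symmetric and entries never increase away from the diagonal."""
    R = np.asarray(R, dtype=float)
    if not np.allclose(R, R.T):
        return False
    for i in range(R.shape[0]):
        right = np.diff(R[i, i:])      # steps moving right, away from the diagonal
        left = np.diff(R[i, :i + 1])   # steps moving right, toward the diagonal
        if np.any(right > tol) or np.any(left < -tol):
            return False
    return True


def permute(R, pi):
    """Similarity matrix of the shuffled items: M[i, j] = R[pi[i], pi[j]]."""
    pi = np.asarray(pi)
    return R[np.ix_(pi, pi)]


def noisy_query(M, i, j, sigma, rng):
    """One noisy observation of entry (i, j); Gaussian noise is an assumption."""
    return M[i, j] + sigma * rng.normal()


R = toeplitz_robinson(5, delta=0.2)
pi = np.array([2, 0, 4, 1, 3])       # latent ordering, unknown to the learner
M = permute(R, pi)

# Repeating the same query and averaging reduces the noise on that entry.
rng = np.random.default_rng(0)
estimate = np.mean([noisy_query(M, 0, 1, sigma=0.5, rng=rng) for _ in range(4000)])
```

In this sketch `R` passes the Robinson check while the permuted matrix `M` does not, and applying the inverse permutation `np.argsort(pi)` to `M` restores `R`. An active method decides adaptively which pairs `(i, j)` to query and how often, rather than sampling every entry equally.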

Paper Structure (91 sections, 17 theorems, 189 equations, 8 figures, 4 algorithms)

Key Result

Theorem 4.1

There exists an absolute constant $c_0>0$ such that the following holds. Let $n \ge 3$ and $\tilde{n} \in [n]$, and assume that the input permutation $\tilde{\pi}$ of $[n-\tilde{n}]$ satisfies condition (new-relative-order-incremental). Let $(\Delta,\sigma,T)$ be such that condition c holds. If $M \in \mathcal{M}_\Delta$, then the error probability of ASII (algo:asii) satisfies

Figures (8)

  • Figure 1: An R-matrix and a permuted version.
  • Figure 2: Robinson matrices for scenarios (1)-(4), from left to right.
  • Figure 3: Empirical error probabilities for ASII and three benchmark methods as the parameter $\Delta$ varies. Scenarios (1)-(4) are displayed from left to right and top to bottom. Each experiment uses $n=10$ items and $T=10{,}000$ observations. For each value of $\Delta$, 100 Monte Carlo runs are split into $10$ equal groups; error bars show the $0.1$ and $0.9$ quantiles of the empirical error across groups.
  • Figure 4: Similarity matrix $M$ of a single-cell RNA-seq dataset before and after reordering by ASII. The recovered ordering reveals a clear block-diagonal structure consistent with developmental progression: dissimilar regions (blue) are pushed to the boundaries, while groups of highly similar cells (yellow and green) align along the diagonal.
  • Figure 5: Representation of the Robinson matrices $R^{(s)}$, $s \in \{1, 2, 3, 4\}$, corresponding to the four scenarios. Scenarios (1)–(3) have a minimal gap $\Delta$, while in scenario (4) the minimal gap is random but lower bounded by $\Delta$. The matrix $R^{(1)}$ is Toeplitz, while $R^{(2)}$, $R^{(3)}$, and $R^{(4)}$ are not. Here, $\Delta = 0.2$.
  • ...and 3 more figures
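
The error-bar construction described in the Figure 3 caption can be sketched as follows. This is an illustrative reading of the caption, not the authors' code: the function name `grouped_error_quantiles` is hypothetical, and reporting the overall mean as the central value is an assumption, since the caption only specifies the quantiles.

```python
import numpy as np


def grouped_error_quantiles(failures, n_groups=10, q_low=0.1, q_high=0.9):
    """Split 0/1 failure indicators into equal groups and summarize the error.

    failures: one entry per Monte Carlo run, 1 if the ordering was not
    recovered and 0 otherwise. Returns (overall mean, low quantile,
    high quantile), where the quantiles are taken across per-group error rates.
    """
    failures = np.asarray(failures, dtype=float)
    if failures.size % n_groups != 0:
        raise ValueError("number of runs must be divisible by n_groups")
    # Empirical error rate inside each group of consecutive runs.
    group_errors = failures.reshape(n_groups, -1).mean(axis=1)
    low = np.quantile(group_errors, q_low)
    high = np.quantile(group_errors, q_high)
    return group_errors.mean(), low, high


# Synthetic example: 100 runs, 10 groups of 10, failures only in the first group.
runs = np.array([1] * 10 + [0] * 90)
center, low, high = grouped_error_quantiles(runs)
```

With 100 runs and 10 groups, each group error rate is an average over 10 runs, so the quantile band reflects variability between groups rather than a formal confidence interval.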

Theorems & Definitions (24)

  • Definition 2.1
  • Remark 1
  • Theorem 4.1: Upper bound with partial information
  • Theorem 4.2: Impossibility regime
  • Theorem 4.3: Recovery regime
  • Corollary 4.4: Sample complexity
  • Definition 4.5: $\Delta$-maximal subset
  • Theorem 4.6: Guarantees beyond uniform separation
  • Proposition C.1
  • Proposition C.2
  • ...and 14 more