Sparse Graph Reconstruction and Seriation for Large-Scale Image Stacks
Fuming Yang, Yaron Meirovitch, Jeff W. Lichtman
TL;DR
This work tackles the seriation problem for large-scale image stacks: recovering a linear order from noisy, locally sampled pairwise similarities under a near-linear query budget. It introduces a five-stage pipeline that first builds a sparse graph containing all essential edges via Random-Hook Borůvka, diameter-reducing condensation, double-sweep BFS, and fixed-window densification, and then recovers the permutation with Iterative Similarity Search and SuperChain assembly. The approach achieves near-linear query complexity $O\bigl(N(\log N + K)\bigr)$ and exact recovery under simple margin and noise assumptions. Empirically, it remains robust when only about $2N/3$ of the second-best edges are correct, and it performs strongly on wafer-scale EM datasets, outperforming spectral, MST, and TSP baselines with substantial speedups. The method enables practical, overnight processing of thousands of sections on commodity hardware and is applicable to analogous seriation tasks beyond EM, such as temporal ordering and archaeological seriation.
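To put the near-linear budget in perspective, the following minimal sketch compares $N(\log_2 N + K)$ with the all-pairs count $N(N-1)/2$. The constant factor of 1, the base-2 logarithm, and the example values of $N$ and $K$ are illustrative assumptions, and the function names `query_budget` and `all_pairs` are hypothetical; none of them come from the paper.

```python
import math


def query_budget(n: int, k: int) -> int:
    """N * (ceil(log2 N) + K), with an assumed constant factor of 1."""
    return n * (math.ceil(math.log2(n)) + k)


def all_pairs(n: int) -> int:
    """Number of comparisons needed to fill the full pairwise matrix."""
    return n * (n - 1) // 2


# Illustrative values only: 4096 sections, window size 10.
n_sections, window = 4096, 10
sparse = query_budget(n_sections, window)
dense = all_pairs(n_sections)
print(f"sparse budget: {sparse}, all pairs: {dense}, ratio: {dense / sparse:.1f}x")
```

Under these assumed values the sparse budget is roughly two orders of magnitude smaller than the full matrix, and the gap widens as $N$ grows.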
Abstract
We study the problem of recovering a 1D order from a noisy, locally sampled pairwise comparison matrix under a tight query budget. We recast the task as reconstructing a sparse, noisy line graph and present, to our knowledge, the first method that provably builds a sparse graph containing all edges needed for exact seriation using only $O\bigl(N(\log N + K)\bigr)$ oracle queries, which is near-linear in $N$ for a fixed window size $K$. The approach is parallelizable and supports both binary and bounded-noise distance oracles. Our five-stage pipeline consists of: (i) a random-hook Borůvka step to connect components via short-range edges in $O(N \log N)$ queries; (ii) iterative condensation to bound the graph diameter; (iii) a double-sweep BFS to obtain a provisional global order; (iv) fixed-window densification around that order; and (v) a greedy SuperChain that assembles the final permutation. Under a simple top-1 margin condition and bounded relative noise, we prove exact recovery; empirically, SuperChain still succeeds when only about $2N/3$ of the true adjacencies are present. On wafer-scale serial-section EM, our method outperforms spectral, MST, and TSP baselines with far fewer comparisons, and it is applicable to other locally structured sequencing tasks such as temporal snapshot ordering, archaeological seriation, and playlist/tour construction.
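Stages (iii) and (iv) can be illustrated with a toy sketch. It assumes a connected sparse graph of short-range edges is already available (the output of stages (i) and (ii), which are not shown) and uses a noiseless distance oracle on eight sections. The graph, the oracle, the window size, and all function names are hypothetical choices for illustration, not the authors' implementation.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distance from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def double_sweep_order(adj):
    """Provisional order: rank nodes by hop distance from a far endpoint."""
    start = next(iter(adj))                 # arbitrary starting node
    d0 = bfs_distances(adj, start)
    endpoint = max(d0, key=d0.get)          # first sweep finds a far node
    d1 = bfs_distances(adj, endpoint)       # second sweep ranks all nodes
    return sorted(adj, key=lambda node: (d1[node], node))


def densify(order, oracle, window):
    """Query the oracle only for pairs within `window` positions in `order`."""
    edges = {}
    for i, u in enumerate(order):
        for j in range(i + 1, min(i + window + 1, len(order))):
            v = order[j]
            edges[frozenset((u, v))] = oracle(u, v)
    return edges


# Toy data: the hidden true order of eight section labels.
true_order = [5, 2, 7, 0, 3, 6, 1, 4]
pos = {label: i for i, label in enumerate(true_order)}

# Assumed sparse tree of short-range edges (true offsets of 1 or 2).
tree_edges = [(5, 2), (5, 7), (7, 0), (0, 3), (0, 6), (6, 1), (1, 4)]
adj = {label: [] for label in sorted(true_order)}
for u, v in tree_edges:
    adj[u].append(v)
    adj[v].append(u)


def oracle(u, v):
    """Noiseless toy oracle: true distance between two sections."""
    return abs(pos[u] - pos[v])


provisional = double_sweep_order(adj)
edges = densify(provisional, oracle, window=2)

# Check that every true adjacency was queried during densification.
true_adjacent = [frozenset(p) for p in zip(true_order, true_order[1:])]
covered = all(pair in edges for pair in true_adjacent)
print(provisional, len(edges), covered)
```

In this toy case the provisional order is only approximately correct, yet a window of 2 is enough for the densified edge set to contain every true adjacency, which is the property the final assembly stage relies on.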
