Table of Contents
Fetching ...

Sparse Graph Reconstruction and Seriation for Large-Scale Image Stacks

Fuming Yang, Yaron Meirovitch, Jeff W. Lichtman

TL;DR

This work tackles the seriation problem for large-scale image stacks by recovering a linear order from noisy, locally sampled pairwise similarities under a near-linear query budget. It introduces a five-stage pipeline that first builds a sparse graph containing all essential edges via Random-Hook Borůvka, diameter-reducing condensation, double-sweep BFS, and fixed-window densification, followed by Iterative Similarity Search and the SuperChain assembly to recover the permutation. The approach achieves near-linear query complexity $O\bigl(N(\log N + K)\bigr)$ and exact recovery under simple margin/noise assumptions, with empirical robustness up to about $2N/3$ correct second-best edges and strong performance on wafer-scale EM datasets—outperforming spectral, MST, and TSP baselines with substantial speedups. The method enables practical, overnight processing of thousands of sections on commodity hardware and is applicable to parallel seriation tasks beyond EM, such as temporal ordering and archaeological seriation.

Abstract

We study recovering a 1D order from a noisy, locally sampled pairwise comparison matrix under a tight query budget. We recast the task as reconstructing a sparse, noisy line graph and present, to our knowledge, the first method that provably builds a sparse graph containing all edges needed for exact seriation using only O(N(log N + K)) oracle queries, which is near-linear in N for fixed window K. The approach is parallelizable and supports both binary and bounded-noise distance oracles. Our five-stage pipeline consists of: (i) a random-hook Boruvka step to connect components via short-range edges in O(N log N) queries; (ii) iterative condensation to bound graph diameter; (iii) a double-sweep BFS to obtain a provisional global order; (iv) fixed-window densification around that order; and (v) a greedy SuperChain that assembles the final permutation. Under a simple top-1 margin and bounded relative noise we prove exact recovery; empirically, SuperChain still succeeds when only about 2N/3 of true adjacencies are present. On wafer-scale serial-section EM, our method outperforms spectral, MST, and TSP baselines with far fewer comparisons, and is applicable to other locally structured sequencing tasks such as temporal snapshot ordering, archaeological seriation, and playlist/tour construction.

Sparse Graph Reconstruction and Seriation for Large-Scale Image Stacks

TL;DR

This work tackles the seriation problem for large-scale image stacks by recovering a linear order from noisy, locally sampled pairwise similarities under a near-linear query budget. It introduces a five-stage pipeline that first builds a sparse graph containing all essential edges via Random-Hook Borůvka, diameter-reducing condensation, double-sweep BFS, and fixed-window densification, followed by Iterative Similarity Search and the SuperChain assembly to recover the permutation. The approach achieves near-linear query complexity and exact recovery under simple margin/noise assumptions, with empirical robustness up to about correct second-best edges and strong performance on wafer-scale EM datasets—outperforming spectral, MST, and TSP baselines with substantial speedups. The method enables practical, overnight processing of thousands of sections on commodity hardware and is applicable to parallel seriation tasks beyond EM, such as temporal ordering and archaeological seriation.

Abstract

We study recovering a 1D order from a noisy, locally sampled pairwise comparison matrix under a tight query budget. We recast the task as reconstructing a sparse, noisy line graph and present, to our knowledge, the first method that provably builds a sparse graph containing all edges needed for exact seriation using only O(N(log N + K)) oracle queries, which is near-linear in N for fixed window K. The approach is parallelizable and supports both binary and bounded-noise distance oracles. Our five-stage pipeline consists of: (i) a random-hook Boruvka step to connect components via short-range edges in O(N log N) queries; (ii) iterative condensation to bound graph diameter; (iii) a double-sweep BFS to obtain a provisional global order; (iv) fixed-window densification around that order; and (v) a greedy SuperChain that assembles the final permutation. Under a simple top-1 margin and bounded relative noise we prove exact recovery; empirically, SuperChain still succeeds when only about 2N/3 of true adjacencies are present. On wafer-scale serial-section EM, our method outperforms spectral, MST, and TSP baselines with far fewer comparisons, and is applicable to other locally structured sequencing tasks such as temporal snapshot ordering, archaeological seriation, and playlist/tour construction.

Paper Structure

This paper contains 26 sections, 9 theorems, 15 equations, 10 figures, 4 tables.

Key Result

Lemma 1

Let $G$ be any connected spanning subgraph of $G^*$. A condensation round augments $G$ by adding all oracle-positive edges incident to a small set of farthest vertices (chosen by a double-sweep on $G$). The resulting graph $G'$ satisfies

Figures (10)

  • Figure 1: WaferTools pipeline for wafer-to-volume EM sorting. (a) Starting from unordered section collection method (e.g., MagC Templier2019MagC), our WaferTools performs automatic section detection, (b–d) ROI unification with Procrustes alignment, and metadata generation for EM targeting. (e–g) The acquired EM stacks are then ordered via hybrid pipeline or near-linear Graph Condensation–Densification with SuperChain (hundreds to thousands sections). (f) Visualizing the final order result. Shown here is a wafer from the unpublished HI--MC dataset.
  • Figure 2: Feature-matching pipeline for similarity computation. (a) SIFT keypoints are extracted and matched between candidate section pairs using FLANN. (b) RANSAC geometric verification filters matches to retain only those consistent with an affine transformation, rejecting outliers. (c) For pairs exceeding the inlier threshold, sections are aligned and SSIM is computed on the overlap region to capture fine morphological similarity.
  • Figure 3: Four-phase sparse graph construction. (a) Ground-truth linear structure to be recovered. (b) Phase 1: Random-hook Borůvka builds initial spanning tree using $O(N\log N)$ queries. (c-e) Phase 2: Iterative condensation adds long-range edges to reduce graph diameter. (f) Phase 3: Double-sweep BFS produces provisional ordering; Phase 4: $K$-window densification recovers missing local edges. (g) Final sparse graph containing all essential edges for exact seriation.
  • Figure 4: SuperChain assembly process. (a) Initial ISS graph with each vertex retaining its two strongest edges. (b-c) Stage 1: Reciprocal best matches form mini-chains. (d) Stage 2: Chains extend via endpoint connections. (e) Chain rules ensure correct assembly: maximize-chain prevents wasteful merges, unbroken-chain maintains linearity. (f) Fallback rule allows depth-1 connections when needed. (g) Final single representing the complete section order.
  • Figure 5: Similarity Computation Time and Speed-up Factor Comparison. We compare the runtime (bar plot, left y-axis) and speed-up factor (dashed line, right y-axis) of our proposed Graph Condensation approach versus the traditional fully-pairwise method across 100, 500, and 1000 EM sections.
  • ...and 5 more figures

Theorems & Definitions (11)

  • Lemma 1: Diameter Recurrence
  • Theorem 1: Bounded Diameter After Condensation
  • Theorem 2: Query Complexity — Binary Oracle
  • Lemma 2: Edge Validity Under Noise
  • Lemma 3: Position Error After Condensation
  • Theorem 3: Window Sufficiency
  • Theorem 4: Main Result — Weighted Oracle
  • Lemma 4: Endpoint local maximality
  • proof
  • Theorem 5: Exact recovery under SuperChain
  • ...and 1 more