An Algorithm to Recover Shredded Random Matrices
Caelan Atamanchuk, Luc Devroye, Massimo Vicenzo
TL;DR
The paper addresses reconstructing an $n\times n$ binary matrix from the unordered multisets of its rows and columns under a random Bernoulli$(p)$ model. It introduces a two-part, trie-based algorithm. Part One uses a Hamming-weight partition of the columns and sub-weight signatures to uniquely identify the row permutation (and hence the column order via a column trie) when possible. Part Two enumerates and validates the residual row permutations consistent with the signature multiplicities, outputting permutation groups for columns when duplicates exist. The authors prove that there is a constant $c > 0$ such that, for $\frac{c\log^2(n)}{n(\log\log(n))^2} \leq p \leq \frac{1}{2}$, the algorithm runs in $O(n^2)$ time with high probability and in expectation. They also establish reconstructibility thresholds showing that a random matrix is reconstructible with high probability above $p \approx \frac{2\log n}{n}$, with stronger guarantees in denser regimes. These results connect the reconstruction of shredded matrices to broader themes in graph reconstruction, canonization, and shotgun assembly, and they provide a concrete, efficient method for recovering the original orderings in random settings.
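The signature idea in Part One can be illustrated with a minimal Python sketch. This is one plausible reading of the summary, not the authors' implementation: it assumes a row's signature counts, for each column Hamming weight, how many of the row's ones fall in columns of that weight, and it replaces the tries with plain dictionaries. The function names (`row_signature`, `position_signature`, `match_rows`) are hypothetical.

```python
from collections import Counter

def row_signature(row, col_weights):
    # For each column Hamming weight w, count the ones of this row that
    # sit in columns of weight w (assumed form of a "sub-weight signature").
    counts = Counter(w for bit, w in zip(row, col_weights) if bit)
    return tuple(sorted(counts.items()))

def position_signature(cols, i):
    # The same quantity for row position i, read off the shuffled columns:
    # entry i of a column always belongs to original row i.
    counts = Counter(sum(col) for col in cols if col[i])
    return tuple(sorted(counts.items()))

def match_rows(rows, cols):
    # Column weights are recoverable from the shuffled rows, because
    # entry j of a row always belongs to original column j.
    col_weights = [sum(r[j] for r in rows) for j in range(len(rows[0]))]
    row_sigs = [row_signature(r, col_weights) for r in rows]
    if len(set(row_sigs)) < len(rows):
        return None  # signatures collide; this sketch gives up here
    pos_by_sig = {position_signature(cols, i): i for i in range(len(rows))}
    # order[k] is the original index of the k-th shuffled row
    return [pos_by_sig[s] for s in row_sigs]

# Rows and columns of [[1,1,1],[0,1,1],[0,0,1]], each shuffled independently.
rows = [(0, 0, 1), (1, 1, 1), (0, 1, 1)]
cols = [(1, 1, 0), (1, 1, 1), (1, 0, 0)]
order = match_rows(rows, cols)
```

When signatures collide, the sketch returns `None` rather than enumerating the residual permutations, which is the case the summary assigns to Part Two.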
Abstract
Given some binary matrix $M$, suppose we are presented with the collection of its rows and columns in independent arbitrary orderings. From this information, are we able to recover the unique original orderings and matrix? We present an algorithm that identifies whether there is a unique ordering associated with a set of rows and columns, and outputs either the unique correct orderings for the rows and columns or the full collection of all valid orderings and valid matrices. We show that there is a constant $c > 0$ such that the algorithm terminates in $O(n^2)$ time with high probability and in expectation for random $n \times n$ binary matrices with i.i.d. Bernoulli$(p)$ entries $(m_{ij})_{i,j=1}^n$ such that $\frac{c\log^2(n)}{n(\log\log(n))^2} \leq p \leq \frac{1}{2}$.
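To make the problem statement concrete, the following Python sketch shreds a small matrix and finds every valid reconstruction by brute force. It is an exponential-time illustration of the question only, not the $O(n^2)$ algorithm described above, and it assumes a matrix counts as valid when its rows and columns match the given multisets. The names `shred` and `valid_matrices` are hypothetical.

```python
from itertools import permutations

def shred(matrix):
    # Return the rows and the columns as sorted lists, which forgets
    # the original orderings (a canonical form of the two multisets).
    rows = sorted(tuple(r) for r in matrix)
    cols = sorted(tuple(c) for c in zip(*matrix))
    return rows, cols

def valid_matrices(rows, cols):
    # Try every ordering of the rows and keep those whose columns,
    # as a multiset, equal the given columns.
    target = sorted(cols)
    found = set()
    for order in permutations(rows):
        if sorted(zip(*order)) == target:
            found.add(order)  # the set removes duplicates from equal rows
    return found

unique = valid_matrices(*shred([[1, 1], [0, 1]]))
ambiguous = valid_matrices(*shred([[1, 0], [0, 1]]))
```

Here `unique` contains only the original matrix, while `ambiguous` contains both $2\times 2$ permutation matrices, so the identity matrix cannot be recovered from its shredded rows and columns.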
