Unbiased and Error-Detecting Combinatorial Pooling Experiments with Balanced Constant-Weight Gray Codes for Consecutive Positives Detection
Guanchen He, Vasilisa A. Kovaleva, Carl Barton, Paul G. Thomas, Mikhail V. Pogorelyy, Hannah V. Meyer, Qin Huang
TL;DR
This work tackles the challenge of designing combinatorial pooling schemes that are both balanced across pools and capable of error detection while identifying two consecutive positive items. It introduces balanced constant-weight Gray codes for detecting consecutive positives (DCP-CWGCs), defining the constraints of fixed weight, distinct OR-sums for consecutive pairs, and a minimal Hamming distance between consecutive addresses. The authors develop two algorithms, a branch-and-bound algorithm (BBA) and a recursive combination approach (rcBBA), and implement them in the open-source package codePub to construct long, near-perfectly balanced DCP-CWGCs. Through theoretical analysis and simulations, they demonstrate maximal existence results for balanced codes, provide practical runtimes for large-scale designs, and show effective error detection with controlled candidate-item lists, highlighting the approach’s potential for scalable, bias-free pooling in biology and related fields.
Abstract
Combinatorial pooling schemes have enabled the measurement of thousands of experiments in a small number of reactions. This efficiency is achieved by distributing the items to be measured across multiple reaction units called pools. However, current methods for designing pooling schemes do not adequately address the need for balanced item distribution across pools, a property that is particularly important for biological applications. Here, we introduce balanced constant-weight Gray codes for detecting consecutive positives (DCP-CWGCs) for the efficient construction of combinatorial pooling schemes. Balanced DCP-CWGCs ensure uniform item distribution across pools, allow for the identification of consecutive positive items such as overlapping biological sequences, and enable error detection by keeping the number of tests on individual and consecutive positive items constant. For the efficient construction of balanced DCP-CWGCs, we have released the open-source Python package codePub, which implements two core algorithms: a branch-and-bound algorithm (BBA) and a recursive combination with BBA (rcBBA). Simulations using codePub show that our algorithms can construct long, balanced DCP-CWGCs that allow for error detection in tractable runtimes.
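To make the defining properties concrete, the following minimal Python sketch checks a candidate arrangement and decodes a set of positive pools. It is an illustration only and does not use or reproduce the codePub API: the function names `check_dcp_cwgc` and `decode` are hypothetical, and each address is represented as the set of pool indices an item is assigned to. The sketch assumes that the required Hamming distance between consecutive addresses is 2, the smallest distance possible between two distinct words of equal weight.

```python
from collections import Counter


def check_dcp_cwgc(addresses, num_pools):
    """Check an ordered list of item addresses (hypothetical helper).

    Each address is an iterable of pool indices in range(num_pools).
    Returns a dict of property checks plus the per-pool item counts.
    """
    addrs = [frozenset(a) for a in addresses]
    pairs = list(zip(addrs, addrs[1:]))  # consecutive items

    # Constant weight: every item is placed in the same number of pools.
    constant_weight = len({len(a) for a in addrs}) == 1

    # Gray-code step (assumed distance 2): exactly one pool is swapped
    # between neighbouring addresses.
    gray_step = all(len(a ^ b) == 2 for a, b in pairs)

    # Distinct OR-sums: each consecutive pair lights up a unique pool set.
    unions = [a | b for a, b in pairs]
    distinct_unions = len(set(unions)) == len(unions)

    # Balance: number of items assigned to each pool.
    counts = Counter(p for a in addrs for p in a)
    pool_loads = [counts.get(p, 0) for p in range(num_pools)]

    return {
        "constant_weight": constant_weight,
        "gray_step": gray_step,
        "distinct_unions": distinct_unions,
        "pool_loads": pool_loads,
        "imbalance": max(pool_loads) - min(pool_loads),
    }


def decode(positive_pools, addresses):
    """Return item index tuples consistent with the positive pools.

    A 1-tuple is a single positive item, a 2-tuple a consecutive pair.
    An empty list means no valid explanation exists, i.e. an error
    in the readout has been detected.
    """
    observed = frozenset(positive_pools)
    addrs = [frozenset(a) for a in addresses]
    hits = [(i,) for i, a in enumerate(addrs) if a == observed]
    hits += [
        (i, i + 1)
        for i, (a, b) in enumerate(zip(addrs, addrs[1:]))
        if (a | b) == observed
    ]
    return hits
```

For example, with 4 pools and weight 2, the arrangement `[{0, 1}, {1, 2}, {2, 3}, {0, 3}]` passes all three checks and places exactly two items in every pool. Because a valid readout must contain either 2 positive pools (one item) or 3 (two consecutive items), a readout such as `{0}` or `{0, 2}` matches no address or OR-sum and is flagged as an error by the empty result of `decode`.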
