Unbiased and Error-Detecting Combinatorial Pooling Experiments with Balanced Constant-Weight Gray Codes for Consecutive Positives Detection

Guanchen He, Vasilisa A. Kovaleva, Carl Barton, Paul G. Thomas, Mikhail V. Pogorelyy, Hannah V. Meyer, Qin Huang

TL;DR

This work tackles the challenge of designing combinatorial pooling schemes that are both balanced across pools and capable of error detection while identifying two consecutive positive items. It introduces balanced constant-weight Gray codes for detecting consecutive positives (DCP-CWGCs), defining the constraints of fixed weight, distinct OR-sums for consecutive pairs, and a minimal Hamming distance between consecutive addresses. The authors develop two algorithms, a branch-and-bound algorithm (BBA) and a recursive combination approach (rcBBA), and implement them in the open-source package codePub to construct long, near-perfectly balanced DCP-CWGCs. Through theoretical analysis and simulations, they demonstrate maximal existence results for balanced codes, provide practical runtimes for large-scale designs, and show effective error detection with controlled candidate-item lists, highlighting the approach’s potential for scalable, bias-free pooling in biology and related fields.
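The three constraints named above can be checked mechanically. The following is a minimal sketch, not the codePub implementation: the function names `is_dcp_cwgc` and `pool_loads` are hypothetical, addresses are represented as sets of pool indices, and the "minimal Hamming distance" is assumed to be 2, the smallest distance possible between two distinct words of equal weight.

```python
def is_dcp_cwgc(addresses, r):
    """Check fixed weight, distance-2 neighbours, and distinct OR-sums.

    Each address is a set of pool indices (the positions of its 1s).
    Assumption: consecutive addresses must be at Hamming distance 2,
    which makes every consecutive OR-sum have weight r + 1.
    """
    addresses = [frozenset(a) for a in addresses]
    # Fixed weight: every item is placed in exactly r pools.
    if any(len(a) != r for a in addresses):
        return False
    # Addresses must be distinct so a single positive is identifiable.
    if len(set(addresses)) != len(addresses):
        return False
    # OR-sum of each consecutive pair of addresses.
    unions = [a | b for a, b in zip(addresses, addresses[1:])]
    # Distance 2 between neighbours <=> union has exactly r + 1 pools.
    if any(len(u) != r + 1 for u in unions):
        return False
    # OR-sums of consecutive pairs must be pairwise distinct.
    return len(set(unions)) == len(unions)


def pool_loads(addresses, m):
    """Count how many items land in each of the m pools (balance check)."""
    loads = [0] * m
    for address in addresses:
        for pool in address:
            loads[pool] += 1
    return loads


# Toy code with m = 4 pools and weight r = 2 (illustrative, not from the paper).
toy_code = [{0, 1}, {1, 2}, {2, 3}, {0, 3}]
valid = is_dcp_cwgc(toy_code, r=2)
loads = pool_loads(toy_code, m=4)
```

In this toy code every pool receives the same number of items, which is the balance property the paper targets; a code is perfectly balanced when all entries of `loads` are equal.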

Abstract

Combinatorial pooling schemes have enabled the measurement of thousands of experiments in a small number of reactions. This efficiency is achieved by distributing the items to be measured across multiple reaction units called pools. However, current methods for the design of pooling schemes do not adequately address the need for balanced item distribution across pools, a property particularly important for biological applications. Here, we introduce balanced constant-weight Gray codes for detecting consecutive positives (DCP-CWGCs) for the efficient construction of combinatorial pooling schemes. Balanced DCP-CWGCs ensure uniform item distribution across pools, allow for the identification of consecutive positive items such as overlapping biological sequences, and enable error detection by keeping the number of tests on individual and consecutive positive items constant. For the efficient construction of balanced DCP-CWGCs, we have released an open-source Python package, codePub, with implementations of the two core algorithms: a branch-and-bound algorithm (BBA) and a recursive combination with BBA (rcBBA). Simulations using codePub show that our algorithms can construct long, balanced DCP-CWGCs that allow for error detection in tractable runtime.
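Because the number of tests on a single positive item and on a consecutive positive pair is constant, the count of positive pools alone can flag an inconsistent read-out. The sketch below illustrates this idea under stated assumptions: the function `decode` and the toy code are hypothetical and are not taken from codePub, at most one item or one consecutive pair is assumed to be positive, and consecutive addresses are assumed to differ in exactly one pool, so a pair yields r + 1 positive pools.

```python
def decode(positive_pools, addresses, r):
    """Map observed positive pools to item indices, or None on an error.

    Returns () if no pool is positive, (i,) for a single positive item,
    (i, i + 1) for a consecutive positive pair, and None when the
    read-out matches no address or consecutive OR-sum.
    """
    positive = frozenset(positive_pools)
    addresses = [frozenset(a) for a in addresses]
    if not positive:
        return ()
    if len(positive) == r:
        # Exactly r positive pools: look for a single matching address.
        for i, address in enumerate(addresses):
            if address == positive:
                return (i,)
    elif len(positive) == r + 1:
        # Exactly r + 1 positive pools: look for a matching consecutive OR-sum.
        for i, (a, b) in enumerate(zip(addresses, addresses[1:])):
            if a | b == positive:
                return (i, i + 1)
    # Wrong number of positive pools, or no match: report an error.
    return None


# Toy code with 4 pools and weight r = 2 (illustrative, not from the paper).
toy_code = [{0, 1}, {1, 2}, {2, 3}, {0, 3}]
single = decode({1, 2}, toy_code, r=2)
pair = decode({1, 2, 3}, toy_code, r=2)
error = decode({0}, toy_code, r=2)
```

A read-out with too few or too many positive pools, or one that matches no address or OR-sum, returns `None` here; how candidate items are then recovered from an erroneous read-out is not covered by this sketch.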

Paper Structure

This paper contains 18 sections, 9 theorems, 15 equations, 11 figures, 4 algorithms.

Key Result

Proposition 1

The length $n$ of any code in $\text{DCP-CWGC}(m,r)$ is upper bounded by

Figures (11)

  • Figure 1: A combinatorial pooling example with $10$ items and $6$ pools. Items are mixed into pools according to an encoding based on a DCP-CWGC, represented as its incidence matrix $H$ and color-coded by the expected experimental outcome (grey-negative, yellow-positive). Knowing the DCP-CWGC, the experimental outcome can be decoded and the item of interest, as indicated by stars, identified.
  • Figure 2: Experimental assays used in combinatorial pooling experiments. In each example, a response can be observed upon successful binding of the item (blue circle) to a cell surface receptor. The item(s) that yield such a response are considered positive. Examples of responses and their read-outs are: (1) a primary cell line that secretes activation markers such as cytokines, quantifiable by biochemical assays; (2) a reporter cell line that produces a fluorescently labeled protein, quantifiable by, e.g., flow cytometry or microscopy; (3) gene expression changes in primary cells, quantifiable by RNA sequencing.
  • Figure 3: Combinatorial Gray codes for the design of combinatorial pooling with consecutive positives detection. The uniform partition reduces the general $d$ consecutive positive case to the $2$ consecutive positive case. The yellow circles represent positive items and the blue circles represent negative items.
  • Figure 4: Properties of existing coding schemes. DCP-CWGCs bridge the gap between existing codes capable of consecutive positives detection (grey) and those with constant weight of both codewords and consecutive OR-sums (blue).
  • Figure 5: An example of the balance-optimized path search in the BBA scheme. Rectangular and circular nodes represent addresses ($a$) and unions ($u$), respectively. Yellow nodes indicate the search path taken; the rectangular yellow nodes correspond to a possible segment of an $(m=10,r=2,n \geq 4)$ DCP-CWGC.
  • ...and 6 more figures

Theorems & Definitions (12)

  • Definition 1
  • Proposition 1
  • Example 1
  • Proposition 2
  • Lemma 1
  • Corollary 1
  • Example 2
  • Lemma 2
  • Lemma 3
  • Proposition 3
  • ...and 2 more