PoolPy: Automated combinatorial pooling for high-throughput molecular profiling

Lorenzo Talamanca, Julian Trouillon

TL;DR

This paper presents PoolPy, a unified end-to-end framework and web platform to benchmark, automate and decode combinatorial group testing strategies tailored to application-specific constraints across assay modalities, enabling multi-readout functional assays to be scaled up.

Abstract

Combinatorial group testing reduces screening costs and turnaround time but remains challenging to apply due to design complexity, varying applicability, and a lack of implementation tools. Here we present PoolPy, a unified end-to-end framework and web platform to benchmark, automate and decode combinatorial group testing strategies tailored to application-specific constraints across assay modalities. We demonstrate PoolPy's utility for protein-ligand interaction screening and genome-wide molecular profiling, enabling the scaling up of multi-readout functional assays.

Paper Structure

This paper contains 21 sections, 37 equations, 9 figures, 3 tables.

Figures (9)

  • Figure 1: Overview of the PoolPy workflow and design performance. (a) The PoolPy workflow, following four major steps (left to right). (b-d) Comparison of PoolPy designs for number of tests (b), maximum group size (c) and number of steps (d) over 10 to 500 samples for cases with at most one positive sample. (e-g) Relative group size (e,f) and number of steps (g) required by each design in relation to the number of tests per sample over 1 - 5 maximum number of positive samples (reflected by marker shape). Markers are either colored as in b (e,g), or by max. number of positives, also indicated with confidence ellipses over one standard deviation (f). (h) Probability of error across prevalence values over 1 - 10 maximum number of positive samples for four total numbers of samples $S=25,50,75,100$ (top to bottom). (i) Heatmaps showing the best performing PoolPy designs for all combinations of 10 - 100 samples with at most 10% positive samples in terms of minimizing test number (top) or signal dilution (bottom). (j) Performance summary heatmap of PoolPy designs across four key performance indicators.
  • Figure 2: PoolPy enables optimal combinatorial pooling across applications. (a) Schematic representation of protein-ligand interaction screening and the corresponding PoolPy designs. (b-d) Ligand screening for human carbonic anhydrase II across five random sample draws (b), the draw which contained a positive (acetazolamide) sample (c) and the corresponding decoding scheme (d) for designs minimizing test number (top) or signal dilution (bottom). (e) DAP-seq pooling design used in f - j to profile ten E. coli TFs in four assays. (f) Transcription factor occupancy (fold enrichment over negative control) tracks over three example regions for single (top) or pooled (bottom) assays. For each region, all 14 plots are scaled to the same y-axis value. (g) Recovery of known binding sites between standard and pooled assays for the nine TFs with annotated binding sites on RegulonDB. (h) Identified DNA motifs in peak regions from single or pooled assays for the four TFs with significant motifs identified. (i) Similarity score matrices between single and pooled assays.
  • Figure S1: The binary design performance decreases sharply with increasing prevalence. (a) Schematic illustration of the binary design. Two examples are shown, each with a different positive sample out of 15. For 15 samples, the binary design makes four pools of eight samples each. The result pattern of the four pools encodes the identity of the positive sample in the binary numeral system. (b-e) Number of total tests (b), number of tests per sample (c), maximum group size (d) or number of steps (e) needed using the binary design with 1 - 10 maximum numbers of positive samples across 10 to 100 samples.
  • Figure S2: Group testing performances and group sizes vary across prevalence values. (a-c) Relation between relative group size and test number (a), overall number of tests (b), and maximum group sizes (c) for all 12 PoolPy designs with 1 - 10 maximum number of positive samples across varying numbers of samples. (a,b) The region where group testing becomes less efficient than individually testing each sample (above one test per sample) is grayed out.
  • Figure S3: Group testing methods require varying numbers of steps. Number of steps (rounds of experiment) needed using different group testing methods to identify up to 1 - 10 positive samples across varying numbers of samples. Only methods based on the Chinese Remainder Theorem or on the shifted transversal design can identify positive samples in a single step across prevalence values (non-adaptive designs).
  • ...and 4 more figures
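
The binary design described in the Figure S1 caption can be illustrated with a minimal sketch. This is not PoolPy's API: the function names `binary_pools` and `decode_single_positive` are hypothetical, samples are assumed to be numbered from 1, and decoding is only valid when at most one sample is positive.

```python
def binary_pools(n_samples):
    """Build pools for a binary design (illustrative, not PoolPy's code).

    Sample s (numbered 1..n_samples) joins pool k when bit k of s is set.
    """
    # Number of bits needed to write the largest sample index.
    n_pools = n_samples.bit_length()
    return [
        [s for s in range(1, n_samples + 1) if (s >> bit) & 1]
        for bit in range(n_pools)
    ]


def decode_single_positive(pool_results):
    """Read the pool outcomes as a binary number.

    Returns the index of the positive sample, or 0 if every pool is negative.
    Assumes at most one positive sample.
    """
    return sum(1 << bit for bit, positive in enumerate(pool_results) if positive)


pools = binary_pools(15)

# Simulate an experiment in which sample 11 is the only positive.
positive_sample = 11
results = [positive_sample in pool for pool in pools]
decoded = decode_single_positive(results)
```

For 15 samples this yields four pools of eight samples each, as stated in the caption. With two or more positives the pool pattern is the bitwise OR of their indices and no longer identifies them uniquely, which is consistent with the caption's point that this design degrades as prevalence increases.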