Data Sieving for Scalable Real-Time Multichannel Nanopore Sensing

Matteo Cartiglia, Natan Biesmans, Wannes Peeters, Wouter Botermans, Koen Ongena, Liam Vandekerckhove, Wouter Renckens, Eric Beamish, Elizabeth Skelly, Kirill A. Afonin, Pol van Dorpe, Sanjin Marion

Abstract

High-throughput solid-state nanopore experiments generate continuous MHz-rate data streams in which only a small fraction of the data carries molecular information. This creates storage and processing bottlenecks that limit experimental scalability. We introduce Data Sieving, a GPU-accelerated acquisition framework that integrates real-time event detection directly into the measurement pipeline, selectively storing snapshots around molecular translocations and making them available for real-time analysis. The system employs a lightweight rolling-average and min-max trigger to identify event candidates in parallel across channels. This architecture reduces stored data volume by up to 98% while preserving complete molecular signatures across a wide temporal range, from microsecond-scale protein dynamics to second-scale nucleic acid nanoparticle events. Continuous baseline monitoring enables autonomous closed-loop actuation; in high-concentration DNA experiments, automatic declogging restored pore conductance, reducing the time spent in a non-productive clogged state to near zero without interrupting parallel measurements. Validated across DNA, protein, and nucleic acid nanoparticle measurements, Data Sieving links data storage directly to molecular information content rather than experiment duration, enabling scalable, real-time operation of parallel nanopore sensors. The approach provides a hardware-agnostic foundation for long-duration, high-bandwidth single-molecule experiments and other event-driven sensing platforms. Because its algorithms are intrinsically compatible with low-latency digital architectures, the framework provides a clear path toward high-bandwidth, highly multiplexed recording across hundreds of individual nanopore channels in both solid-state and biological pores.
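The rolling-average and min-max trigger mentioned in the abstract can be illustrated with a minimal, single-channel, CPU-only Python sketch. It is not the authors' GPU implementation: the function names, the window lengths (`ra_len`, `mm_window`), the `k_sigma` multiplier, and the use of the first `baseline_len` samples to estimate noise are all assumptions chosen for illustration.

```python
from statistics import pstdev


def rolling_average(x, n):
    """Causal boxcar mean over the last n samples (shorter during warm-up)."""
    out, acc = [], 0.0
    for i, v in enumerate(x):
        acc += v
        if i >= n:
            acc -= x[i - n]  # drop the sample that left the window
        out.append(acc / min(i + 1, n))
    return out


def minmax_trigger(x, window, threshold, start=0):
    """Start indices of non-overlapping windows whose peak-to-peak
    amplitude exceeds the threshold."""
    hits = []
    for s in range(start, len(x) - window + 1, window):
        seg = x[s:s + window]
        if max(seg) - min(seg) > threshold:
            hits.append(s)
    return hits


def detect_candidates(signal, ra_len=4, mm_window=16, k_sigma=6.0,
                      baseline_len=64):
    """Hypothetical RA+MM detector: smooth, estimate the noise from an
    assumed event-free stretch, then flag windows with a large
    peak-to-peak swing. All parameter defaults are illustrative."""
    smoothed = rolling_average(signal, ra_len)
    sigma = pstdev(smoothed[ra_len:baseline_len])
    # Skip the warm-up samples, which are averaged over fewer than ra_len points.
    return minmax_trigger(smoothed, mm_window, k_sigma * sigma,
                          start=ra_len - 1)


# Synthetic trace: flat baseline with a small deterministic ripple and one
# rectangular blockade between samples 100 and 119.
trace = [10.0 + 0.01 * ((i * 7) % 5 - 2) for i in range(256)]
for i in range(100, 120):
    trace[i] -= 1.0
candidates = detect_candidates(trace)  # window starts overlapping the blockade
```

In this sketch only windows that contain an edge of the blockade are flagged, because the ripple's peak-to-peak amplitude stays below the threshold derived from the baseline noise. A window that lies entirely inside a long blockade would not be flagged, so a real pipeline would also need to keep the samples between the flagged windows.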

Paper Structure

This paper contains 14 sections, 5 figures.

Figures (5)

  • Figure 1: Data Sieving architecture and algorithmic pipeline. (a) System Overview: High-frequency (HF) data from nanopores is recorded via multichannel amplifiers (1, 4, 16, or 24 channels). Data is processed using a heterogeneous CPU/GPU strategy to manage real-time throughput. The end result of the pipeline is the extraction of event candidates for further processing of the parallel high-bandwidth data streams. These event candidates are then further "pruned" to remove any unnecessary padding around them, with the intent of reducing data storage and real-time processing requirements later in the pipeline. (b) Event Candidate Detection: Parallelized detection uses a rolling-average (RA) filter for denoising and windowed min-max (MM) triggering. This stage simultaneously enables active-feedback control and coarse signal classification. Thresholds are automatically initialized from baseline noise statistics, though they can be manually overridden. (c) RA+MM Logic: The detector acts as a tunable band-pass filter where the RA suppresses high-frequency noise and the MM rejects slow baseline drift by evaluating local peak-to-peak amplitude. (d) Event Pruning: Raw candidate snapshots undergo digital low-pass filtering and voltage-stability checks. Precise event boundaries are localized via parallel metric extraction and cumulative-sum algorithms. If both methods reach a consensus, the event is trimmed to the agreed interval; otherwise, the full untrimmed snapshot is retained to prevent information loss.
  • Figure 2: Experimental validation and data reduction efficiency. All measurements at 400 mV in 4 M LiCl (17 S/m), four channels at 27 MHz, pore diameters 9--12.1 nm. (a) Density plots of mean current drop ($\Delta I$) versus dwell time for translocation events corresponding to the passage of 250 bp (red), 2.5 kbp (green), and 10 kbp (blue) dsDNA. Integrated charge values and representative traces confirm high-fidelity capture across three orders of magnitude in dwell time. (b) Comparison of storage growth: traditional logging (solid line) versus Data Sieving (purple markers) for 250 bp, 2.5 kbp, and 10 kbp dsDNA at three concentrations, run until $\sim$1000 events per channel were detected across four channels. Data Sieving links file size to molecular information content rather than experiment duration. The inset shows a linear relationship between file size and the number of detected events, regardless of experimental duration. (c) Relative data volume reduction for each molecular species. Storing only event candidates reduces data volume by $\sim$95% relative to traditional recording, and final pruning raises the reduction to $>$98%. (d) Reduction efficiency as a function of capture rate. Data Sieving maintains $>$80% reduction at high event frequencies.
  • Figure 3: Real-time pore health monitoring and automatic declogging. Experiments with 10 kbp dsDNA at 1.8 nM, 400 mV in 4 M LiCl. (a) Power spectral density (PSD) comparing a normal translocation event to a clogging event. Clogging events are characterized by distinct low-frequency noise, allowing for real-time discrimination. (b) Schematic of the automatic declogging cycle: detection of a clog triggers a brief polarity inversion (+600 mV, 1 s) to restore open-pore conductance. (c) Experimental data without active feedback (10 kbp dsDNA, 1.8 nM): Pores are quickly rendered unusable by frequent clogging, spending 52.8% of the time in a clogged state, with a mean time to first clog of 59.6 s. (d) Experimental data with automatic declogging enabled (10 kbp dsDNA, 1.8 nM): Data Sieving identifies baseline deviations in real time and issues brief polarity-inversion pulses (+600 mV, 1 s) to restore open-pore conductance. This produces repeated, rapid declogging events that extend usable experimental time and data yield.
  • Figure 4: GPU resource utilization and computational scalability. Four channels at 27 MHz, 400 mV in 4 M LiCl. (a) Hardware benchmarking: Comparative GPU utilization and effective TFLOPS for the parallel acquisition of 4 channels (2.5 kbp DNA at 10 nM) using high-end (RTX 4080 Super), mid-range (GTX 1070), and entry-level (T1000) GPUs. While utilization increases on lower-tier hardware, all models maintain stable real-time acquisition. (b) Mean GPU utilization as a function of aggregate incoming open pore data rate. When event-candidate detection is performed directly on the GPU, utilization remains below 40% for data rates exceeding 100 MHz, indicating computational headroom. (c) Radar plot showing incremental GPU utilization, VRAM usage, and memory controller load relative to a control solution without analytes, evaluated across three DNA lengths (250 bp, 2.5 kbp, 10 kbp) and three concentration tiers. Only the memory controller load increases significantly with capture rate, reflecting higher load from raw snapshot transfers, while GPU and VRAM usage remain modest.
  • Figure 5: High temporal dynamic range biomolecular sensing. (a) Representative current traces spanning the full dynamic range of the system. Top: 120 nM streptavidin under native (2 M KCl, 10 mM Tris) and denaturing (3 M GdmCl, 1 M KCl, 10 mM Tris) conditions, illustrating fast transient translocations in the 10 µs regime. Bottom: nucleic acid nanoparticle (NANP) translocations at 1 V, showing sustained events lasting up to $\sim$100 ms. Scale bars indicate time and amplitude. (b) Scatter plot of fractional blockade ($\Delta I/I_0$) versus dwell time for streptavidin. Starred markers indicate the events shown as traces in (a). The system captures conformational differences between native and unfolded protein states. (c, d) Mixed nucleic acid nanostructures (1 nM DNA cubes and 0.2 nM RNA rings) at 600 mV (c) and 1 V (d) in 0.4 M KCl, 2 mM MgCl$_2$, 10 mM Tris; pore diameter 10.4 nm, 26.7 MHz sampling. Gaussian Mixture Model (GMM) clustering reveals three distinct populations (fragments, RNA rings, and DNA cubes), demonstrating Data Sieving's ability to maintain high-resolution capture over five orders of magnitude in dwell time.
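
The consensus rule in the Event Pruning step of Figure 1(d) can be sketched in a few lines of Python. This is an illustrative stand-in, not the authors' pipeline: a half-maximum threshold crossing plays the role of "metric extraction", the one-sided CUSUM parameters (`drift`, `h`), the agreement tolerance `tol`, and the use of the first `pad` samples as baseline are all assumed values, and the low-pass filtering and voltage-stability checks are omitted.

```python
def cusum_onset(dev, drift, h):
    """One-sided CUSUM on baseline deviations. Returns the estimated
    onset index of a sustained shift, or None if no alarm is raised."""
    s, last_zero = 0.0, -1
    for i, d in enumerate(dev):
        s = max(0.0, s + d - drift)
        if s == 0.0:
            last_zero = i          # statistic reset: no shift so far
        elif s > h:
            return last_zero + 1   # first sample after the last reset
    return None


def threshold_bounds(dev, frac=0.5):
    """Half-open [start, end) span where the deviation exceeds
    frac times its maximum, or None for a flat snapshot."""
    peak = max(dev)
    if peak <= 0:
        return None
    idx = [i for i, d in enumerate(dev) if d > frac * peak]
    return idx[0], idx[-1] + 1


def prune(snapshot, pad=10, drift=0.25, h=2.0, tol=2):
    """Trim a snapshot to the event only if both boundary estimates
    agree within tol samples; otherwise keep the full snapshot."""
    n = len(snapshot)
    baseline = sum(snapshot[:pad]) / pad        # assumed event-free padding
    dev = [baseline - v for v in snapshot]      # blockades are positive
    tb = threshold_bounds(dev)
    start_c = cusum_onset(dev, drift, h)
    rev = cusum_onset(dev[::-1], drift, h)      # run backwards for the end
    if tb is None or start_c is None or rev is None:
        return 0, n
    end_c = n - rev
    if abs(tb[0] - start_c) <= tol and abs(tb[1] - end_c) <= tol:
        # Take the union of both estimates so no event sample is cut.
        return min(tb[0], start_c), max(tb[1], end_c)
    return 0, n


# A clean rectangular blockade: both estimators agree, so it is trimmed.
deep = [10.0] * 40 + [9.0] * 20 + [10.0] * 40
# A shallow blockade below the CUSUM drift term: no consensus, kept whole.
shallow = [10.0] * 40 + [9.8] * 20 + [10.0] * 40
deep_span, shallow_span = prune(deep), prune(shallow)
```

Falling back to the full snapshot when the two estimates disagree mirrors the caption's stated goal of preventing information loss: a missed trim costs storage, whereas a wrong trim would cost data.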