
Statistical methods for reference-free single-molecule localisation microscopy

Jack Peyton, Benjamin Davis, Emily Gribbin, Daniel Rolfe, Hannah Mitchell

Abstract

MINFLUX (Minimal Photon Flux) is a single-molecule imaging technique capable of resolving fluorophores at a precision of <5 nm. Interpretation of the point patterns generated by this technique presents challenges due to variable emitter density, incomplete bio-labelling and detection of target molecules, error-prone measurement processes, and the presence of spurious (non-structure-associated) fluorescent detections. Together, these challenges make structural inference from single-molecule imaging datasets non-trivial in the absence of strong a priori information, for all but the smallest point patterns. In addition, current methods often require subjective parameter tuning and presuppose known structural templates, limiting reference-free discovery. We present a statistically grounded, end-to-end analysis framework. Focusing on MINFLUX-derived datasets and leveraging Bayesian and spatial statistical methods, the pipeline demonstrates 1) uncertainty-aware clustering of measurements into emitter groups that performs better than current gold standards, 2) rapid identification of molecular structure supergroups, and 3) reconstruction of repeating structures within the dataset without substantial prior knowledge. The pipeline is demonstrated on simulated and real MINFLUX datasets, where emitter clustering and centre detection maintain high performance (emitter subset assignment accuracy > 0.75) across all conditions evaluated, while structural inference achieves reliable discrimination (F1 approx. 0.9) at high labelling efficiency. Template-free reconstruction of Nup96 and DNA-Origami 3x3 grids is achieved.

Paper Structure (17 sections, 6 figures)

This paper contains 17 sections, 6 figures.

Figures (6)

  • Figure 1: (a) End-to-end input/output overview for each stage of the framework. (b) Raw localisations are clustered into (c) emitters using GROUPA. (d) Voidwalker identifies statistically significant empty regions (dashed red circles) that inform the priors and the proposal space. (e) An RJMCMC sampler assigns emitters to structural centres, yielding per-emitter probability distributions over centre assignments. (f) Assignment distributions define marks in a marked point process used to identify structure (blue) and super-structure (green). (g) Cliques are sampled from co-assigned emitter populations of both structure (blue) and super-structure (green). ASMBLR reconstructs molecular (h) structure and (i) super-structure from the inner space of sampled cliques.
  • Figure 2: Clustering performance versus localisation uncertainty $\sigma$ for GROUPA, DBSCAN, and HDBSCAN, evaluated on synthetic Nup96 data with 50 replicates per $\sigma$. Shaded bands: 2.5-97.5th percentiles across replicates. (a) ARI, (b) FMI, (c) NMI. DBSCAN/HDBSCAN were tuned per $\sigma$; GROUPA required no parameter tuning.
  • Figure 3: Voidwalker-Gibbs achieves centre detection and emitter assignment across data quality regimes. Performance on synthetic Nup96 structures across labelling efficiencies of 0.3, 0.6, 0.9, 1.0 and clutter levels of $0$-$30\%$. Curves show median performance over 100 replicates with 2.5-97.5th percentile bands. (a) Centre-level F1 score versus clutter. (b) Emitter-centre assignment accuracy versus clutter. (c) Joint distribution of F1 and assignment accuracy across all datasets. (d) Relative radius bias versus clutter, with ground truth emitter uncertainty. (e) Probability of sampling a structurally representative clique versus clique size, comparing radial uniform and Voidwalker-guided sampling. Uniform sampling assumes 8 true emitters with 2-6 spurious neighbours; Voidwalker-guided sampling assumes 90% per-emitter assignment accuracy. (f) Number of emitters assigned per centre across labelling efficiencies.
  • Figure 4: Mark-based super-structure detection establishes data-quality thresholds distinct from centre detection. (a) Performance across labelling efficiencies (0.3, 0.6, 0.9, 1.0) and clutter levels ($0$-$30\%$) on synthetic Nup96 dimer mixtures. Performance surfaces for F1 (left), precision (centre), and recall (right) over 100 replicates per condition. Numerical values indicate mean metrics. Super-structure discovery algorithm applied to labelling efficiencies of (b) 1.0 and (c) 0.6.
  • Figure 5: ASMBLR reconstructs repeating molecular motifs across a range of data conditions. Observed measurements and model reconstruction with 67% credible interval for labelling efficiencies of (a-b) 1.0, (c-d) 0.9, (e-f) 0.6, and (g-h) 0.3. 600 cliques of size 3 were used for the 8-fold model. Reconstruction of DNA-Origami 3x3 grids (i-j) is also shown, using 500 cliques of size 5.
  • ...and 1 more figures
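
The comparison described in Figure 3(e) can be illustrated with a short calculation. The sketch below is not the authors' implementation. It assumes that a clique is "structurally representative" when every member is a true emitter, that uniform sampling draws clique members without replacement from the true and spurious emitters pooled together, and that guided assignments are independent with a fixed per-emitter accuracy. The function names and these modelling choices are hypothetical; only the numbers (8 true emitters, 2-6 spurious neighbours, 90% accuracy) come from the caption.

```python
from math import comb


def p_clean_clique_uniform(k: int, n_true: int = 8, n_spurious: int = 4) -> float:
    """Probability that k emitters drawn uniformly without replacement
    from n_true + n_spurious candidates are all true emitters.

    Assumption (not from the paper): this hypergeometric model stands in
    for 'radial uniform' sampling.
    """
    if k < 0:
        raise ValueError("clique size must be non-negative")
    if k > n_true:
        return 0.0  # cannot draw more true emitters than exist
    return comb(n_true, k) / comb(n_true + n_spurious, k)


def p_clean_clique_guided(k: int, accuracy: float = 0.9) -> float:
    """Probability that all k clique members are correctly assigned,
    assuming independent assignments with the given per-emitter accuracy.

    Assumption (not from the paper): independence between emitters.
    """
    if k < 0:
        raise ValueError("clique size must be non-negative")
    return accuracy ** k


if __name__ == "__main__":
    # Compare both schemes over a few clique sizes and spurious counts.
    for n_spurious in (2, 4, 6):
        for k in range(2, 7):
            uniform = p_clean_clique_uniform(k, n_true=8, n_spurious=n_spurious)
            guided = p_clean_clique_guided(k, accuracy=0.9)
            print(f"spurious={n_spurious} k={k} "
                  f"uniform={uniform:.3f} guided={guided:.3f}")
```

Under these assumptions, both probabilities fall as the clique size grows, but the uniform scheme falls faster whenever the spurious fraction exceeds the guided error rate, which is one way to read the motivation for guided clique sampling.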