Pattern recognition in the nucleation kinetics of non-equilibrium self-assembly

Constantine Glen Evans; Jackson O'Brien; Erik Winfree; Arvind Murugan

TL;DR

Examination of nucleation during self-assembly of multicomponent structures illustrates how ubiquitous molecular phenomena inherently classify high-dimensional patterns of concentrations in a manner similar to neural network computation.

Abstract

Inspired by biology's most sophisticated computer, the brain, neural networks constitute a profound reformulation of computational principles. Remarkably, analogous high-dimensional, highly interconnected computational architectures also arise within information-processing molecular systems inside living cells, such as signal transduction cascades and genetic regulatory networks. Might neuromorphic collective modes be found more broadly in other physical and chemical processes, even those that ostensibly play non-information-processing roles such as protein synthesis, metabolism, or structural self-assembly? Here we examine nucleation during self-assembly of multicomponent structures, showing that high-dimensional patterns of concentrations can be discriminated and classified in a manner similar to neural network computation. Specifically, we design a set of 917 DNA tiles that can self-assemble in three alternative ways such that competitive nucleation depends sensitively on the extent of co-localization of high-concentration tiles within the three structures. The system was trained in silico to classify a set of 18 grayscale 30 × 30 pixel images into three categories. Experimentally, fluorescence and atomic force microscopy monitoring during and after a 150-hour anneal established that all trained images were correctly classified, while a test set of image variations probed the robustness of the results. While slow compared to prior biochemical neural networks, our approach is surprisingly compact, robust, and scalable. This success suggests that ubiquitous physical phenomena, such as nucleation, may hold powerful information processing capabilities when scaled up as high-dimensional multicomponent systems.

Paper Structure (11 sections, 16 figures)

Figures (16)

Figure 1: Conceptual framework for pattern recognition by nucleation. When one set of molecules can potentially assemble multiple distinct structures, the nucleation process that selects between outcomes is responsive to high-dimensional concentration patterns. Assembly pathways can be depicted on an energy landscape (schematic shown) as paths from a basin for unassembled components that proceed through critical nucleation seeds (barriers) to a basin for each possible final structure. Seeds that colocalize high concentration components will lower the nucleation barrier for corresponding assembly pathways. The resulting selectivity of nucleation in high-dimensional self-assembly is sufficiently expressive to perform complex pattern recognition in a manner analogous to neural computation (see Extended Data Fig. \ref{extfig:neural}).
Figure 2: A multifarious mixture of 917 molecular species that can assemble into three distinct structures from one set of molecules. a, 42-nucleotide DNA strands self-assemble into two-dimensional (2D) structures by forming bonds with four complementary strands in solution via four 10 or 11 nucleotide domains. The strands can be abstracted as square tiles, each named and shown with distinct binding domains identified by number, such that e.g. $708$ is complementary to $708^*$. At nucleation and growth temperatures, attaching by two bonds or more is favorable, while one is insufficient. b, One pool of 917 tile types assembles into three distinct shapes, H, A and M, through a multitude of pathways. While each tile occurs at most once in each shape, the shared purple species recur in multiple shapes, in distinct spatial arrangements; e.g., S149 is highlighted in red. c, Annealing an equal mix of all tiles results in a mixture of fully and partially assembled H, A and M, imaged by atomic force microscope (AFM). This is the same sample as "SHAM60" in Fig. \ref{fig:patternexp}. Inset illustrates the expected slant of the shapes due to single-stranded tile geometry. d, A typical experiment mixes some concentration of each tile type into a single tube, with some tiles swapped for fluorophore- and quencher-modified versions. The sample is heated to remove any preexisting binding, cooled to a temperature slightly above where any growth is observed, then slowly annealed through a small range of temperatures while fluorescence is measured in a qPCR machine; samples are then imaged by AFM.
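The attachment rule stated in Fig. 2a (two or more bonds favorable, one insufficient) can be illustrated with a minimal Python sketch. The tile representation, the side labels, and all function names here are hypothetical choices made for illustration; the sketch only counts complementary domains and does not model temperature or binding energies.

```python
# Hypothetical abstraction of the square tiles in Fig. 2a: each tile has one
# binding domain per side, and a domain such as "708" pairs with "708*".
SIDES = ("N", "E", "S", "W")
OPPOSITE = {"N": "S", "E": "W", "S": "N", "W": "E"}


def complement(domain):
    """Return the complementary domain name, e.g. '708' <-> '708*'."""
    return domain[:-1] if domain.endswith("*") else domain + "*"


def count_bonds(tile, neighbors):
    """Count sides where `tile` faces a neighbor with a complementary domain.

    `tile` and every neighbor are dicts mapping side -> domain name.
    `neighbors` maps side -> neighboring tile; a missing side is an empty site.
    """
    bonds = 0
    for side in SIDES:
        neighbor = neighbors.get(side)
        if neighbor is not None and neighbor[OPPOSITE[side]] == complement(tile[side]):
            bonds += 1
    return bonds


def attachment_is_favorable(tile, neighbors, threshold=2):
    """Apply the rule from the caption: at least two bonds are required."""
    return count_bonds(tile, neighbors) >= threshold
```

For example, a tile whose north and east neighbors both present complementary domains would count two bonds and be accepted, while the same tile next to only one matching neighbor would not.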
Figure 3: Theory shows selective nucleation when high concentration tiles are co-localized in one shape more than in others. a, One pattern ("A flag 9") enhancing the concentration of shared tiles colocalized in A but relatively dispersed in H and M. b, A flag 9 plotted by tile locations in each shape, along with example "flag" patterns that have colocalization in H and M. c, For A flag 9, free energies of assemblies along predicted nucleation pathways for each shape (Extended Data Fig. \ref{extfig:nucleation}). Several example assemblies are shown; the green and red ones are critical seeds for the A and H pathways respectively. d, Regions predicted to participate in nucleation by the simulation for three concentration patterns (lighter colors correspond to higher participation). e, Macrostate free energies for sets of partial assemblies of increasing size (number of tiles) and predicted AFM results at several temperatures spanning the melting temperature. Small plots show the full size range, thus illustrating the independence of the nucleation barrier kinetics and the complete assembly thermodynamics. f, For on-target (A, green) and off-target (H, red) shapes, nucleation rates (dashed) and growth rates (solid) are plotted as a function of temperature, according to the simplified model of Extended Data Fig. \ref{extfig:nucleation}. Rates are given relative to the time to completely consume the lowest-concentration tile; the horizontal dotted line indicates the rate of annealing between the on-target and off-target nucleation temperatures. Due to the higher nucleation temperature for the on-target shape, when annealing time scales are comparable to or slower than growth time scales, depletion of shared tiles during a temperature anneal can lead to a winner-take-all (WTA) effect. Slower annealing and faster growth can increase the WTA effect. g, In this model, WTA leads to higher selectivity (on-target vs. total nucleation) compared to systems with no shared components; for slower anneals, selectivity increases for systems with shared components, but decreases for systems with no shared components.
Figure 4: Selective nucleation in experiments with shape-specific localized concentration patterns of shared tiles. a, Pairs of alternative tiles with a fluorophore and quencher (Fig. \ref{fig:multifarious}) have their fluorescence quenched when incorporated together in an assembly; small assemblies of just a few strands do not effectively quench (see Extended Data Fig. \ref{extfig:fluorophores}). b, Samples were annealed with a temperature protocol that cooled from 71 $^\circ\text{C}$ (well above melting temperature) to 48 $^\circ\text{C}$ over $\sim 6$ hours, cooled to 46 $^\circ\text{C}$ over 100 hours, and finally cooled to 39.5 $^\circ\text{C}$ over 3 hours (see Extended Data Fig. \ref{extfig:flags}). c, Experimental results for the 3 flag patterns shown in Fig. \ref{fig:nucleation}. The positions of fluorophore/quencher tile pairs used in each of the four samples are shown by the inset icons. Points where fluorescence signals dropped by 10% below their maximum (to which signals were normalized) are shown with colored dots for on-target nucleation and with $\otimes$ for off-target nucleation. 'Growth times' measure the period from '10% quenching' to the end of the experiment, shown as horizontal bars. Sample AFM images from one of the samples are shown for each flag. d, Total growth times for on-target versus off-target nucleation are summarized for all 37 flag patterns. Each numbered box indicates the location of the corresponding $5 \times 5$ checkerboard flag; good performance is indicated by a tall green bar and a short red bar. e, The same data displayed as a ternary plot, with proximity to triangle corners indicating relative fractions of growth time and circle size indicating overall growth time. f, Average change in quenching (a measure of nucleation) of on- and off-target structures with flag patterns compared to equimolar SHAM mixes. Each dot represents a single flag pattern (see Extended Data Fig. \ref{extfig:wta}). For most patterns, increasing shared tile concentrations reduces the absolute off-target nucleation, supporting a winner-take-all effect.
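The '10% quenching' point and 'growth time' defined in Fig. 4c can be computed from a fluorescence trace roughly as follows. This is a minimal sketch, not the authors' analysis code: the function names are hypothetical, the trace is assumed to be a pair of equal-length arrays (time and raw fluorescence), and the choice to search only after the signal maximum is an assumption.

```python
import numpy as np


def quench_onset_index(fluorescence, drop=0.10):
    """Index of the first sample at least `drop` below the maximum signal.

    The signal is normalized to its maximum, as in the caption. Searching
    only after the peak is an assumption of this sketch. Returns None if
    the signal never falls that far.
    """
    signal = np.asarray(fluorescence, dtype=float)
    normalized = signal / signal.max()
    peak = int(np.argmax(normalized))
    below = np.flatnonzero(normalized[peak:] <= 1.0 - drop)
    return peak + int(below[0]) if below.size else None


def growth_time(times, fluorescence, drop=0.10):
    """Time from the quenching onset to the end of the experiment.

    Returns 0.0 when no onset is detected (an assumed convention).
    """
    onset = quench_onset_index(fluorescence, drop)
    if onset is None:
        return 0.0
    times = np.asarray(times, dtype=float)
    return float(times[-1] - times[onset])
```

Under these assumptions, a trace that first falls below 90% of its maximum one hour before the end of the run would yield a growth time of one hour, and a trace that never quenches by 10% would yield zero.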
Figure 5: Design of self-assembly phase diagrams to solve pattern recognition problems. a, Phase diagram shows desired outcomes of kinetically controlled self-assembly in different regions of $N=917$-dimensional concentration space (2D schematic shown). Each grayscale image represents a vector of tile concentrations. b, $\theta$ specifies which pixel location corresponds to which tile. c, Given a map $\theta$, any image can be converted to a tile concentration vector by associating the grayscale value of pixel location $n$ with the concentration of the corresponding tile $i=\theta(n)$. We compute the 'loss' for a given pixel-to-tile map $\theta$ using simulations to estimate the nucleation rates of desired and undesired structures for each image and summing over a training set. Stochastic optimization in $\theta$ space gives a putative optimal $\theta_{opt}$ that we used for experiments. d, Images used for training. e, Additional images used to test generalization power.
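The pixel-to-tile conversion and the search over $\theta$ described in Fig. 5c can be sketched in Python as below. Several details are assumptions made for illustration and are not taken from the paper: grayscale values are taken to lie in [0, 1] and to map linearly between `c_low` and `c_high`, tiles without an assigned pixel stay at `c_low`, and the optimizer is a plain random-swap hill climb. The `loss` argument is a caller-supplied function; in the paper it is computed from nucleation-rate simulations, which this sketch does not implement.

```python
import numpy as np


def image_to_concentrations(image, theta, n_tiles, c_low, c_high):
    """Convert a grayscale image into a tile concentration vector.

    `theta[n]` is the index of the tile assigned to flattened pixel `n`.
    The linear grayscale-to-concentration scaling is an assumption.
    """
    pixels = np.asarray(image, dtype=float).ravel()
    theta = np.asarray(theta, dtype=int)
    concentrations = np.full(n_tiles, float(c_low))
    concentrations[theta] = c_low + (c_high - c_low) * pixels
    return concentrations


def optimize_theta(loss, theta0, n_steps, rng):
    """Random-swap hill climb over pixel-to-tile maps (illustrative only).

    Each step swaps the tiles assigned to two pixels and keeps the swap
    if the loss does not increase.
    """
    theta = np.array(theta0, dtype=int)
    best = loss(theta)
    for _ in range(n_steps):
        i, j = rng.choice(len(theta), size=2, replace=False)
        candidate = theta.copy()
        candidate[[i, j]] = candidate[[j, i]]
        value = loss(candidate)
        if value <= best:
            theta, best = candidate, value
    return theta, best
```

A call such as `optimize_theta(my_loss, initial_theta, 1000, np.random.default_rng(0))`, with `my_loss` and `initial_theta` supplied by the user, would return the best map found and its loss value under this toy search.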
...and 11 more figures