The Role of Sequence Information in Minimal Models of Molecular Assembly
Jeremy Guntoro, Thomas Ouldridge
TL;DR
The paper investigates how sequence information and geometric constraints govern deterministic self-assembly in two variants of the temperature-1 aTAM. In the backboned aTAM, each tile in a predetermined sequence must be placed next to the immediately preceding tile. The sequenced aTAM uses a predetermined tile sequence without this adjacency constraint. The paper proves that a finite universal assembly kit exists for the backboned model, which underscores the role of geometry in enabling efficient use of information. It also shows that no universal kit exists for the sequenced model under stringent assembly rules, which reveals fundamental limitations of sequence-only strategies. By analyzing shape spaces and Kolmogorov complexity, the study links combinatorial growth (self-avoiding walks vs. polyominoes) to assembly efficiency and shows that backbone constraints substantially reduce the tile diversity needed for large targets. Overall, the work suggests that physical geometric constraints are crucial for translating sequence programs into reliable, scalable molecular assembly, with implications for the design of artificial folding systems.
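The difference between the two placement rules can be made concrete with a small Python sketch. It represents a growth trajectory as the ordered list of lattice sites filled by the predetermined tile sequence. It checks only the geometric constraints. The sequenced rule requires each new tile to touch the existing assembly, and the backboned rule also requires it to touch the immediately preceding tile. Glue matching at temperature 1 is deliberately ignored. The function names are hypothetical and do not come from the paper.

```python
# Geometric-only sketch of the two placement rules (not the paper's formal model).
# Assumption: glue matching is ignored; only lattice positions are checked.
from typing import Sequence, Tuple

Position = Tuple[int, int]


def is_adjacent(a: Position, b: Position) -> bool:
    """True if a and b are nearest neighbours on the square lattice."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1]) == 1


def valid_sequenced(order: Sequence[Position]) -> bool:
    """Sequenced rule: each tile lands on an empty site touching any placed tile."""
    placed = set()
    for i, pos in enumerate(order):
        if pos in placed:
            return False  # a site cannot be occupied twice
        if i > 0 and not any(is_adjacent(pos, p) for p in placed):
            return False  # the new tile would be disconnected from the assembly
        placed.add(pos)
    return True


def valid_backboned(order: Sequence[Position]) -> bool:
    """Backboned rule: additionally, each tile neighbours the immediately preceding one."""
    return valid_sequenced(order) and all(
        is_adjacent(order[i - 1], order[i]) for i in range(1, len(order))
    )


# A branched placement order: allowed by the sequenced rule, not by the backboned one.
branching = [(0, 0), (1, 0), (0, 1)]
print(valid_sequenced(branching), valid_backboned(branching))  # True False
```

Under this simplification, every valid backboned order is a self-avoiding walk on the lattice, and every valid sequenced order fills a connected set of sites (a polyomino). This is one way to read the self-avoiding-walk vs. polyomino contrast above.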
Abstract
Sequence-directed assembly processes, such as protein folding, allow a large number of structures to be assembled with high accuracy from only a small handful of fundamental building blocks. We aim to explore how efficiently sequence information can be used to direct assembly by studying variants of the temperature-1 abstract tile assembly model (aTAM). For each variant, we ask whether there exists a finite set of tile types that can deterministically assemble any shape producible by that assembly model; we call such tile type sets "universal assembly kits". Our first model, which we call the "backboned aTAM", models backbone-assisted assembly: tiles from a predetermined sequence of tile types must be added to lattice positions neighbouring the immediately preceding tile. We demonstrate the existence of a universal assembly kit for the backboned aTAM, and show that such a kit still exists even under stringent restrictions on the rules of assembly. We compare these results with a less constrained model, which we call the sequenced aTAM. This model also uses a predetermined sequence of tiles but does not require a tile to neighbour the immediately preceding tile. We prove that this model has no universal assembly kit in the stringent case. The lack of such a kit is surprising. For a sufficiently large, but finite, set of tiles, the number of tile sequences of length N scales faster than both the number and the worst-case Kolmogorov complexity of producible shapes of size N. Our results demonstrate the importance of physical mechanisms, and specifically geometric constraints, in facilitating efficient use of the information in molecular programs for structure assembly.
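The counting claim above can be illustrated numerically with a Python sketch. For small n, it brute-forces two counts on the square lattice. The first is self-avoiding walks from the origin that visit n sites. Assuming each backboned tile must touch its predecessor, this count upper-bounds the number of shapes a single backbone can trace. The second is fixed polyominoes of n cells, counted with Redelmeier's algorithm and assuming sequenced-aTAM shapes are connected sets of sites. Both counts are compared with k^n, the number of length-n sequences over k tile types. The value k = 5 and all function names are hypothetical choices for illustration. This is not code from the paper.

```python
# Illustrative sketch (not code from the paper): brute-force lattice counts on the
# square lattice, compared with k**n tile sequences of length n over k tile types.
NEIGHBOURS = ((1, 0), (-1, 0), (0, 1), (0, -1))


def count_saws(n_sites):
    """Count self-avoiding walks from the origin that visit exactly n_sites sites."""
    def extend(last, visited):
        if len(visited) == n_sites:
            return 1
        total = 0
        for dx, dy in NEIGHBOURS:
            nxt = (last[0] + dx, last[1] + dy)
            if nxt not in visited:
                visited.add(nxt)
                total += extend(nxt, visited)
                visited.remove(nxt)
        return total

    return extend((0, 0), {(0, 0)})


def count_fixed_polyominoes(n_cells):
    """Count fixed polyominoes (equal up to translation) via Redelmeier's algorithm."""
    def allowed(cell):
        # Canonical half-plane: the origin is the leftmost cell of the lowest row.
        x, y = cell
        return y > 0 or (y == 0 and x >= 0)

    seen = {(0, 0)}  # cells already offered as candidates on the current branch

    def grow(untried, size):
        count = 0
        untried = list(untried)
        while untried:
            cell = untried.pop()  # popped cells stay in `seen`, so they are never re-added
            if size + 1 == n_cells:
                count += 1
                continue
            fresh = []
            for dx, dy in NEIGHBOURS:
                nb = (cell[0] + dx, cell[1] + dy)
                if allowed(nb) and nb not in seen:
                    fresh.append(nb)
            seen.update(fresh)
            count += grow(untried + fresh, size + 1)
            seen.difference_update(fresh)
        return count

    return grow([(0, 0)], 0)


if __name__ == "__main__":
    k = 5  # hypothetical number of tile types, chosen only for illustration
    print(f"{'n':>2} {'SAWs':>6} {'polyominoes':>11} {'k**n':>8}")
    for n in range(1, 9):
        print(f"{n:>2} {count_saws(n):>6} {count_fixed_polyominoes(n):>11} {k**n:>8}")
```

Both lattice counts grow roughly exponentially, at about 2.64 (self-avoiding walks) and 4.06 (polyominoes) per added site. As a result, k^n eventually dominates both for any k of at least 5. The abstract's point is that this counting surplus alone does not guarantee a universal kit for the sequenced aTAM.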
