Templated Assembly Theory: An Extension of the Canonical Assembly Index with Block-Compressed Template
Piotr Masierak
TL;DR
This work extends canonical string-based Assembly Theory by introducing templated assembly spaces, which permit block-compressed templates with wildcards and template-instantiation steps. It defines the templated assembly index $TAI(w)$, proves that $TAI(w)\le ASI(w)$, where $ASI(w)$ denotes the canonical assembly index, and exhibits cases where templates strictly reduce the assembly length, capturing a form of modularity that is not accessible to pure concatenation. The paper formalizes the model, connects it to grammars and pattern-based formalisms, and proposes a greedy macro-grammar heuristic for practical approximation. It establishes NP-membership for the decision problem TAI-DEC and discusses the open question of its unconditional NP-hardness. It also outlines potential applications in sequence analysis, modularity detection, and biosignature design, and points to the broader relevance of templated modularity for complex systems such as chemistry and genomics. Overall, the templated extension provides a new axis of structural analysis that complements ASI and opens algorithmic and practical avenues for studying hierarchical and pattern-based organization.
Abstract
Assembly Theory, as developed by Cronin and co-workers, assigns to an object an assembly index: the minimal number of binary join operations required to build at least one copy of the object from a specified set of basic building blocks, allowing reuse of intermediate components. For strings over a finite alphabet, the canonical assembly index can be defined in the free semigroup with universal binary concatenation and a "no-trash" condition, and its exact computation has been shown to be NP-complete. In this paper, we propose an extension of the canonical, string-based formulation that augments pure concatenation with templated assembly steps. Intermediate objects may contain a distinguished wildcard symbol that represents a compressible block. Templates are restricted to block-compressed substrings of the target string and can be instantiated by inserting previously assembled motifs into one or more wildcard positions, possibly in parallel. This yields a new complexity measure, the templated assembly index, which strictly generalises the canonical index while preserving its operational character. We formalise the model, clarify its relation to the canonical assembly index and to classical problems such as the smallest grammar problem, and discuss the computational complexity of determining the templated assembly index. Finally, we sketch potential applications in sequence analysis, modularity detection, and biosignature design.
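To make the canonical notion concrete, the following is a minimal brute-force sketch of the string assembly index for very short inputs. It is an illustration only, not the formal construction or the heuristic of this paper. It assumes that the basic building blocks are the distinct characters of the target and that the "no-trash" condition can be read as "every intermediate is a substring of the target". The function name `assembly_index` is hypothetical, and the search is exponential, in line with the NP-completeness of exact computation.

```python
from itertools import product


def assembly_index(target: str) -> int:
    """Minimal number of binary concatenations needed to build `target`
    from its single characters, with reuse of intermediates.

    Assumption (illustrative): intermediates are restricted to substrings
    of `target`, since any other string could never end up inside it.
    """
    n = len(target)
    if n <= 1:
        return 0  # a basic building block needs no join

    # All substrings of length >= 2: the only useful intermediates.
    substrings = {target[i:j] for i in range(n) for j in range(i + 2, n + 1)}
    basics = frozenset(target)

    def reachable(built: frozenset, joins_left: int) -> bool:
        """Depth-limited search: can `target` be built with the joins left?"""
        if target in built:
            return True
        if joins_left == 0:
            return False
        for a, b in product(built, repeat=2):
            c = a + b
            if c in substrings and c not in built:
                if reachable(built | {c}, joins_left - 1):
                    return True
        return False

    # Iterative deepening: the first depth that succeeds is the minimum.
    # Appending one character at a time always works with n - 1 joins.
    for depth in range(1, n):
        if reachable(basics, depth):
            return depth
    return n - 1
```

For example, `assembly_index("ABABABAB")` evaluates to 3 through the pathway `AB`, `ABAB`, `ABABABAB`, whereas `assembly_index("ABCD")` is also 3 because no intermediate can be reused. The templated index studied here can only match or improve on such values, since $TAI(w)\le ASI(w)$.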
