Structural and Compositional Complexities of Hierarchical Self-Assembly: a Hypergraph Approach

Alexei V. Tkachenko

TL;DR

This paper tackles how to quantify the information content of programmable self-assembled structures that span molecules to crystals. It introduces Blocks & Bonds (B&B) hypergraphs and the Structure Code (SC) to encode hierarchical assemblies, and defines three related complexity measures: $C_{B\&B}$, $C_{comp}$, and $C_{struct}$. Empirical analysis across ethylene glycol, glucose, DNA origami lattices, and crystalline designs shows a strong correlation among these measures, with $C_{comp}$ emerging as a practical, encoding-free proxy. The framework provides a quantitative, scalable basis for complexity-aware classification and inverse design of programmable matter.

Abstract

Programmable self-assembly enables the construction of complex molecular, supramolecular, and crystalline architectures from well-designed building blocks. We introduce a hypergraph-based formalism, Blocks & Bonds (B&B), that generalizes classical chemical graph theory by incorporating directed and multicolored interactions, internal symmetries, and hierarchical organization. Within this framework, we develop the Structure Code (SC), a compact and versatile language for describing self-assembled architectures. We define a Kolmogorov-style Structural Complexity as the total information content of SC, obtained through its tokenization and Shannon information assignment. Complementing this encoding-based measure, we introduce a much simpler quantity, the Compositional Complexity, which depends only on the number and cumulative usage of block and bond types in the construction set. A central result of this work is a strong empirical correlation between the token-based Structural Complexity and the Compositional Complexity across all examined systems. Owing to this agreement, the Compositional Complexity emerges as the most practical and broadly applicable measure: it is easy to compute, requires no explicit encoding, and yet closely tracks the actual information content of structurally diverse architectures. Applications to molecular systems (ethylene glycol, glucose), DNA-origami lattices, and crystalline assemblies show that B&B hypergraphs provide a unified, scalable, and information-efficient representation of structural organization, naturally capturing symmetry, modularity, and stereochemistry. This framework establishes a quantitative foundation for complexity-aware classification and inverse design of programmable matter.
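
The idea of assigning Shannon information to a tokenized code can be illustrated with a minimal sketch. It assumes each token's probability is its empirical frequency within the sequence; the paper's actual tokenization rules and probability assignment are not given in this summary. The function name `shannon_information_bits` and the token stream are hypothetical, and the tokens are not real Structure Code syntax.

```python
import math
from collections import Counter


def shannon_information_bits(tokens):
    """Total self-information of a token sequence, in bits.

    Assumption of this sketch: the probability of a token is its
    empirical frequency within the sequence itself, so each occurrence
    of token t contributes -log2(count(t) / len(tokens)).
    """
    counts = Counter(tokens)
    total = len(tokens)
    return sum(-math.log2(counts[t] / total) for t in tokens)


# Hypothetical token stream, for illustration only.
tokens = ["A", "B", "A", "A", "B", "C", "A", "B"]
bits = shannon_information_bits(tokens)
print(f"{bits:.3f} bits over {len(tokens)} tokens")
```

Under this assumption, rare tokens carry more bits than frequent ones, so a code that reuses a small vocabulary of motifs scores lower than one of equal length built from many distinct tokens.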

Paper Structure

This paper contains 28 sections, 34 equations, 4 figures, 2 tables.

Figures (4)

  • Figure 1: Examples of Blocks & Bonds (B&B) hypergraph representations. (a) Polyhedral blocks such as octahedron $[6/O]$, tetrahedron $[4/A_4]$, and square $[4/C_4]$. (b) Variants of octahedral blocks with different labeling symmetries. (c) Composite hypergraph structure constructed from square blocks. (d) Molecular hypergraph representation of ethylene glycol, $(CH_2OH)_2$.
  • Figure 2: Hypergraph representations of $\alpha$-D-glucose. (a) Conventional graph representation. (b) B&B hypergraph. (c) Hierarchical encoding using composite $C$–$O$ blocks, enabling compact representation of repeated motifs.
  • Figure 3: Two structures, each containing 60 trivalent blocks and possessing icosahedral symmetry: (a) C$_{60}$ fullerene, and (b) truncated dodecahedron (TD).
  • Figure 4: Comparison of three measures of structural information for all structures discussed in the text. Structural complexity $C_{\text{struct}}$ obtained by direct tokenization of each SC (circles), and $C_{B\&B}$ (squares), plotted against compositional complexity $C_{\text{comp}}$. As expected, $C_{B\&B} \le C_{\text{comp}}$ (blue dashed line). The orange dashed line shows a linear regression of $C_{\text{struct}}$ versus $C_{\text{comp}}$. All three measures exhibit very strong mutual correlations.