Efficient Generation of Molecular Clusters with Dual-Scale Equivariant Flow Matching
Akshay Subramanian, Shuhui Qu, Cheol Woo Park, Sulin Liu, Janghwan Lee, Rafael Gómez-Bombarelli
TL;DR
The paper addresses the high computational cost of generating conformational ensembles for amorphous molecular solids by introducing a dual-scale flow matching framework. It trains two vector-field networks, $v_{\theta}$ for coarse-grained beads and $v_{\phi}$ for all-atom coordinates, each with its own conditional flow matching objective and prior, and generates structures coarse-to-fine. On MD-derived Y6 clusters, the dual-scale approach improves bond-length and bond-angle distribution accuracy by 15–25% over single-scale flow matching and runs up to roughly 85% faster per inference step on an A100 GPU. The method enables scalable sampling for larger systems relevant to organic electronics; future work will explore alternative coarse-graining mappings and broader performance metrics.
Abstract
Amorphous molecular solids offer a promising alternative to inorganic semiconductors, owing to their mechanical flexibility and solution processability. The packing structure of these materials plays a crucial role in determining their electronic and transport properties, which are key to enhancing the efficiency of devices such as organic solar cells (OSCs). However, computing these optoelectronic properties requires molecular dynamics (MD) simulations to generate a conformational ensemble, which is computationally expensive because of the large system sizes involved. Recent work has used generative models, particularly flow-based models acting as Boltzmann generators, to improve the efficiency of MD sampling. In this work, we develop a dual-scale flow matching method that separates training and inference into coarse-grained and all-atom stages, improving on both the accuracy and the efficiency of standard flow matching samplers. We demonstrate the method on a dataset of Y6 molecular clusters obtained from MD simulations and benchmark its efficiency and accuracy against single-scale flow matching methods.
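To make the two-stage idea concrete, the following is a minimal NumPy sketch of how training targets for a coarse-grained stage and an all-atom stage could be built with conditional flow matching. It is an illustration under stated assumptions, not the authors' implementation: the centroid-based bead mapping, the linear interpolant, the standard Gaussian prior for beads, the bead-centered Gaussian prior for atoms, and all function names (`coarse_grain`, `cfm_pair`, `cfm_loss`) are hypothetical choices made for this example. The networks $v_{\theta}$ and $v_{\phi}$ are not modeled; only the regression targets they would be trained on are computed.

```python
import numpy as np


def coarse_grain(atoms, bead_index, n_beads):
    """Map all-atom coordinates (N, 3) to bead centroids (n_beads, 3).

    Assumption: each bead is the unweighted mean of its atoms.
    """
    beads = np.zeros((n_beads, 3))
    counts = np.bincount(bead_index, minlength=n_beads).astype(float)
    np.add.at(beads, bead_index, atoms)  # unbuffered sum of atoms per bead
    return beads / counts[:, None]


def cfm_pair(x0, x1, t):
    """Linear interpolant x_t = (1 - t) x0 + t x1 and its velocity x1 - x0.

    Assumption: a straight-line probability path between prior and data.
    """
    x_t = (1.0 - t) * x0 + t * x1
    return x_t, x1 - x0


def cfm_loss(v_pred, v_target):
    """Mean squared error between predicted and target velocities."""
    return float(np.mean(np.sum((v_pred - v_target) ** 2, axis=-1)))


rng = np.random.default_rng(0)
atoms = rng.normal(size=(6, 3))             # toy "all-atom" sample
bead_index = np.array([0, 0, 0, 1, 1, 1])   # hypothetical atom-to-bead map
beads = coarse_grain(atoms, bead_index, n_beads=2)
t = 0.3

# Stage 1 (coarse-grained): prior is a standard Gaussian (assumption).
z_cg = rng.normal(size=beads.shape)
xt_cg, u_cg = cfm_pair(z_cg, beads, t)      # regression target for v_theta

# Stage 2 (all-atom): prior is centered on each atom's parent bead (assumption).
sigma = 0.5
z_aa = beads[bead_index] + sigma * rng.normal(size=atoms.shape)
xt_aa, u_aa = cfm_pair(z_aa, atoms, t)      # regression target for v_phi
```

In this sketch the all-atom stage starts from noise placed around the bead positions, so its flow only has to resolve local detail; this is one plausible way a coarse-to-fine split could shorten the transport the fine-scale network must learn.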
