Encoding molecular structures in quantum machine learning
Choy Boy, Edoardo Altamura, Dilhan Manawadu, Ivano Tavernelli, Stefano Mensa, David J. Wales
TL;DR
QMSE introduces a bond-order–aware encoding of molecular structures into quantum circuits via a hybrid Coulomb–adjacency matrix, addressing limitations of fingerprint encodings in state separability and trainability. By mapping the matrix to one- and two-qubit rotations, QMSE yields more expressive and interpretable feature maps, and a fidelity-preserving chain-contraction theorem enables qubit reductions for long chains. Benchmarking on 105 small molecules shows that QMSE outperforms angle-based fingerprint encoding in classification and regression tasks, and experiments demonstrate robustness to hardware noise and favorable training dynamics. The work suggests QMSE as a practical pathway toward scalable quantum-assisted modelling of chemical data and as a bridge to future graph-state encodings and kernel methods enabled by fault-tolerant quantum computing (FTQC).
Abstract
Quantum machine learning (QML) has great potential for the analysis of chemical datasets. However, conventional quantum data-encoding schemes, such as fingerprint encoding, are generally poorly suited to the accurate representation of chemical moieties in such datasets. In this contribution, we introduce the quantum molecular structure encoding (QMSE) scheme, which encodes molecular bond orders and interatomic couplings, expressed as a hybrid Coulomb-adjacency matrix, directly as one- and two-qubit rotations within parameterised circuits. We show that this strategy provides an efficient and interpretable way to improve state separability between encoded molecules compared to fingerprint encoding methods, which is crucial for preparing effective feature maps in QML workflows. To benchmark our method, we train a parameterised ansatz on molecular datasets to perform classification of state phases and regression on boiling points, demonstrating the competitive trainability and generalisation capabilities of QMSE. We further prove a fidelity-preserving chain-contraction theorem that reuses common substructures to cut qubit counts, with an application to long-chain fatty acids. We expect this scalable and interpretable encoding framework to pave the way for practical QML applications on molecular datasets.
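To make the idea of mapping a hybrid matrix to one- and two-qubit rotations concrete, the following is a minimal NumPy sketch under stated assumptions, not the QMSE construction itself. The abstract does not give the matrix definition or the gate set, so everything specific here is hypothetical: the diagonal uses the standard Coulomb-matrix self-term 0.5 Z^2.4, off-diagonal entries hold bond orders, diagonal entries drive RY rotations, off-diagonal entries drive RZZ phase rotations, and the names `hybrid_matrix`, `encode`, `fidelity`, `diag_ref` and `bond_scale` are illustrative only.

```python
import numpy as np

def hybrid_matrix(charges, bonds):
    """Assumed hybrid matrix: Coulomb-style diagonal, bond orders off the diagonal."""
    n = len(charges)
    m = np.zeros((n, n))
    for i, z in enumerate(charges):
        m[i, i] = 0.5 * z ** 2.4          # standard Coulomb-matrix self-term
    for (i, j), order in bonds.items():
        m[i, j] = m[j, i] = order         # symmetric bond-order entry
    return m

def encode(m, diag_ref=0.5 * 9 ** 2.4, bond_scale=np.pi / 4):
    """Statevector from one qubit per atom (angle scalings are assumptions)."""
    n = m.shape[0]
    state = np.ones(1, dtype=complex)
    # One-qubit layer: RY(theta_i)|0> = [cos(theta_i/2), sin(theta_i/2)].
    for i in range(n):
        theta = np.pi * m[i, i] / diag_ref
        state = np.kron(state, [np.cos(theta / 2), np.sin(theta / 2)])
    # Two-qubit layer: RZZ(phi) = exp(-i phi/2 Z⊗Z) is diagonal, so it
    # acts as a phase on each computational basis state.
    idx = np.arange(2 ** n)
    for i in range(n):
        for j in range(i + 1, n):
            if m[i, j] == 0:
                continue                   # no bond, no two-qubit rotation
            zi = 1 - 2 * ((idx >> (n - 1 - i)) & 1)   # Z eigenvalue of qubit i
            zj = 1 - 2 * ((idx >> (n - 1 - j)) & 1)   # Z eigenvalue of qubit j
            state = state * np.exp(-0.5j * bond_scale * m[i, j] * zi * zj)
    return state

def fidelity(a, b):
    """State overlap |<a|b>|^2; lower values mean better-separated encodings."""
    return abs(np.vdot(a, b)) ** 2

# Water (O, H, H) and hydrogen cyanide (H, C, N) as three-qubit examples.
water = hybrid_matrix([8, 1, 1], {(0, 1): 1, (0, 2): 1})
hcn = hybrid_matrix([1, 6, 7], {(0, 1): 1, (1, 2): 3})
psi_water, psi_hcn = encode(water), encode(hcn)
print(fidelity(psi_water, psi_hcn))
```

The fidelity between two encoded states is one simple way to quantify the state separability that the abstract refers to. In this sketch the bond orders enter only through the two-qubit phases, so molecules with the same atoms but different bonding would still map to different states.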
