Shortcut learning in geometric knot classification
Djordje Mihajlovic, Davide Michieletto
TL;DR
This work asks whether ML knot-classification models truly learn invariants of ambient isotopy or instead rely on geometric shortcuts. It introduces a mutual-information-based shortcut probe built on geometric functionals $\{\phi_j\}$ (e.g., $\Sigma_{+}, \Omega_{+}, \kappa_{+}, M, \Pi_n$) and compares models trained on data generated by Molecular Dynamics (MD) with models trained on the geometrically unbiased GEOKNOT data, highlighting the role of data sampling. The authors show that MD datasets exhibit strong geometry–topology correlations that drive shortcut learning, whereas GEOKNOT data do not. They also show that writhe-matrix representations can encode low-order topological information (e.g., the second-order Vassiliev invariant) that standard ML models fail to recover. These findings motivate diagnostic tools and improved sampling strategies for genuinely topology-aware learning in geometric knot problems; code and data are publicly available to support future development.
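The shortcut probe summarized above can be illustrated with a minimal sketch. It assumes a simple plug-in (histogram) estimate of the mutual information between one scalar geometric functional and the knot label; the paper's actual estimator and its functionals $\Sigma_{+}, \Omega_{+}, \kappa_{+}, M, \Pi_n$ are not reproduced here. Both function names are hypothetical, and `total_curvature` (the sum of turning angles of a closed polygon) is only an illustrative stand-in functional, not necessarily the paper's $\kappa_{+}$. A mutual information close to the entropy of the labels flags a functional that could act as a shortcut; a value near zero suggests it carries little label information.

```python
import numpy as np


def total_curvature(coords: np.ndarray) -> float:
    """Total curvature (sum of turning angles) of a closed polygon.

    Hypothetical stand-in functional; `coords` is an (N, 3) array of vertices,
    and the last vertex is joined back to the first.
    """
    # Edge i runs from vertex i to vertex i+1 (cyclically).
    edges = np.roll(coords, -1, axis=0) - coords
    # Edge i-1 and edge i meet at vertex i.
    prev_edges = np.roll(edges, 1, axis=0)
    cos_theta = np.sum(prev_edges * edges, axis=1) / (
        np.linalg.norm(prev_edges, axis=1) * np.linalg.norm(edges, axis=1)
    )
    # Clip guards against round-off pushing |cos| slightly above 1.
    return float(np.sum(np.arccos(np.clip(cos_theta, -1.0, 1.0))))


def mutual_information(feature, labels, n_bins: int = 16) -> float:
    """Plug-in estimate (in nats) of I(feature; label) via equal-width binning."""
    feature = np.asarray(feature, dtype=float)
    labels = np.asarray(labels)
    # Assign each feature value to one of n_bins equal-width bins (0..n_bins-1).
    bin_edges = np.histogram_bin_edges(feature, bins=n_bins)
    f_idx = np.digitize(feature, bin_edges[1:-1])
    # Map arbitrary labels (ints or strings such as "3_1") to 0..K-1.
    _, y_idx = np.unique(labels, return_inverse=True)
    y_idx = y_idx.ravel()
    # Empirical joint distribution p(feature bin, label).
    joint = np.zeros((n_bins, y_idx.max() + 1))
    np.add.at(joint, (f_idx, y_idx), 1.0)
    joint /= joint.sum()
    p_f = joint.sum(axis=1, keepdims=True)
    p_y = joint.sum(axis=0, keepdims=True)
    outer = p_f * p_y  # product of marginals, shape (n_bins, K)
    nz = joint > 0
    return float(np.sum(joint[nz] * np.log(joint[nz] / outer[nz])))
```

In use, one would evaluate the functional on every embedding in a dataset and pass those values, together with the knot types, to `mutual_information`. Total curvature is a natural candidate for such a check because, by the Fáry–Milnor theorem, every nontrivially knotted closed curve has total curvature greater than $4\pi$. Note that the plug-in estimator is biased upward for small samples and depends on `n_bins`, so comparisons are most meaningful between datasets of similar size.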
Abstract
Classifying the topology of closed curves is a central problem in low-dimensional topology, with applications beyond mathematics spanning protein folding, polymer physics and even magnetohydrodynamics. At its core lies the question of whether two embeddings of a closed curve are equivalent under ambient isotopy. Given the striking ability of neural networks to solve complex classification tasks, it is natural to ask whether the knot classification problem can be tackled using Machine Learning (ML). In this paper, we investigate generic shortcuts that ML models exploit to solve the knot classification challenge. In particular, we discover hidden non-topological features in training data generated through Molecular Dynamics simulations of polygonal knots, which ML models use to arrive at successful classification results. We then provide a rigorous foundation for future attempts to tackle the knot classification challenge using ML by developing and publicly releasing (i) a dataset designed to remove the potential for classification based on non-topological features and (ii) code that generates knot embeddings which faithfully explore a chosen geometric state space at fixed knot topology. We expect that our work will accelerate the development of ML models that can solve complex geometric knot classification challenges.
