On knot detection via picture recognition
Anne Dranowski, Yura Kabkov, Daniel Tubbenhauer
TL;DR
This work tackles automatic knot recognition from images by proposing an image-to-PD-to-invariant pipeline that combines neural-network perception with topological invariants for robust classification. It advocates a two-stage approach: first predict a knot diagram from a photo, then reconstruct a planar diagram (PD) code from which invariants such as the Jones polynomial can be computed for final labeling. The study provides baselines that use a vanilla NN, a CNN, and a CvT architecture to predict knot crossing numbers from skeletonized diagrams, and finds that spatial inductive bias improves accuracy: the CNN and CvT both outperform the vanilla NN. The authors justify the focus on small-crossing knots by physical and observational biases, and they discuss extensions to 3D data and to real-world applications involving DNA and protein knots. Overall, the paper presents an interpretable path toward automated knot recognition that pairs modern machine learning with classical topological invariants.
Abstract
Our goal is to one day take a photo of a knot and have a phone automatically recognize it. In this expository work, we explain a strategy to approximate this goal, using a mixture of modern machine learning methods (in particular convolutional neural networks and transformers for image recognition) and traditional algorithms (to compute quantum invariants like the Jones polynomial). We present simple baselines that predict crossing number directly from images, showing that even lightweight CNN and transformer architectures can recover meaningful structural information. The longer-term aim is to combine these perception modules with symbolic reconstruction into planar diagram (PD) codes, enabling downstream invariant computation for robust knot classification. This two-stage approach highlights the complementarity between machine learning, which handles noisy visual data, and invariants, which enforce rigorous topological distinctions.
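To make the second stage (PD code to invariant) concrete, here is a minimal Python sketch that computes the Jones polynomial from a PD code by a brute-force Kauffman bracket state sum. It is an illustration under stated assumptions, not the authors' implementation. The function names (`count_loops`, `kauffman_bracket`, `writhe`, `jones`) are hypothetical. The PD convention assumed is the one used by the Knot Atlas: each crossing is a 4-tuple of edge labels listed counterclockwise, starting from the incoming under-strand, with edges numbered 1..2n along a single-component knot with at least two crossings. The state sum visits 2^n smoothings, so it is only practical for the small-crossing knots the text focuses on.

```python
from collections import defaultdict
from itertools import product

# PD codes in the assumed (Knot Atlas style) convention.
TREFOIL = [(1, 4, 2, 5), (3, 6, 4, 1), (5, 2, 6, 3)]  # left-handed under this convention
FIGURE_EIGHT = [(4, 2, 5, 1), (8, 6, 1, 5), (6, 3, 7, 4), (2, 7, 3, 8)]

LOOP = {2: -1, -2: -1}  # value of one closed loop: -A^2 - A^-2


def poly_mul(p, q):
    """Multiply two Laurent polynomials stored as {exponent: coefficient}."""
    out = defaultdict(int)
    for e1, c1 in p.items():
        for e2, c2 in q.items():
            out[e1 + e2] += c1 * c2
    return {e: c for e, c in out.items() if c}


def count_loops(pairs):
    """Count closed curves obtained by gluing edge labels pairwise (union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for a, b in pairs:
        parent[find(a)] = find(b)
    return len({find(x) for x in list(parent)})


def kauffman_bracket(pd):
    """Unnormalised Kauffman bracket as a Laurent polynomial in A."""
    total = defaultdict(int)
    for state in product((0, 1), repeat=len(pd)):
        pairs = []
        for (i, j, k, l), s in zip(pd, state):
            # s == 0: A-smoothing joins (i, j) and (k, l);
            # s == 1: A^-1-smoothing joins (i, l) and (j, k).
            pairs += [(i, j), (k, l)] if s == 0 else [(i, l), (j, k)]
        term = {len(pd) - 2 * sum(state): 1}  # A^(#A-smoothings - #B-smoothings)
        for _ in range(count_loops(pairs) - 1):
            term = poly_mul(term, LOOP)
        for e, c in term.items():
            total[e] += c
    return {e: c for e, c in total.items() if c}


def writhe(pd):
    """Sum of crossing signs, read off from the direction of each over-strand."""
    n_edges = 2 * len(pd)
    return sum(1 if (j - l) % n_edges == 1 else -1 for _, j, _, l in pd)


def jones(pd):
    """Jones polynomial as {exponent of t: coefficient}, using A = t^(-1/4)."""
    w = writhe(pd)
    sign = -1 if w % 2 else 1  # (-A^3)^(-w) = (-1)^w * A^(-3w)
    result = {}
    for e, c in kauffman_bracket(pd).items():
        a_exp = e - 3 * w
        assert a_exp % 4 == 0, "expected integer powers of t for a knot"
        result[-a_exp // 4] = sign * c
    return result


if __name__ == "__main__":
    for name, pd in [("trefoil", TREFOIL), ("figure-eight", FIGURE_EIGHT)]:
        print(name, sorted(jones(pd).items()))
```

For the trefoil code above the sketch gives -t^-4 + t^-3 + t^-1, and for the figure-eight code it gives t^-2 - t^-1 + 1 - t + t^2. Whether a given PD code yields a polynomial or its mirror (t replaced by 1/t) depends on the labelling convention, so a real pipeline would need to fix that convention once and use it for both the reconstruction step and the lookup table of known knots.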
