Knot-10: A Tightness-Stratified Benchmark for Real-World Knot Classification with Topological Difficulty Analysis

Shiheng Nie, Yunguang Yue

Abstract

Physical knot classification is a fine-grained visual classification (FGVC) scenario in which appearance cues are deliberately suppressed: different classes share the same rope material, color, and background, and class identity resides primarily in crossing structure. We introduce the Knots-10 benchmark, comprising 1,440 images with a deployment-oriented split that trains on loosely tied knots and tests on tightly dressed ones. Swin-T and TransFG both average 97.2% accuracy; PMG scores 94.5%, consistent with the hypothesis that jigsaw shuffling disrupts crossing continuity. McNemar tests cannot separate four of the five general-purpose backbones, so small ranking margins should be interpreted with caution. A Mantel permutation test shows that topological distance significantly correlates with confusion patterns in three of the five models (p < 0.01). We propose TACA regularization, which improves embedding-topology alignment from rho=0.46 to rho=0.65 without improving classification accuracy; a random-distance ablation yields comparable alignment, indicating the benefit is likely driven by generic regularization. A pilot cross-domain test with 100 phone photographs reveals a 58-69 percentage-point accuracy drop, exposing rope appearance bias as the dominant failure mode.
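The Mantel permutation test mentioned in the abstract can be illustrated with a minimal sketch. It assumes two symmetric class-by-class matrices (for example, a topological distance matrix and a confusion-derived similarity or dissimilarity matrix) and uses a Pearson correlation over the upper triangles with a one-sided p-value. The function name `mantel_test`, the choice of Pearson correlation, and the permutation count are illustrative assumptions, not details taken from the paper.

```python
import numpy as np


def mantel_test(mat_a, mat_b, n_perm=9999, seed=0):
    """One-sided Mantel permutation test (hypothetical helper).

    Correlates the upper triangles of two symmetric square matrices and
    estimates how often a random relabeling of the classes in `mat_b`
    yields a correlation at least as large as the observed one.
    Assumption: Pearson correlation; the paper may use a different statistic.
    """
    a = np.asarray(mat_a, dtype=float)
    b = np.asarray(mat_b, dtype=float)
    if a.ndim != 2 or a.shape[0] != a.shape[1] or a.shape != b.shape:
        raise ValueError("inputs must be square matrices of equal shape")

    n = a.shape[0]
    upper = np.triu_indices(n, k=1)  # off-diagonal pairs, each counted once
    a_vec = a[upper]
    r_obs = np.corrcoef(a_vec, b[upper])[0, 1]

    rng = np.random.default_rng(seed)
    at_least_as_large = 0
    for _ in range(n_perm):
        perm = rng.permutation(n)
        # Permute rows and columns together so the matrix stays symmetric.
        b_perm = b[np.ix_(perm, perm)]
        r_perm = np.corrcoef(a_vec, b_perm[upper])[0, 1]
        if r_perm >= r_obs:
            at_least_as_large += 1

    # Add-one correction keeps the estimated p-value strictly positive.
    p_value = (at_least_as_large + 1) / (n_perm + 1)
    return r_obs, p_value


if __name__ == "__main__":
    # Synthetic 10-class example; these matrices are made up for illustration.
    rng = np.random.default_rng(42)
    raw = rng.random((10, 10))
    topo_dist = (raw + raw.T) / 2
    np.fill_diagonal(topo_dist, 0.0)
    noise = rng.normal(scale=0.05, size=(10, 10))
    confusion_dist = topo_dist + (noise + noise.T) / 2
    np.fill_diagonal(confusion_dist, 0.0)
    r, p = mantel_test(topo_dist, confusion_dist, n_perm=999)
    print(f"Mantel r = {r:.3f}, p = {p:.4f}")
```

Permuting rows and columns jointly, rather than shuffling individual matrix entries, preserves the dependence among entries that share a class, which is why a permutation test is used here instead of a standard correlation p-value.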

Paper Structure (39 sections, 5 equations, 12 figures, 15 tables)

Figures (12)

  • Figure 1: Knot difficulty tiers based on topological properties. Bar length indicates visual crossing number.
  • Figure 2: Confusion matrices for Swin-T (left) and ViT-B/16 (right). Rows are true labels, columns are predictions. Swin-T achieves near-perfect classification. ViT-B/16 shows systematic confusion between topologically similar knots: Figure-8 Knot vs. Overhand Knot (both prime stopper knots) and Fisherman's Knot vs. Flemish Bend (both composite bends). These patterns align with topological similarity (Section \ref{sec:correlation}).
  • Figure 3: Comparison of t-SNE (top) and UMAP (bottom) projections of ResNet-18 embeddings. Left: baseline training; right: topology-guided training. Convex hulls indicate topological family boundaries. Both methods show that topology-guided training produces family-coherent clusters.
  • Figure S1: Validation and test accuracy across all eight architectures (five general-purpose, three FGVC-specialized), seed 42 (CUDA rerun). Models are grouped by category with a visual separator. Swin-T achieves the highest test accuracy (98.8%), followed by ViT-B/16 (96.9%) and TransFG (96.3%). Graph-FGVC (94.2%) underperforms under this seed, though it reaches 98.8% under seed 456. See Supplementary Table \ref{tab:robustness} for multi-seed results.
  • Figure S2: Per-class F1 scores across all eight architectures (five general-purpose, three FGVC-specialized), seed 42 (CUDA rerun). Darker green indicates higher F1. A horizontal line separates General and FGVC-Specialized models. Clove Hitch (CH) achieves perfect scores across all models, while Figure-8 Knot (F8K) is consistently the most challenging class (F1 as low as 0.81 for ViT-B/16). Graph-FGVC shows notably lower F1 for OHK (0.86) and F8K (0.86), reflecting its seed sensitivity under this initialization.
  • ...and 7 more figures