Explainable Binary Classification of Separable Shape Ensembles

Zachary Grey, Nicholas Fisher, Andrew Glaws

TL;DR

This work presents a framework for explainable binary classification of large ensembles of boundary curves by formulating Separable Shape Tensors (SST) that separate generalized scale from undulation while preserving rigid invariances. It grounds SST in a dual RKHS interpretation via a Hilbert-Schmidt integral operator, enabling a finite, efficient discretization through Nyström-based SRQD and projections onto the Grassmannian and SPD matrix manifolds. The methodology yields interpretable, statistically testable features $(t, \ell)$ whose distributions across image ensembles are compared with a product maximum mean discrepancy (pMMD), without requiring labeled training data. Experiments on EBSD and battery SEM images demonstrate sensitivity to subvisual differences in shape and robustness to segmentation variations, and they provide practical guidance on parameter settings, highlighting the approach's potential as a scalable, explainable tool for image-based shape analysis.
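The separation of generalized scale from undulation can be illustrated with a minimal NumPy sketch. It assumes, as a simplification, that the split is the polar decomposition of centered landmarks, $X = QP$, where $Q$ has orthonormal columns (its span represents a point on the Grassmannian) and $P$ is a $2 \times 2$ SPD matrix. The function name `separable_shape_tensor` is hypothetical, and the sketch is not presented as the authors' implementation.

```python
import numpy as np


def separable_shape_tensor(landmarks):
    """Split (n, 2) landmarks into Q (n x 2, orthonormal columns) and P (2 x 2, SPD).

    Assumption: the split is the polar decomposition of the centered landmarks,
    X = Q @ P. span(Q) represents a point on the Grassmannian Gr(2, n) (undulation);
    P carries the generalized (linear) scale. Requires rank-2 landmarks.
    """
    X = np.asarray(landmarks, dtype=float)
    X = X - X.mean(axis=0)                             # remove translation
    U, s, Vt = np.linalg.svd(X, full_matrices=False)   # thin SVD: X = U diag(s) Vt
    Q = U @ Vt                                         # orthonormal columns
    P = Vt.T @ np.diag(s) @ Vt                         # symmetric positive definite
    return Q, P


# Usage: an undulating ellipse and an arbitrary affine image of it.
theta = np.linspace(0.0, 2.0 * np.pi, 64, endpoint=False)
X = np.column_stack([3.0 * np.cos(theta), np.sin(theta) + 0.3 * np.sin(3.0 * theta)])
A = np.array([[0.5, 1.2], [-0.7, 2.0]])               # any invertible 2 x 2 map
Q, P = separable_shape_tensor(X)
Q2, P2 = separable_shape_tensor(X @ A.T + np.array([4.0, -1.0]))
print(np.allclose(Q @ Q.T, Q2 @ Q2.T))                # True: same Grassmannian point
```

Because an invertible $2 \times 2$ map leaves the column span of the centered landmarks unchanged, the projector $QQ^\top$ is invariant to translations, rotations, and generalized (linear) scalings, while $P$ absorbs the scale change.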

Abstract

Scientists, engineers, biologists, and technology specialists routinely leverage image segmentation to extract shape ensembles containing many thousands of curves that represent patterns in observations and measurements. These large curve ensembles support inferences about important changes when comparing and contrasting images. We introduce novel pattern recognition formalisms combined with inference methods over large ensembles of segmented curves. Our formalism accurately approximates eigenspaces of composite integral operators to motivate discrete, dual representations of curves collocated at quadrature nodes. The approximations are projected onto the underlying matrix manifolds, and the resulting separable shape tensors constitute rigid-invariant decompositions of curves into generalized (linear) scale variations and complementary (nonlinear) undulations. Using thousands of curves segmented from pairs of images, we demonstrate how data-driven features of separable shape tensors inform explainable binary classification via a product maximum mean discrepancy: absent labeled data, interpretable feature spaces are built in seconds without high-performance computing, and discrepancies are detected below the threshold of cursory visual inspection.
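A product maximum mean discrepancy compares two ensembles of feature pairs through a kernel that factors over the feature components. The sketch below assumes Gaussian kernels on each factor, multiplies them, and uses the standard unbiased MMD² estimator together with a permutation test. The bandwidths, the feature array shapes, and the function names are illustrative assumptions rather than the paper's settings.

```python
import numpy as np


def gaussian_gram(a, b, bandwidth):
    """Gaussian (RBF) Gram matrix between the rows of a (m, d) and b (p, d)."""
    sq = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq / (2.0 * bandwidth ** 2))


def product_mmd2(t_x, l_x, t_y, l_y, bw_t=1.0, bw_l=1.0):
    """Unbiased MMD^2 between samples {(t_i, l_i)} from two ensembles, using the
    product kernel k((t, l), (t', l')) = k_t(t, t') * k_l(l, l')."""
    Kxx = gaussian_gram(t_x, t_x, bw_t) * gaussian_gram(l_x, l_x, bw_l)
    Kyy = gaussian_gram(t_y, t_y, bw_t) * gaussian_gram(l_y, l_y, bw_l)
    Kxy = gaussian_gram(t_x, t_y, bw_t) * gaussian_gram(l_x, l_y, bw_l)
    m, n = len(Kxx), len(Kyy)
    # The unbiased estimator drops the diagonal (i == j) terms within each sample.
    within_x = (Kxx.sum() - np.trace(Kxx)) / (m * (m - 1))
    within_y = (Kyy.sum() - np.trace(Kyy)) / (n * (n - 1))
    return within_x + within_y - 2.0 * Kxy.mean()


def permutation_pvalue(t_x, l_x, t_y, l_y, n_perm=200, seed=0, **bandwidths):
    """Permutation test of H0: both ensembles share one feature distribution."""
    rng = np.random.default_rng(seed)
    observed = product_mmd2(t_x, l_x, t_y, l_y, **bandwidths)
    t_all, l_all = np.vstack([t_x, t_y]), np.vstack([l_x, l_y])
    m = len(t_x)
    exceed = 0
    for _ in range(n_perm):
        idx = rng.permutation(len(t_all))              # shuffle ensemble labels
        a, b = idx[:m], idx[m:]
        if product_mmd2(t_all[a], l_all[a], t_all[b], l_all[b], **bandwidths) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)
```

Each feature array is two-dimensional, for example shape `(m, 1)` for a scalar feature. In practice the bandwidths would be chosen from the data, for instance by a median heuristic, which is a further assumption of this sketch.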

Paper Structure

This paper contains 24 sections, 5 theorems, 58 equations, 16 figures, 1 table.

Key Result

Lemma 1

$k_{\mathcal{T}}[\boldsymbol{c}]$ is symmetric.
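Symmetry of the kernel matters in practice: a Nyström discretization of a symmetric Hilbert-Schmidt integral operator can be symmetrized with the square roots of the quadrature weights and solved with a symmetric eigensolver. The sketch below substitutes a hypothetical periodic kernel on $[0, 1)$ for $k_{\mathcal{T}}[\boldsymbol{c}]$, whose definition is not reproduced here, and uses equispaced trapezoid-rule nodes. It illustrates the generic Nyström method only, not the paper's SRQD procedure.

```python
import numpy as np


def periodic_kernel(s, sp, length=0.5):
    """Hypothetical symmetric, periodic kernel on [0, 1): k(s, s') = k(s', s)."""
    return np.exp(-2.0 * np.sin(np.pi * (s - sp)) ** 2 / length ** 2)


def nystrom_eigen(kernel, n, r):
    """Approximate the r leading eigenpairs of (K f)(s) = int_0^1 k(s, s') f(s') ds'."""
    s = np.arange(n) / n                 # equispaced nodes: trapezoid rule on a periodic domain
    w = np.full(n, 1.0 / n)              # quadrature weights
    K = kernel(s[:, None], s[None, :])   # symmetric matrix because k is symmetric
    sw = np.sqrt(w)
    A = sw[:, None] * K * sw[None, :]    # W^{1/2} K W^{1/2}: symmetric, so eigh applies
    lam, V = np.linalg.eigh(A)
    order = np.argsort(lam)[::-1][:r]    # leading eigenvalues first
    lam, V = lam[order], V[:, order]
    phi_nodes = V / sw[:, None]          # eigenfunction values at nodes, orthonormal in weighted L2
    return s, w, lam, phi_nodes


def nystrom_interpolate(kernel, s_eval, s, w, lam, phi_nodes):
    """Nystrom extension: phi(s) = (1 / lam) * sum_j w_j k(s, s_j) phi(s_j)."""
    return (kernel(s_eval[:, None], s[None, :]) * w[None, :]) @ phi_nodes / lam[None, :]
```

With a smooth periodic kernel the trapezoid rule converges spectrally, so the leading eigenvalues computed with $n = 32$ and $n = 64$ nodes typically agree to near machine precision, in the spirit of the convergence study in Figure 3.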

Figures (16)

  • Figure 1: An example ensemble of thousands of grain boundaries from an EBSD image [FAN2021116810, Fan2020] (left) and an example segmented grain boundary (right) with arc-length reparametrization landmarks (blue circles) generated by an interpolating curve (red). Data are available online [Fan2020]. The scale bar in the lower-left corner of the EBSD grain boundary image indicates $500$ micrometers.
  • Figure 2: (left) A scanning electron micrograph of a lithium-ion battery cross-section, (middle) Lagrangian coherent structures in the lower-left quadrant of a cyclone modeled by large-eddy simulation, (right) a 'superpixel' segmentation of a canine named Penni.
  • Figure 3: (top) An increasing number of landmarks collocated at quadrature nodes over a Cassini oval. Dashed lines, shown as a visual cue, connect the nodes in order for $8$, $16$, and $32$ landmarks, respectively. Nodes replicated across refinements, $\lbrace \boldsymbol{\tau}(s_i)\rbrace_{n=8} \subset \lbrace \boldsymbol{\tau}(s_i)\rbrace_{n=16} \subset \lbrace \boldsymbol{\tau}(s_i)\rbrace_{n=32}$, are shaded darker. (bottom) Convergence over the same Cassini oval with an increasing number of quadrature nodes, $n$. The error is the maximum $2$-norm difference in component functions over the curve parameter between the approximated PRRTI features and a reference solution with $8192$ nodes. A black dashed curve with a rate of $\sim 6.5$ emphasizes the spectral nature of the convergence.
  • Figure 4: (left) An arbitrary cyclic permutation and rotation of a uniformly discretized grain shape with colors indicating row index, $n=500$. (center) An orthogonal Procrustes match. (right) A brute-force cyclic Procrustes match.
  • Figure 5: Cyclic Procrustes matching at low ($n=50$) and high ($n=250$) levels of uniform arc-length reparametrization. The fixed archetype is shown with a black curve. The fill color of the matched shape corresponds to the levels of refinement ($n=50$ blue and $n=250$ orange) in the discrete objective function evaluations. The landmark colors correspond to the registered indices of the shape against the archetype.
  • ...and 11 more figures
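The arc-length reparametrization landmarks in Figures 1 and 5 can be approximated with a short sketch that resamples a closed boundary at points equally spaced in arc length. It assumes piecewise-linear interpolation between boundary points, whereas the figures use a smoother interpolating curve; `arclength_landmarks` is a hypothetical name.

```python
import numpy as np


def arclength_landmarks(points, n):
    """Resample a closed polygonal curve at n points equally spaced in arc length.

    Simplification: the curve is interpolated piecewise-linearly between the given
    boundary points (assumed to contain no repeated consecutive points).
    """
    P = np.asarray(points, dtype=float)
    closed = np.vstack([P, P[:1]])                     # repeat first point to close the loop
    seg = np.linalg.norm(np.diff(closed, axis=0), axis=1)
    cum = np.concatenate([[0.0], np.cumsum(seg)])      # cumulative arc length at vertices
    targets = np.arange(n) * cum[-1] / n               # equispaced arc-length parameters
    x = np.interp(targets, cum, closed[:, 0])
    y = np.interp(targets, cum, closed[:, 1])
    return np.column_stack([x, y])
```

For a unit square traversed from the origin, eight landmarks fall at the corners and edge midpoints, each $0.5$ apart along the boundary.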

Theorems & Definitions (11)

  • Lemma 1
  • proof
  • Lemma 2
  • proof
  • Lemma 3
  • proof
  • Definition 1: Evaluation functionals of an RKHS [micheli2013matrix]
  • Theorem 1
  • proof
  • Theorem 2: Separable Cyclic Procrustes
  • ...and 1 more
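The brute-force cyclic Procrustes match of Figure 4 can be sketched by trying every cyclic relabeling of the landmarks and solving a rotation-only orthogonal Procrustes problem for each. This is an exhaustive search over $n$ shifts with a Kabsch-style SVD per shift, assumed here for illustration; it is not the separable formulation of Theorem 2, and the function names are hypothetical.

```python
import numpy as np


def rotation_procrustes(A, B):
    """Rotation R (2 x 2, det +1) minimizing ||A @ R - B||_F (Kabsch / orthogonal Procrustes)."""
    U, _, Vt = np.linalg.svd(A.T @ B)
    D = np.diag([1.0, np.sign(np.linalg.det(U @ Vt))])  # forbid reflections
    return U @ D @ Vt


def cyclic_procrustes(shape, archetype):
    """Brute-force cyclic Procrustes: try every cyclic relabeling of the landmarks.

    Returns (residual, shift k, rotation R) such that np.roll(X, -k) @ R best matches
    the centered archetype, where X is the centered shape.
    """
    X = shape - shape.mean(axis=0)                      # remove translation
    Y = archetype - archetype.mean(axis=0)
    best = (np.inf, 0, np.eye(2))
    for k in range(len(X)):
        Xk = np.roll(X, -k, axis=0)                     # cyclic permutation of row indices
        R = rotation_procrustes(Xk, Y)
        err = np.linalg.norm(Xk @ R - Y)
        if err < best[0]:
            best = (err, k, R)
    return best
```

Restricting to $\det R = +1$ excludes reflections, so only rigid motions and cyclic relabelings are factored out of the match.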