Topological shape transform for thymus structures

Haochen Yang, Vadim Lebovici, Andreas Tarcevski, Liliana Tchernev, Saulius Zuklys, Georg A. Holländer, Helen M. Byrne, Heather A. Harrington

TL;DR

This paper introduces SampEuler, a novel ECT-based shape descriptor designed for enhanced robustness to perturbations, and uses it to reveal age-dependent structural changes that offer new insights into thymic organization and involution.

Abstract

The Euler characteristic transform (ECT) is an emerging and powerful framework within topological data analysis for quantifying the geometry of shape. The applicability of the ECT has been limited by its sensitivity to noisy data. Here, we introduce SampEuler, a novel ECT-based shape descriptor designed to achieve enhanced robustness to perturbations. We provide a theoretical analysis establishing the stability of SampEuler, validate this property empirically through pairwise similarity analyses on a benchmark dataset, and showcase the descriptor on a thymus dataset. The thymus is a primary lymphoid organ that is essential for the maturation and selection of self-tolerant T cells. Within the thymus, thymic epithelial cells are organized in complex three-dimensional architectures, yet the principles governing their formation, functional organization, and remodeling during age-related involution remain poorly understood. Addressing these questions requires robust and informative shape descriptors capable of capturing subtle architectural changes across developmental stages. We develop and apply SampEuler to a newly generated two-dimensional imaging dataset of mouse thymi spanning multiple age groups, where SampEuler outperforms both persistent homology–based methods and deep learning models in detecting subtle, localized morphological differences associated with aging. To facilitate interpretation, we develop a vectorization and visualization framework for SampEuler, which preserves rich morphological information and enables identification of structural features that distinguish thymi across age groups. Collectively, our results demonstrate that SampEuler provides a robust and interpretable approach for quantifying thymic architecture and reveals age-dependent structural changes that offer new insights into thymic organization and involution.

Paper Structure (15 sections, 11 theorems, 38 equations, 17 figures)

Key Result

Theorem S1

Given a geometric simplicial complex $K\subset{\mathbb R}^d$, if the ECT of $K$ is viewed as $\text{ECT}(K): S^{d-1}\rightarrow \operatorname{CF}({\mathbb R})$, a map from the space of directions to the space of ECCs, then $\text{ECT}(K)$ is Lipschitz continuous with normal metric on $S^{d-1}$ and $L_p$ ...
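
To make the view of the ECT as a map from directions to Euler characteristic curves (ECCs) concrete, here is a minimal sketch for a graph embedded in the plane. It assumes the standard sublevel-set construction, in which a simplex enters the filtration at the height of its highest vertex in the chosen direction. The function names, the uniform sampling of directions, and the choice of thresholds are illustrative assumptions, not the authors' implementation, and the sketch does not include the sampling step that defines SampEuler.

```python
import math


def euler_characteristic_curve(vertices, edges, theta, thresholds):
    """ECC of a graph in R^2 along the direction (cos theta, sin theta).

    For each threshold t, returns (#vertices - #edges) of the subcomplex
    whose simplices have all of their vertices at height <= t.
    """
    direction = (math.cos(theta), math.sin(theta))
    heights = [x * direction[0] + y * direction[1] for x, y in vertices]
    # An edge appears once its higher endpoint has appeared.
    edge_heights = [max(heights[i], heights[j]) for i, j in edges]
    curve = []
    for t in thresholds:
        n_vertices = sum(h <= t for h in heights)
        n_edges = sum(h <= t for h in edge_heights)
        curve.append(n_vertices - n_edges)
    return curve


def sampled_ect(vertices, edges, n_directions, thresholds):
    """ECCs for n_directions equally spaced directions on the circle S^1."""
    return [
        euler_characteristic_curve(
            vertices, edges, 2.0 * math.pi * k / n_directions, thresholds
        )
        for k in range(n_directions)
    ]
```

Each row of `sampled_ect` is one ECC, so rotating the input shape changes the rows that are produced, which is one way to see why descriptors built directly on a fixed set of directions are sensitive to alignment.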

Figures (17)

  • Figure 1: MOTIVATION FOR NEW PUSHFORWARD MEASURE ON ECT. We apply ECT, DETECT, SampEuler, and the vectorization of SampEuler to aligned toy simplicial complexes (left) and randomly rotated simplicial complexes (right). (A) Representations of simplicial complexes from the two classes. Each class shares the same base complex, in other words the same shape, generated by joining three edges of fixed lengths at one end, with 100 vertices along each edge. The shared end is attached to the origin, and the other end of each edge is rotated randomly around the origin. For each sample, we add Gaussian noise to the points along each edge. In the aligned case, we use the resulting complexes directly. In the rotated case, we apply an individual random rotation centred at the origin to each whole complex. (B) Two-dimensional MDS embedding (a projection that preserves pairwise distances between shapes) of ECT shape descriptors. The two classes are separated when all complexes are aligned. However, in the rotated case, the ECT results are heavily affected by the rotations and fail to distinguish the two classes. (C) MDS embedding of DETECT shape descriptors. DETECT fails to distinguish the two classes of shapes in both experiments due to information loss. (D) MDS embedding of SampEuler shape descriptors. In both experiments, the two classes are separated correctly. (E) MDS embedding of vectorized SampEuler shape descriptors. In both experiments, the two classes are separated correctly.
  • Figure 2: CLASSIFICATION STUDY ON MPEG7 DATASET. We compare SampEuler with conventional shape analysis methods using the MPEG7 dataset [ralph1999mpeg7CEShape1]. (A) Examples of the MPEG7 dataset images, one sample from each class of the 10-class subset used in previous studies [le2018persistence, le2019tree]. (B) Barplot of the accuracy of each method in the classification task on the 10-class subset of MPEG7. We denote the Persistence Scale Space Kernel [reininghaus2015stable] by $\mathcal{K}_{PSS}$, the Persistence Weighted Gaussian Kernel [kusano2016persistence] by $\mathcal{K}_{PWG}$, the Sliced Wasserstein Kernel [carriere2017sliced] by $\mathcal{K}_{SW}$, the Tangent Vector Representation with Gaussian Kernel [anirudh2016riemannian] by $\mathcal{K}_{TVRG}$, and the Persistence Fisher Kernel [le2018persistence] by $\mathcal{K}_{PF}$. (C) Barplot of the accuracy of each method in the classification task on the whole MPEG7 dataset (70 classes). The methods compared are: Skeleton Paths [bai2009integrating], Bag of Contours [wang2014bag], Persistence Image + Neural Network (PI+NN) [adams2017persistence], Persistence Codebook + Neural Network (PC+NN) [hofer2019learning], ECT, DETECT, SampEuler, and the vectorization of SampEuler.
  • Figure 3: CLASSIFICATION STUDY ON SELECTED THYMIC QUADRANTS. We segment masked images of thymic epithelium into quadrants of size $200\times 200$ pixels. Then, we classify them into old and young groups, separately for the K8 and K14 markers. (A) Example images of thymic quadrants for each age group and each marker. (B) Barplots of the classification accuracies of each method. The top plot shows accuracies for classifying K8 quadrants, and the bottom plot shows accuracies for classifying K14 quadrants. The error bars show the standard deviations over 50 repetitions of the classification. (C) Barplots of the classification runtimes of each method. The top plot shows runtimes in minutes for classifying K8 quadrants, and the bottom plot shows runtimes in minutes for classifying K14 quadrants. (D) We use the SampEuler vectorization as input features for training the classifier. We repeat the train-test process 50 times and keep only trials with over $80\%$ accuracy, for which we compute the SHAP value of each feature. For each marker and age group, we average across all test samples of the age group and then across all valid trials. The plot shows the average SHAP value of each feature as its importance towards classifying a sample as a given age group. (E) Depth-dependent comparisons between thymic quadrants from different age groups. For both plots, the x-axis shows the normalized depth in the thymus, where 0 corresponds to quadrants at the capsule and 1 to quadrants at the cortex–medulla junction (CMJ). In the left plot, the y-axis shows the energy distance [szekely2013energy] between age groups based on T-cell enrichment ratios. In the right plot, the y-axis shows the energy distance between age groups based on pairwise SampEuler distances.
  • Figure 4: K-MEDOIDS CLUSTERING RESULTS. We cluster all cortex quadrants from both age groups using SampEuler distances and selected cell enrichment ratio distances with k-medoids clustering. We match the clustering results by finding the label permutation that gives the largest number of matched patches. (A) Example overlay images of the thymic quadrants. The top two samples are young thymi, and the bottom two samples are old thymi. A i) and iii) use SampEuler distances to perform shape-based clustering, and A ii) and iv) use selected cell type enrichment ratios. We permute the labels to obtain the best match between two results. (B) Average cell enrichment ratios by cluster and cell type. Rows represent the three cell-composition-based clusters, columns represent selected cell types, and values show the mean enrichment ratio for each cell type within each cluster. (C) Proportional confusion matrices of matching the two clustering results. The $(i,j)$-th entry of the matrix is given by the formula $\frac{\text{No.\ of quadrants being classified as shape cluster } i \text{ and cell composition cluster } j}{\text{No.\ of quadrants being classified as cell composition cluster } j}$
  • Figure S1: A: Pipeline for generating a class of tree samples. We first generate three edges of lengths 2, 3, and 4 from the origin, randomly oriented. We discretize each edge by sampling 100 equally spaced points along it and connecting consecutive points with edges. For each sample in the class, we randomly rotate the tree around the origin and then add coordinate-wise Gaussian noise with $\sigma = 0.02$ to each vertex. We keep the original connecting order of the edges. B: Preprocessing pipeline for the MPEG-7 Core Experiment Shape-Matching dataset. For each image, we first expand all white connected components by 1 pixel in all directions to close small gaps and filter to keep the largest connected component. We then fill in all the holes in the filtered image using SciPy [2020SciPy-NMeth]. We pad black pixels around the original image to normalize all images to the same size, with the centroid of the connected component at the center of each padded image. C: Four example image quadrants from the mouse thymus dataset, each of size $200 \times 200$ pixels. The pixels have side length $0.5\,\mu\text{m}$. From left to right, top to bottom, they are from young-cortex, old-cortex, young-medulla, and old-medulla.
  • ...and 12 more figures
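
The tree-sample pipeline described in the caption of Figure S1 (A) can be sketched roughly as follows. This is a hedged illustration, not the authors' code: the function names are hypothetical, and it assumes that the 100 points on each edge exclude the shared origin (so a tree has 301 vertices and 300 edges) and that edge orientations and rotations are drawn uniformly from $[0, 2\pi)$. The lengths 2, 3, and 4 and the noise level $\sigma = 0.02$ are taken from the caption.

```python
import math
import random


def make_base_tree(lengths=(2.0, 3.0, 4.0), points_per_edge=100, rng=None):
    """Base complex of one class: three randomly oriented edges from the origin."""
    rng = rng or random.Random()
    vertices = [(0.0, 0.0)]  # shared origin (assumed not among the 100 points)
    edges = []
    for length in lengths:
        angle = rng.uniform(0.0, 2.0 * math.pi)
        previous = 0  # every edge starts at the origin
        for k in range(1, points_per_edge + 1):
            r = length * k / points_per_edge  # equally spaced along the edge
            vertices.append((r * math.cos(angle), r * math.sin(angle)))
            current = len(vertices) - 1
            edges.append((previous, current))  # connect consecutive points
            previous = current
    return vertices, edges


def sample_from_class(base_vertices, sigma=0.02, rotate=True, rng=None):
    """One sample: optional rotation about the origin, then Gaussian noise.

    The connectivity is unchanged, so the base edge list is reused as-is.
    """
    rng = rng or random.Random()
    phi = rng.uniform(0.0, 2.0 * math.pi) if rotate else 0.0
    c, s = math.cos(phi), math.sin(phi)
    sample = []
    for x, y in base_vertices:
        xr, yr = c * x - s * y, s * x + c * y
        sample.append((xr + rng.gauss(0.0, sigma), yr + rng.gauss(0.0, sigma)))
    return sample
```

Calling `make_base_tree` once per class and `sample_from_class` once per sample, with `rotate=False` for an aligned setting and `rotate=True` for a rotated one, gives two collections of noisy complexes that share an edge list within each class.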

Theorems & Definitions (35)

  • Definition S1
  • Definition S2
  • Example 1
  • Definition S3
  • Definition S4
  • Theorem S1: Continuity and Hausdorffness [curry2022many]
  • Theorem S2: Injectivity theorem [curry2022many, ghrist2018persistent]
  • Definition S5
  • Definition S6
  • Definition S7
  • ...and 25 more