Challenges in 3D Data Synthesis for Training Neural Networks on Topological Features

Dylan Peek, Matthew P. Skerritt, Siddharth Pritam, Stephan Chalup

TL;DR

This work addresses the scarcity of labeled 3D data for supervised Topological Data Analysis by introducing a synthetic data pipeline that controllably encodes topology through genus $g$ and Betti number $\beta_1$ using Repulsive Surfaces. It presents the RG Repulse dataset, built from seeds with $\beta_1$ in $[0,20]$ grown in random grid environments, deformed while preserving topology, and voxelized to $256^3$ with Perlin-noise augmentation, enabling $8{,}192$-point inputs for a 3D Convolutional Transformer network trained to predict $\beta_1$. Experimental results show accuracy decreases as geometric complexity increases, highlighting the separate roles of topology and geometry in estimator generalization. The dataset provides a flexible platform for training and benchmarking topology-aware estimators and persistent homology pipelines, with potential for transfer learning to real-world voxel data in domains like medical imaging and materials science.
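The step from a $256^3$ voxel grid to an $8{,}192$-point network input can be illustrated with a minimal sketch. The summary does not state how the points are selected, so uniform random sampling of occupied voxel centres is an assumption made here for illustration only, and the function name `sample_point_cloud` and its parameters are hypothetical.

```python
import random


def sample_point_cloud(occupied, n_points=8192, resolution=256, seed=0):
    """Draw a fixed-size point cloud from occupied voxel coordinates.

    ASSUMPTION: uniform random sampling; the source does not specify
    the sampling scheme used to obtain the 8,192 input points.
    """
    occupied = sorted(occupied)  # fixed order so the seed is reproducible
    if not occupied:
        raise ValueError("voxel grid has no occupied voxels")
    rng = random.Random(seed)
    if len(occupied) >= n_points:
        # Enough voxels: sample without replacement.
        chosen = rng.sample(occupied, n_points)
    else:
        # Too few voxels: fall back to sampling with replacement.
        chosen = [rng.choice(occupied) for _ in range(n_points)]
    # Map integer voxel indices to voxel centres in the unit cube.
    return [tuple((c + 0.5) / resolution for c in voxel) for voxel in chosen]


if __name__ == "__main__":
    block = [(x, y, z) for x in range(30) for y in range(30) for z in range(30)]
    points = sample_point_cloud(block)
    print(len(points), points[0])
```

A fixed point count keeps the input shape constant across objects of different volume; whether that matches the sampling used for the RG Repulse dataset is not confirmed by this summary.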

Abstract

Topological Data Analysis (TDA) comprises techniques for analyzing the underlying structure and connectivity of data. However, traditional methods like persistent homology can be computationally demanding, motivating the development of neural network-based estimators capable of reducing computational overhead and inference time. A key barrier to advancing these methods is the lack of labeled 3D data with class distributions and diversity tailored specifically for supervised learning in TDA tasks. To address this, we introduce a novel approach for systematically generating labeled 3D datasets using the Repulsive Surface algorithm, allowing control over topological invariants such as hole count. The resulting dataset offers varied geometry with topological labeling, making it suitable for training and benchmarking neural network estimators. We use a synthetic 3D dataset generated this way to train a genus estimator network built on a 3D convolutional transformer architecture. An observed decrease in accuracy as deformations increase highlights the role of geometric complexity, in addition to topological complexity, when training generalized estimators. The dataset and its generation method fill a gap in labeled 3D data for training and evaluating models and techniques for TDA.
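A hole-count label on a voxelized object can be cross-checked without persistent homology by using the Euler-Poincaré relation $\chi = \beta_0 - \beta_1 + \beta_2$. The sketch below is not the paper's labeling procedure. It assumes the object is treated as a closed cubical complex (voxels touching at a face, edge, or corner are connected) and that cavities are the bounded 6-connected components of the background. All function names are hypothetical. For a solid object bounded by a connected genus-$g$ surface, the resulting $\beta_1$ equals $g$.

```python
from itertools import product

N26 = [d for d in product((-1, 0, 1), repeat=3) if d != (0, 0, 0)]
N6 = [d for d in N26 if sum(abs(c) for c in d) == 1]


def euler_characteristic(voxels):
    """chi = V - E + F - C of the closed cubical complex spanned by voxels."""
    cells = set()
    for x, y, z in voxels:
        # Doubled coordinates: the voxel centre is all-odd, and every
        # vertex, edge, and face of the voxel lies at offset -1, 0, or +1.
        for dx, dy, dz in product((-1, 0, 1), repeat=3):
            cells.add((2 * x + 1 + dx, 2 * y + 1 + dy, 2 * z + 1 + dz))
    # A cell's dimension is its number of odd coordinates.
    return sum((-1) ** sum(c % 2 for c in cell) for cell in cells)


def count_components(points, offsets):
    """Count connected components of a point set by flood fill."""
    remaining = set(points)
    count = 0
    while remaining:
        stack = [remaining.pop()]
        count += 1
        while stack:
            x, y, z = stack.pop()
            for dx, dy, dz in offsets:
                neighbour = (x + dx, y + dy, z + dz)
                if neighbour in remaining:
                    remaining.remove(neighbour)
                    stack.append(neighbour)
    return count


def betti_numbers(voxels):
    """Return (beta_0, beta_1, beta_2) for a set of occupied voxels."""
    voxels = set(voxels)
    if not voxels:
        return (0, 0, 0)
    # Background inside a bounding box padded by one voxel on each side.
    ranges = [range(min(v[i] for v in voxels) - 1, max(v[i] for v in voxels) + 2)
              for i in range(3)]
    background = set(product(*ranges)) - voxels
    b0 = count_components(voxels, N26)
    b2 = count_components(background, N6) - 1  # drop the outer component
    b1 = b0 + b2 - euler_characteristic(voxels)
    return (b0, b1, b2)


if __name__ == "__main__":
    # A 3x3x1 slab with its centre removed is a solid ring with one tunnel.
    ring = [(x, y, 0) for x in range(3) for y in range(3) if (x, y) != (1, 1)]
    print(betti_numbers(ring))  # (1, 1, 0)
```

The bounding-box background is enumerated in full, so this version is only practical for small grids; at $256^3$ an array-based implementation would be needed.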

Paper Structure

This paper contains 17 sections, 7 figures, 4 tables, 1 algorithm.

Figures (7)

  • Figure 1: A 2D slice of a sample generated using the technique outlined in \ref{secgenerate}. The 2D binary image shows 6 holes across 4 disconnected objects. Top: raw sample. Bottom: annotated analysis.
  • Figure 2: Random growth of interlinked genus 2 (green) and genus 3 (brown) objects using the method outlined in \ref{secgenerate}. Visualisation performed in Blender 3.0.1 \cite{blender3.0.1}.
  • Figure 3: Example of a simplicial complex
  • Figure 4: Process from structure to synthetic object: (a) shows a genus 5 seed structure; (b) demonstrates an environment generated using the random grid method; and (c) displays the final genus 5 object, represented as a mesh with Voronoi surface displacement mapping.
  • Figure 5: Cross-sectional slices of a genus 5 object with Voronoi mesh displacement mapping and 3 octaves of 3D Perlin noise.
  • ...and 2 more figures