Prototype Selection Using Topological Data Analysis
Jordan Eckert, Elvan Ceyhan, Henry Schenck
TL;DR
TPS (Topological Prototype Selector) addresses the challenge of reducing dataset size for classification while preserving performance, leveraging topological structure captured by persistent homology. The method builds a two-parameter filtration (a bifiltration) of the data to identify boundary-proximate, topology-rich prototypes, extracting the vertex sets of carefully selected sub-complexes. On nine simulated and eight real-world datasets, TPS achieves substantial data reduction (roughly 60-85%) while maintaining or improving G-Mean across classifiers, and it outperforms several baseline prototype selectors in many settings, although results can depend on the evaluation metric. The work demonstrates that topology-informed prototype selection is feasible and practical, offering a scalable and interpretable alternative for model reduction and data summarization.
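The bifiltration and persistent homology computations at the core of TPS are not described here in enough detail to reproduce. The idea of keeping the vertex set of a boundary-related sub-complex can still be illustrated under deliberately simplified assumptions: a single scale parameter `r` instead of two, only the 1-skeleton (edges) of the Vietoris-Rips complex, and a sub-complex defined as the edges whose endpoints carry different class labels. This is not the TPS algorithm; `rips_edges` and `boundary_prototypes` are hypothetical names used only for illustration.

```python
import math
from itertools import combinations


def rips_edges(points, r):
    """Edges of the Vietoris-Rips 1-skeleton at scale r: all pairs within distance r."""
    return [
        (i, j)
        for i, j in combinations(range(len(points)), 2)
        if math.dist(points[i], points[j]) <= r
    ]


def boundary_prototypes(points, labels, r):
    """Hypothetical simplification (not TPS): return the vertex set of the
    sub-complex made of edges whose two endpoints have different labels,
    i.e. points that lie within distance r of another class."""
    keep = set()
    for i, j in rips_edges(points, r):
        if labels[i] != labels[j]:
            keep.update((i, j))
    return sorted(keep)


if __name__ == "__main__":
    # Toy data: six points on a line, two classes meeting between x=2 and x=3.
    pts = [(float(x), 0.0) for x in range(6)]
    lab = [0, 0, 0, 1, 1, 1]
    print(boundary_prototypes(pts, lab, 1.0))  # → [2, 3]
```

On this toy line, raising `r` to 2.0 also keeps the points at positions 1 and 4, so a single-scale rule like this is sensitive to the chosen `r`; examining structure across many scales, as a filtration does, is one way to avoid committing to a single value.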
Abstract
Recently, there has been an explosion of statistical learning literature that represents data using topological principles to capture structure and relationships. We propose a topological data analysis (TDA)-based framework, the Topological Prototype Selector (TPS), for selecting representative subsets (prototypes) from large datasets. We demonstrate the effectiveness of TPS on simulated data with varying intrinsic characteristics and compare TPS against other commonly used prototype selection methods on real data. In all simulated and real data settings, TPS preserves or improves classification performance while substantially reducing data size. These contributions advance both the algorithmic and geometric aspects of prototype learning and offer practical tools for parallelized, interpretable, and efficient classification.
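The two quantities behind the headline results, data reduction and G-Mean, can be computed as follows. This is a minimal sketch that assumes the usual definitions: reduction as the fraction of training points removed, and G-Mean as the geometric mean of per-class recalls (for two classes, the square root of sensitivity times specificity). The text above does not spell out either formula, so both are assumptions, and `reduction_rate` and `g_mean` are hypothetical helper names.

```python
import math
from collections import Counter


def reduction_rate(n_original, n_prototypes):
    """Assumed definition: fraction of the training set removed by prototype selection."""
    return 1.0 - n_prototypes / n_original


def g_mean(y_true, y_pred):
    """Assumed definition: geometric mean of per-class recalls.
    A single class with zero recall drives the score to 0."""
    support = Counter(y_true)  # number of true instances per class
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # correct per class
    recalls = [hits[c] / support[c] for c in support]
    return math.prod(recalls) ** (1.0 / len(recalls))


if __name__ == "__main__":
    # Keeping 250 of 1000 points corresponds to a 75% reduction.
    print(reduction_rate(1000, 250))
    # Class 0 recall = 3/4, class 1 recall = 1/2, so G-Mean = sqrt(0.375).
    print(g_mean([0, 0, 0, 0, 1, 1], [0, 0, 0, 1, 1, 0]))
```

Unlike plain accuracy, G-Mean penalizes a classifier that ignores a minority class, which is why it is a natural check that discarding most of the data has not silently sacrificed one class.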
