Accelerating prototype selection with spatial abstraction

Joel Luís Carbonera

TL;DR

This work tackles the scalability of prototype selection for large numerical datasets by introducing PSASA, a pre-processing approach based on spatial abstraction that accelerates downstream prototype selection. PSASA first partitions the data space into $n$ intervals per dimension and produces class prototypes for each partition, then feeds this reduced prototype set into established algorithms, which refine the final selection. Across 14 datasets and five baseline methods, PSASA-enabled pipelines achieve substantial runtime speedups, with accuracy and reduction rates comparable to or better than those of the original methods. The approach is especially advantageous when computational resources are constrained: its preprocessing step runs in linear time and preserves the essential structure of the data, which makes prototype selection scalable for real-world applications.
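The partitioning step described above can be illustrated with a minimal Python sketch. This is not the paper's PSASA implementation: the function name `grid_prototypes`, the equal-width intervals, and the rule of keeping the instance nearest to each per-cell class centroid are all assumptions made for illustration, since this summary does not specify how the partition-wise prototypes are chosen.

```python
import numpy as np

def grid_prototypes(X, y, n):
    """Return indices of one representative instance per (cell, class) pair.

    Hypothetical sketch: equal-width intervals and the nearest-to-centroid
    rule are assumptions, not details confirmed by the source.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    lo = X.min(axis=0)
    span = X.max(axis=0) - lo
    span[span == 0] = 1.0  # constant features all fall into interval 0
    # Interval index of each value per dimension; clip so the maximum
    # value lands in the last interval (n - 1) instead of interval n.
    cells = np.minimum((n * (X - lo) / span).astype(int), n - 1)

    # Group instance indices by (cell, class) in a single pass over the data.
    groups = {}
    for i, (cell, label) in enumerate(zip(map(tuple, cells), y)):
        groups.setdefault((cell, label), []).append(i)

    # Keep, for each group, the instance closest to the group's centroid.
    selected = []
    for idx in groups.values():
        idx = np.asarray(idx)
        centroid = X[idx].mean(axis=0)
        distances = np.linalg.norm(X[idx] - centroid, axis=1)
        selected.append(int(idx[np.argmin(distances)]))
    return selected
```

Only occupied cells are stored in the dictionary, so the sketch never enumerates all $n^d$ cells of a $d$-dimensional grid. The returned indices can then be handed to any conventional prototype selection algorithm.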

Abstract

The increasing digitalization of industry and society is producing a growing abundance of data to be processed and exploited. However, applying machine learning approaches to such high volumes of data requires considerable computational resources. Prototype selection techniques have been applied to reduce these computational requirements. In this paper, we propose an approach for speeding up existing prototype selection techniques. First, it builds an abstract representation of the dataset using the notion of spatial partition. Second, it uses this abstract representation to prune the search space efficiently and select a set of candidate prototypes. Afterwards, conventional prototype selection algorithms can be applied to the candidates selected by our approach. Our approach was integrated with five conventional prototype selection algorithms and tested on 14 widely recognized datasets used in classification tasks. The performance of the modified algorithms was compared to that of their original versions in terms of accuracy and reduction rate. The experimental results demonstrate that, overall, our proposed approach maintains accuracy while enhancing the reduction rate of the original prototype selection algorithms and simultaneously reducing their execution times.
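The two-stage idea in the abstract (pre-select candidates, then run a conventional algorithm on them) can be sketched as follows. Everything here is illustrative: `two_stage`, `pre_select`, and `reduction_rate` are hypothetical names, Hart's condensed nearest neighbour is used only as a stand-in because this summary does not name the five algorithms evaluated, and the reduction rate is taken to be the usual fraction of instances removed, which is an assumption about the paper's definition.

```python
import numpy as np

def condensed_nn(X, y):
    """Hart's condensed nearest neighbour, used here as a stand-in for any
    conventional prototype selection algorithm. Returns indices into X."""
    keep = [0]
    changed = True
    while changed:
        changed = False
        for i in range(len(X)):
            if i in keep:
                continue
            # Classify instance i with 1-NN over the current subset.
            d = np.linalg.norm(X[keep] - X[i], axis=1)
            if y[keep[int(np.argmin(d))]] != y[i]:
                keep.append(i)  # misclassified, so it must be kept
                changed = True
    return keep

def two_stage(X, y, pre_select, select):
    """Run `select` only on the candidates returned by `pre_select`.

    `pre_select` is a hypothetical callable (for example a spatial-partition
    step) returning indices into X; the result is mapped back to indices
    of the original dataset.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    candidates = np.asarray(pre_select(X, y))
    final = np.asarray(select(X[candidates], y[candidates]))
    return candidates[final]

def reduction_rate(n_original, n_selected):
    """Fraction of the original instances removed (assumed definition)."""
    return (n_original - n_selected) / n_original
```

The downstream algorithm only ever sees the candidate subset, which is where the runtime saving comes from; mapping `final` back through `candidates` keeps the result expressed in terms of the original dataset.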

Paper Structure

This paper contains 6 sections, 3 equations, 12 figures, 4 tables, 2 algorithms.

Figures (12)

  • Figure 1: Representation of a 2D space, with points from two classes (blue and red dots), split in 6 spatial partitions.
  • Figure 2: General schema of a pipeline that uses the PSASA algorithm for speeding up some conventional prototype selection algorithm.
  • Figure 3: Detailed qualitative comparison between the conventional prototype selection algorithms and their enhanced versions. OA means Original Algorithm and EA means Enhanced Algorithm.
  • Figure 4: Abstract qualitative comparison between the conventional prototype selection algorithms and their enhanced versions. OA means Original Algorithm and EA means Enhanced Algorithm.
  • Figure 5: Comparison of the running times of the 10 prototype selection algorithms, considering two datasets. Notice that the time axis uses a logarithmic scale.
  • ...and 7 more figures

Theorems & Definitions (1)

  • Definition 1