Active partitioning: inverting the paradigm of active learning
Marius Tacke, Matthias Busch, Kevin Linka, Christian J. Cyron, Roland C. Aydin
TL;DR
The paper addresses the challenge of learning from datasets that contain multiple regimes by introducing active partitioning, a competition-driven method in which multiple predictors vie for each data point and the winner trains on it. The resulting data-point allocations define partitions whose boundaries are captured by an SVM, enabling per-partition experts and a modular architecture. Across synthetic and real-world regression tasks, the approach reveals distinct patterns and, in several cases, substantially outperforms a monolithic model, with loss reductions of up to 54%. The work also outlines a path toward adaptive data collection and pattern-aware hyperparameter customization, highlighting practical benefits for structured datasets and for settings where data are expensive to collect.
Abstract
Datasets often incorporate various functional patterns related to different aspects or regimes, which are typically not equally present throughout the dataset. We propose a novel, general-purpose partitioning algorithm that utilizes competition between models to detect and separate these functional patterns. This competition is induced by multiple models iteratively submitting their predictions for the dataset, with the best prediction for each data point being rewarded with training on that data point. This reward mechanism amplifies each model's strengths and encourages specialization in different patterns. The specializations can then be translated into a partitioning scheme. The amplification of each model's strengths inverts the active learning paradigm: while active learning typically focuses the training of models on their weaknesses to minimize the number of required training data points, our concept reinforces the strengths of each model, thus specializing them. We validate our concept -- called active partitioning -- with various datasets with clearly distinct functional patterns, such as mechanical stress and strain data in a porous structure. The active partitioning algorithm produces valuable insights into the datasets' structure, which can serve various further applications. As a demonstration of one exemplary usage, we set up modular models consisting of multiple expert models, each learning a single partition, and compare their performance on more than twenty popular regression problems with single models learning all partitions simultaneously. Our results show significant improvements, with up to 54% loss reduction, confirming our partitioning algorithm's utility.
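The competition described above can be illustrated with a minimal sketch. It is not the authors' implementation: the competitors here are assumed to be straight lines refit by least squares, and the names `active_partitioning_sketch`, `n_models`, and `n_rounds` are hypothetical. In each round every model predicts every point, the model with the smallest squared error wins the point, and each model is then retrained only on the points it won.

```python
import numpy as np


def active_partitioning_sketch(x, y, n_models=2, n_rounds=20, seed=0):
    """Toy competition loop; returns (winners, params, design)."""
    rng = np.random.default_rng(seed)
    # Assumption: each competitor is a line y = a*x + b with random start.
    params = rng.normal(size=(n_models, 2))
    design = np.column_stack([x, np.ones_like(x)])
    winners = np.zeros(len(x), dtype=int)
    for _ in range(n_rounds):
        # Every model submits a prediction for every data point.
        preds = design @ params.T                 # shape (n_points, n_models)
        errors = (preds - y[:, None]) ** 2
        # The best prediction for each point wins that point.
        winners = errors.argmin(axis=1)
        # Reward: each model is refit only on the points it won.
        for m in range(n_models):
            mask = winners == m
            if mask.any():
                params[m] = np.linalg.lstsq(design[mask], y[mask], rcond=None)[0]
    return winners, params, design


# Synthetic data with two regimes (slope -2 for x < 0, slope 3 otherwise).
x = np.linspace(-1.0, 1.0, 200)
y = np.where(x < 0, -2.0 * x, 3.0 * x)

winners, params, design = active_partitioning_sketch(x, y)

# Modular prediction: each point is predicted by the model that won it.
all_preds = design @ params.T
modular_pred = all_preds[np.arange(len(x)), winners]
modular_loss = np.mean((modular_pred - y) ** 2)

# Baseline: a single line fit to all points at once.
single_coef = np.linalg.lstsq(design, y, rcond=None)[0]
single_loss = np.mean((design @ single_coef - y) ** 2)

print("modular loss:", modular_loss, "single-model loss:", single_loss)
```

In this sketch the array `winners` plays the role of the partitioning. Because each line is refit by least squares on exactly the points it won, the modular loss on the training data cannot exceed that of the single line. Whether the recovered partitions match the true regimes depends on the initialisation and on the model class.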
