Cluster-based classification with neural ODEs via control
Antonio Álvarez-López, Rafael Orive-Illera, Enrique Zuazua
TL;DR
The paper studies binary classification with neural ODEs in a single-neuron, piecewise-constant-control setting, quantifying model complexity by the number of switches $L$ and aiming to minimize it. It introduces a cluster-based control strategy that partitions data into $d$-sized blocks, achieving a bound of $L \le 4 \lceil \min(|R|,|B|)/d \rceil - 1$ under general position, and it develops a probabilistic framework showing that, for i.i.d. data, the distribution of $L$ can be characterized when axis-aligned canonical separability is enforced. The analysis also yields a maximal separating hyperplane count of $2N-1$ (with corresponding $L$ bounds) and demonstrates that separability properties improve in high dimensions, providing a blessing-of-dimensionality effect. Numerically, the results corroborate the theoretical bounds, show that $L$ often matches the bound even for moderate $d$, and reveal that autonomous (zero-switch) classification becomes highly likely as $d$ grows relative to $N$. The work connects geometric separability, control-theoretic constructs, and neural ODE dynamics to illuminate the capacity and efficiency of dynamic classifiers in high-dimensional spaces.
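To make the scaling of the switch bound concrete, the short Python sketch below evaluates $4 \lceil \min(|R|,|B|)/d \rceil - 1$ for given class sizes and dimension. The function name `switch_bound` and its arguments are illustrative choices, not notation from the paper, and the sketch only computes the stated upper bound, not the number of switches an actual control would use.

```python
def switch_bound(n_red: int, n_blue: int, d: int) -> int:
    """Evaluate the upper bound 4 * ceil(min(|R|, |B|) / d) - 1 on the switches L.

    n_red and n_blue stand for the class sizes |R| and |B|; d is the dimension.
    (Hypothetical helper: names are assumptions made for this illustration.)
    """
    if n_red < 1 or n_blue < 1 or d < 1:
        raise ValueError("class sizes and dimension must be positive integers")
    smaller = min(n_red, n_blue)
    blocks = -(-smaller // d)  # integer ceiling of smaller / d
    return 4 * blocks - 1


# With 50 points per class, the bound shrinks as the dimension grows.
print(switch_bound(50, 50, 2))   # 99
print(switch_bound(50, 50, 10))  # 19
print(switch_bound(50, 50, 50))  # 3
```

For a fixed dataset size the bound decays roughly like $1/d$, which is the $O(N/d)$ behavior the summary refers to.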
Abstract
We address binary classification using neural ordinary differential equations from the perspective of simultaneous control of $N$ data points. We consider a single-neuron architecture with parameters fixed as piecewise constant functions of time. In this setting, the model complexity can be quantified by the number of control switches. Previous work has shown that classification can be achieved using a point-by-point strategy that requires $O(N)$ switches. We propose a new control method that classifies any dataset by sequentially steering clusters of $d$ points, thereby reducing the complexity to $O(N/d)$ switches. The optimality of this result, particularly in high dimensions, is supported by numerical experiments. Our complexity bound is sufficient but often conservative, because same-class points tend to appear in larger clusters, which simplifies classification. This motivates studying the probability distribution of the number of switches required. We introduce a simple control method that imposes a collinearity constraint on the parameters, and analyze a worst-case scenario where both classes have the same size and all points are i.i.d. Our results highlight the benefits of high-dimensional spaces, showing that classification using constant controls becomes more probable as $d$ increases.
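As a rough illustration of what "piecewise constant controls" and "switches" mean, the sketch below integrates a single-neuron flow with explicit Euler steps. The abstract does not spell out the model, so the form $\dot{x} = w\,\sigma(a \cdot x + b)$ with a ReLU activation is an assumption made here, as are the names `flow`, `num_switches`, and the `(w, a, b, duration)` tuple layout.

```python
from typing import List, Sequence, Tuple

# One constant piece of the control: (w, a, b, duration). Assumed layout.
Control = Tuple[Sequence[float], Sequence[float], float, float]


def relu(s: float) -> float:
    return max(s, 0.0)


def flow(points: Sequence[Sequence[float]],
         controls: Sequence[Control],
         dt: float = 1e-3) -> List[List[float]]:
    """Euler integration of x' = w * relu(a . x + b), piece by piece.

    The model form and the ReLU activation are assumptions for illustration.
    """
    xs = [list(p) for p in points]
    for w, a, b, duration in controls:
        steps = max(1, round(duration / dt))
        h = duration / steps
        for _ in range(steps):
            for x in xs:
                # Scalar activation, evaluated before the point is updated.
                s = relu(sum(ai * xi for ai, xi in zip(a, x)) + b)
                for i in range(len(x)):
                    x[i] += h * s * w[i]
    return xs


def num_switches(controls: Sequence[Control]) -> int:
    """Switches = number of constant pieces minus one."""
    return max(len(controls) - 1, 0)


# One constant piece (zero switches): points with x_2 > 0 drift along the
# first axis, while points with x_2 <= 0 stay frozen.
controls = [((1.0, 0.0), (0.0, 1.0), 0.0, 1.0)]
moved = flow([(0.0, 1.0), (0.0, -1.0)], controls)
print(moved, num_switches(controls))
```

Under this assumed model, each constant piece moves only the points on one side of the hyperplane $a \cdot x + b = 0$, which is why the number of pieces is a natural measure of complexity.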
