EDC: Equation Discovery for Classification
Guus Toussaint, Arno Knobbe
TL;DR
Equation Discovery for Classification (EDC) introduces an interpretable framework that learns an analytic boundary function $f(x)$ with $T = \text{true} \iff f(x) \ge 0$, built by composing summands according to a configurable grammar. The approach combines beam-search-guided equation discovery with adaptive optimisation of the constants (SGD when the expression is differentiable in its constants, a hill climber otherwise), achieving competitive binary classification performance while maintaining interpretability. Across artificial and UCI datasets, EDC demonstrates strong structure recovery, robustness to noise, and a favourable trade-off between accuracy and model transparency, albeit with longer runtimes than many black-box methods. The work advances explainable AI by showing that a single, human-readable equation can rival state-of-the-art classifiers on several tasks and can be tailored via domain-specific grammar extensions.
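To make the constant-optimisation step concrete, the following is a minimal sketch of a stochastic hill climber tuning the constants of one fixed candidate equation under the rule "true iff $f(x) \ge 0$". It is not EDC's implementation: the linear structure, the use of training accuracy as the objective, the Gaussian perturbation, the toy data, and all names (`f`, `accuracy`, `hill_climb`) are assumptions made for illustration. The beam search over structures and the SGD branch are omitted.

```python
import random

def f(c, x):
    # Fixed structure, assumed for illustration: f(x) = c0 + c1*x0 + c2*x1
    return c[0] + c[1] * x[0] + c[2] * x[1]

def accuracy(c, X, y):
    # Fraction of points where the rule "true iff f(x) >= 0" matches the label.
    return sum((f(c, x) >= 0) == t for x, t in zip(X, y)) / len(X)

def hill_climb(c, X, y, iters=2000, sigma=0.5, seed=0):
    # Perturb all constants with Gaussian noise and keep a candidate
    # only if it strictly improves the (assumed) accuracy objective.
    rng = random.Random(seed)
    best_c, best = list(c), accuracy(c, X, y)
    for _ in range(iters):
        cand = [v + rng.gauss(0.0, sigma) for v in best_c]
        score = accuracy(cand, X, y)
        if score > best:
            best_c, best = cand, score
    return best_c, best

# Toy data on a 5x5 grid; the label is true when x0 + x1 >= 2.
X = [(i / 2, j / 2) for i in range(5) for j in range(5)]
y = [x0 + x1 >= 2 for x0, x1 in X]

start = [0.0, 0.0, 0.0]
start_acc = accuracy(start, X, y)
constants, final_acc = hill_climb(start, X, y)
```

Because only strict improvements are accepted, `final_acc` can never fall below `start_acc`. How close the climber gets to the true boundary depends on the seed, the step size `sigma`, and the number of iterations.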
Abstract
Equation Discovery (ED) techniques have shown considerable success in regression tasks, where they are used to discover concise and interpretable models (\textit{Symbolic Regression}). In this paper, we propose a new ED-based binary classification framework. Our proposed method, EDC, finds analytical functions of manageable size that specify the location and shape of the decision boundary. In extensive experiments on artificial and real-life data, we demonstrate that EDC is able to discover both the structure of the target equation and the values of its parameters, outperforming the current state-of-the-art ED-based classification methods and achieving performance comparable to the state of the art in binary classification. We suggest a grammar of modest complexity that appears to work well on the tested datasets, but argue that the exact grammar, and thus the complexity of the models, is configurable; in particular, domain-specific expressions can be included in the pattern language where required. The presented grammar consists of a series of summands (additive terms) that include linear, quadratic and exponential terms, as well as products of two features (producing hyperbolic curves ideal for capturing XOR-like dependencies). The experiments demonstrate that this grammar allows fairly flexible decision boundaries while not being so rich as to cause overfitting.
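As an illustration of how such a grammar yields a decision boundary, the sketch below builds $f(x)$ as a sum of the four summand types named above and classifies a point as true when $f(x) \ge 0$. The constructor names (`linear`, `quadratic`, `exponential`, `product`, `make_boundary`), the exact parameterisation of each term, and the toy inputs are hypothetical and not taken from EDC itself.

```python
import math

# Hypothetical summand constructors; i and j are feature indices,
# a and b are constants. The exact term forms are assumptions.
def linear(i, a):
    return lambda x: a * x[i]

def quadratic(i, a):
    return lambda x: a * x[i] ** 2

def exponential(i, a, b):
    return lambda x: a * math.exp(b * x[i])

def product(i, j, a):
    return lambda x: a * x[i] * x[j]

def make_boundary(summands, bias=0.0):
    # f(x) is the bias plus the sum of all additive terms.
    def f(x):
        return bias + sum(s(x) for s in summands)
    return f

def classify(f, x):
    # Decision rule from the text: true iff f(x) >= 0.
    return f(x) >= 0

# XOR-like toy case: features in {-1, +1}, label true when the signs differ.
# A single product term, f(x) = -x0 * x1, separates the classes.
f_xor = make_boundary([product(0, 1, -1.0)])
points = [(-1, -1), (-1, 1), (1, -1), (1, 1)]
labels = [classify(f_xor, p) for p in points]

# Two quadratic terms and a bias give a circular boundary: x0^2 + x1^2 - 1 >= 0.
f_circle = make_boundary([quadratic(0, 1.0), quadratic(1, 1.0)], bias=-1.0)
inside = classify(f_circle, (0.0, 0.0))
outside = classify(f_circle, (2.0, 0.0))
```

The XOR case shows why feature products are useful: no sum of purely linear terms separates these four points, whereas one product summand does.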
