Computationally lightweight classifiers with frequentist bounds on predictions

Shreeram Murali; Cristian R. Rojas; Dominik Baumann

Abstract

While both classical and neural network classifiers can achieve high accuracy, they fall short of offering uncertainty bounds on their predictions, which makes them unfit for safety-critical applications. Existing kernel-based classifiers that provide such bounds scale roughly as $\mathcal{O}(n^3)$ in time, making them computationally intractable for large datasets. To address this, we propose a novel, computationally efficient classification algorithm based on the Nadaraya-Watson estimator and derive frequentist uncertainty intervals for its estimates. We evaluate our classifier on synthetically generated data and on electrocardiographic heartbeat signals from the MIT-BIH Arrhythmia database. We show that the method achieves competitive accuracy of more than 96% at $\mathcal{O}(n)$ and $\mathcal{O}(\log n)$ operations while providing actionable uncertainty bounds. These bounds can, for example, help flag low-confidence predictions, which makes the method suitable for real-time settings with resource constraints, such as diagnostic monitoring or implantable devices.
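As a rough illustration of the idea, the following Python sketch computes a Nadaraya-Watson estimate of the class probabilities at a query point and attaches an interval of half-width $L\lambda + \sqrt{\log(2/\delta)/(2 n_x)}$, where $n_x$ is the number of training points inside the kernel window. It assumes a box kernel of bandwidth `lam`, Lipschitz class probabilities with constant `L`, and a Hoeffding-type sampling term; these choices and the helper names `nw_box_predict` and `is_low_confidence` are hypothetical, and the paper's own kernel and bounds may differ.

```python
import math
from collections import Counter
from typing import Dict, Sequence, Tuple


def nw_box_predict(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    x: Sequence[float],
    lam: float,
    L: float,
    delta: float = 0.05,
) -> Tuple[Dict[int, float], Dict[int, float]]:
    """Nadaraya-Watson class probabilities with a box kernel, plus bound half-widths.

    Training points within Euclidean distance `lam` of the query get weight 1 and
    all others weight 0, so each estimate is a label frequency inside the window.
    One pass over the data: O(n) per query.
    """
    classes = sorted(set(y))
    window = [yi for xi, yi in zip(X, y) if math.dist(xi, x) <= lam]
    n_x = len(window)
    if n_x == 0:
        # No training data near x: uniform estimate with the trivial bound 1.
        return {c: 1.0 / len(classes) for c in classes}, {c: 1.0 for c in classes}
    counts = Counter(window)
    probs = {c: counts[c] / n_x for c in classes}
    # Assumed bound: bias <= L * lam (Lipschitz class probabilities, box kernel)
    # plus a Hoeffding term for the n_x labels in the window (prob. >= 1 - delta).
    eps = L * lam + math.sqrt(math.log(2.0 / delta) / (2.0 * n_x))
    return probs, {c: eps for c in classes}


def is_low_confidence(probs: Dict[int, float], eps: Dict[int, float]) -> bool:
    """Flag a prediction when the intervals of the two most likely classes overlap."""
    if len(probs) < 2:
        return False
    top, second = sorted(probs, key=probs.get, reverse=True)[:2]
    return probs[top] - eps[top] <= probs[second] + eps[second]


# Toy usage: three points near the query, two of class 0 and one of class 1.
X = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [1.0, 1.0], [0.9, 1.0]]
y = [0, 0, 1, 1, 1]
probs, eps = nw_box_predict(X, y, [0.05, 0.05], lam=0.2, L=1.0)
label = max(probs, key=probs.get)
flagged = is_low_confidence(probs, eps)
```

With only three points in the window the sampling term dominates, so the toy prediction is flagged as low-confidence; more data near the query shrinks the interval toward the bias floor $L\lambda$.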

Paper Structure (39 sections, 10 theorems, 50 equations, 10 figures, 4 tables)

INTRODUCTION
Contributions.
PROBLEM SETTING
Overlapping distributions.
Separable distributions.
Nature of measurements.
CLASSIFIER
Nadaraya-Watson classifier
Deriving bounds on the estimates
Bias
Sampling error
Combined bounds
Computational efficiency improvements
EXPERIMENTS
Synthetic data.

Key Result

Lemma 1

Under Assumptions ass:lipschitz and ass:kernel, we have, for all $n \geq 0$ and $y \in \mathcal{Y}$, where $L$ is the known Lipschitz constant from eq:lipschitz and $\lambda$ is the user-defined kernel bandwidth from eq:nwc-kernel-definitions.
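A bias bound of this type can be motivated by a short argument. The derivation below assumes a box kernel $k_\lambda$ and writes $p_y(x)$ for the conditional class probability; both are notation chosen for illustration, and Lemma 1 may be stated for a more general kernel or in a different form.

```latex
% Illustrative assumptions (hypothetical notation):
%   p_y(x) = P(Y = y | X = x),   |p_y(x) - p_y(x')| <= L ||x - x'||,
%   box kernel k_\lambda(x, x') = 1{ ||x - x'|| <= \lambda },
%   at least one training input lies within distance \lambda of x.
\begin{align*}
\hat p_y(x) &= \sum_{i=1}^{n} w_i(x)\,\mathbb{1}\{y_i = y\},
\qquad
w_i(x) = \frac{k_\lambda(x, x_i)}{\sum_{j=1}^{n} k_\lambda(x, x_j)}, \\
\Bigl|\mathbb{E}\bigl[\hat p_y(x) \mid x_{1:n}\bigr] - p_y(x)\Bigr|
&= \Bigl|\sum_{i=1}^{n} w_i(x)\,\bigl(p_y(x_i) - p_y(x)\bigr)\Bigr|
\le \sum_{i=1}^{n} w_i(x)\,L\,\|x_i - x\|
\le L\lambda .
\end{align*}
```

The weights are nonnegative, sum to one, and vanish outside the bandwidth, so the estimate averages conditional probabilities that each lie within $L\lambda$ of $p_y(x)$.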

Figures (10)

Figure 1: The two synthetic datasets used for evaluating the classifier: one with overlapping classes, satisfying Assumption ass:lipschitz (left), and one with classes separated by a known margin, satisfying Assumption ass:separable (right).
Figure 2: Performance of the proposed classifiers on the synthetic datasets compared to baselines for varying sample sizes. Top row: total runtime, prediction time, and fit time. Bottom row: average bounds $\bar{\epsilon}_c$ over all classes for $\delta = 0.05$, and accuracy on the overlapping and separable datasets. Our algorithm is significantly more sample-efficient than the CME-based classifier while achieving high accuracy with small uncertainty.
Figure 3: Mean uncertainty intervals, precision and recall of the proposed classifiers, and waveforms. Our classifier shows higher uncertainty for misclassified samples and achieves high precision and recall.
Figure 4: Two synthetic datasets and their corresponding dyadic cell prediction grids. The figures on the left correspond to the Lipschitz-continuous overlapping dataset; the figures on the right correspond to the dataset separated by a margin.
Figure 5: Illustrative ECG waveforms (amplitude normalized from mV, $t$ denotes time step) and associated class probabilities for each class from the MIT-BIH database; class Q (unclassifiable) has been excluded.
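Figure 4 refers to dyadic cell prediction grids, and the abstract mentions $\mathcal{O}(\log n)$ operations. One plausible way to obtain such costs, sketched below under explicit assumptions, is to partition $[0,1]^d$ into dyadic cells of side $2^{-k}$, count labels per cell in one pass during fitting (plus a sort of the occupied cells), and locate a query's cell by bisection at prediction time. The class name `DyadicGridClassifier`, the histogram-style estimate used in place of a sliding kernel window, and the bound of $L\sqrt{d}\,2^{-k}$ plus a Hoeffding term are hypothetical choices for illustration, not the paper's construction.

```python
import bisect
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Cell = Tuple[int, ...]


class DyadicGridClassifier:
    """Hypothetical histogram-style classifier on dyadic cells of [0, 1]^d."""

    def __init__(self, depth: int, L: float, delta: float = 0.05) -> None:
        self.depth = depth  # cells have side length 2**-depth
        self.L = L          # assumed Lipschitz constant of the class probabilities
        self.delta = delta  # failure probability of the bound

    def _cell(self, x: Sequence[float]) -> Cell:
        m = 2 ** self.depth
        # Integer cell coordinates; clamp so that x_j = 1.0 lands in the last cell.
        return tuple(min(int(xj * m), m - 1) for xj in x)

    def fit(self, X: Sequence[Sequence[float]], y: Sequence[int]) -> "DyadicGridClassifier":
        counts: Dict[Cell, Counter] = {}
        for xi, yi in zip(X, y):
            counts.setdefault(self._cell(xi), Counter())[yi] += 1
        self.classes = sorted(set(y))
        self.dim = len(X[0])
        # Sorted occupied cells allow O(log m) lookup by bisection (m <= n cells).
        self.keys: List[Cell] = sorted(counts)
        self.tables = [counts[k] for k in self.keys]
        return self

    def predict(self, x: Sequence[float]) -> Tuple[Dict[int, float], float]:
        key = self._cell(x)
        i = bisect.bisect_left(self.keys, key)
        if i == len(self.keys) or self.keys[i] != key:
            # Empty cell: uniform estimate with the trivial bound 1.
            return {c: 1.0 / len(self.classes) for c in self.classes}, 1.0
        table = self.tables[i]
        n_cell = sum(table.values())
        probs = {c: table[c] / n_cell for c in self.classes}
        # Bias: L times the cell diameter; sampling: Hoeffding with n_cell labels.
        diam = math.sqrt(self.dim) * 2.0 ** (-self.depth)
        eps = self.L * diam + math.sqrt(math.log(2.0 / self.delta) / (2.0 * n_cell))
        return probs, eps


X = [[0.1, 0.1], [0.2, 0.15], [0.8, 0.9], [0.85, 0.95], [0.9, 0.8]]
y = [0, 0, 1, 1, 1]
clf = DyadicGridClassifier(depth=1, L=1.0).fit(X, y)
probs, eps = clf.predict([0.3, 0.2])
```

A hash-map lookup would give expected constant-time prediction instead; bisection over a sorted array is shown because it gives a deterministic logarithmic bound. Deeper grids shrink the bias term but leave fewer labels per cell, so the depth trades bias against sampling error.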

Theorems & Definitions (18)

Remark 1
Lemma 1
Lemma 2
Lemma 3
Lemma 4
Corollary 1
proof
Remark 2
Remark 3
Lemma 4