Learnability in Online Kernel Selection with Memory Constraint via Data-dependent Regret Analysis

Junfan Li; Shizhong Liao

TL;DR

This work studies online kernel selection with a fixed memory budget, addressing the gap between worst-case regret and learnability by introducing data-dependent regret bounds that depend on kernel alignment $\mathcal{A}_{T,\kappa_i}$ and cumulative losses $L_T(f)$. It proposes a buffer-based algorithmic framework that reduces online kernel selection to prediction with expert advice using adaptive sampling and strategic removal of examples, yielding two specialized algorithms: M-OMD-H for hinge loss and M-OMD-S for smooth losses. Theoretical results establish sub-linear regret under sub-linear data complexities, with tight upper and matching lower bounds for smooth losses, demonstrating learnability within memory $\mathcal{R}=O(\log T)$ when the data are favorable. Empirical validation on benchmark datasets confirms improved performance under memory constraints, illustrating practical feasibility for resource-limited devices. Overall, the paper provides a principled data-dependent perspective on memory-regret trade-offs in online kernel selection and proposes practical, memory-aware learning strategies.
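To make the reduction concrete, the following is a minimal sketch, not the paper's M-OMD-H or M-OMD-S. It treats each candidate kernel as one expert, combines the experts with a standard Hedge (exponential weights) update, and keeps each expert's buffer within a fixed size by discarding half of the stored examples when the buffer is full. Every name here (`BudgetedKernelExpert`, `run`, `budget`, `eta`, `lr`) is hypothetical. The uniformly random choice of which half to keep, the per-expert budget, and the perceptron-style update are simplifying assumptions; the paper's adaptive sampling and removal rules are not reproduced.

```python
import math
import random


def gaussian_kernel(sigma):
    """Return a Gaussian kernel function with bandwidth sigma."""
    def k(x, z):
        sq = sum((a - b) ** 2 for a, b in zip(x, z))
        return math.exp(-sq / (2.0 * sigma ** 2))
    return k


def hinge(y, score):
    """Hinge loss for a label y in {-1, +1} and a real-valued score."""
    return max(0.0, 1.0 - y * score)


class BudgetedKernelExpert:
    """One expert per kernel, storing at most `budget` examples (assumed rule)."""

    def __init__(self, kernel, budget, lr=0.5):
        self.kernel = kernel
        self.budget = budget
        self.lr = lr
        self.buffer = []  # list of (example, coefficient) pairs

    def predict(self, x):
        return sum(c * self.kernel(z, x) for z, c in self.buffer)

    def update(self, x, y, rng):
        # Store the example only when the margin is violated.
        if y * self.predict(x) < 1.0:
            if len(self.buffer) >= self.budget:
                # Placeholder removal rule: keep a uniformly random half.
                n = len(self.buffer)
                keep = sorted(rng.sample(range(n), n // 2))
                self.buffer = [self.buffer[i] for i in keep]
            self.buffer.append((x, self.lr * y))


def run(stream, sigmas, budget, eta=0.5, seed=0):
    """Hedge over kernel experts; returns (cumulative loss, experts, weights)."""
    rng = random.Random(seed)
    experts = [BudgetedKernelExpert(gaussian_kernel(s), budget) for s in sigmas]
    weights = [1.0 / len(experts)] * len(experts)
    total_loss = 0.0
    for x, y in stream:
        scores = [e.predict(x) for e in experts]
        # Sample one kernel according to the current weights and predict with it.
        i = rng.choices(range(len(experts)), weights=weights)[0]
        total_loss += hinge(y, scores[i])
        # Exponential-weights update with losses clipped to [0, 1].
        weights = [w * math.exp(-eta * min(1.0, hinge(y, s)))
                   for w, s in zip(weights, scores)]
        total = sum(weights)
        weights = [w / total for w in weights]  # renormalize to avoid underflow
        for e in experts:
            e.update(x, y, rng)
    return total_loss, experts, weights
```

In this sketch the memory footprint is at most `budget` stored examples per kernel regardless of the stream length, which is the kind of constraint the regret bounds are stated under.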

Abstract

Online kernel selection is a fundamental problem of online kernel methods. In this paper, we study online kernel selection with memory constraint, in which the memory of the kernel selection and online prediction procedures is limited to a fixed budget. An essential question is: what is the intrinsic relationship among online learnability, memory constraint, and data complexity? To answer the question, it is necessary to show the trade-offs between regret and memory constraint. Previous work gives a worst-case lower bound depending on the data size, and shows that learning is impossible within a small memory constraint. In contrast, we present distinct results by offering data-dependent upper bounds that rely on two data complexities: kernel alignment and the cumulative losses of the competitive hypothesis. We propose an algorithmic framework giving data-dependent upper bounds for two types of loss functions. For the hinge loss function, our algorithm achieves an expected upper bound depending on kernel alignment. For smooth loss functions, our algorithm achieves a high-probability upper bound depending on the cumulative losses of the competitive hypothesis. We also prove a matching lower bound for smooth loss functions. Our results show that if the two data complexities are sub-linear, then learning is possible within a small memory constraint. Our algorithmic framework relies on a new buffer-maintenance framework and a reduction from online kernel selection to prediction with expert advice. Finally, we empirically verify the prediction performance of our algorithms on benchmark datasets.
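As an illustration of the first data complexity, the sketch below computes a kernel alignment quantity for a small labeled sample. The excerpt does not define kernel alignment, so the formula used here, $\sum_{t=1}^{T}\kappa(x_t,x_t)-\frac{1}{T}Y^\top K Y$ with $K$ the kernel matrix and $Y$ the label vector, is an assumption borrowed from the kernel-alignment regret literature and may differ from the paper's exact definition. The function names are hypothetical.

```python
import math


def gaussian_kernel(x, z, sigma=1.0):
    """Gaussian kernel on tuples of floats."""
    sq = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq / (2.0 * sigma ** 2))


def kernel_alignment(xs, ys, kernel=gaussian_kernel):
    """Assumed definition: sum_t k(x_t, x_t) - (1/T) * Y^T K Y."""
    T = len(xs)
    trace = sum(kernel(x, x) for x in xs)
    quad = sum(ys[s] * ys[t] * kernel(xs[s], xs[t])
               for s in range(T) for t in range(T))
    return trace - quad / T


# Labels perfectly explained by the kernel: identical points, identical labels.
aligned = kernel_alignment([(0.0,)] * 4, [1, 1, 1, 1])

# Points so far apart that the kernel matrix is numerically the identity.
spread = kernel_alignment([(0.0,), (100.0,), (200.0,)], [1, -1, 1])
```

Under this assumed definition the first sample gives a value of zero, while the second gives a value that grows with the sample size, which matches the intuition that a bound depending on kernel alignment is only useful when the kernel matches the labels well.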

Paper Structure (27 sections, 14 theorems, 130 equations, 3 tables, 2 algorithms)

Introduction
Related Work
Problem Setup
Algorithmic Framework
Adaptive Sampling
Removing a Half of Examples
Reduction to PEA
Applications
The Hinge Loss Function
Smooth Loss Functions
Experiments
Experimental Setting
Experimental Results
Hinge Loss Function
Logistic Loss Function
...and 12 more sections

Key Result

Lemma 1

Let $M>1$ and $\alpha\mathcal{R}:=B\geq 2M(1+\ln{T})$. For any $\mathcal{I}_T$, the expected number of times that M-OMD-H executes the removal operation on $S_i$ is at most $\left\lceil\frac{4K\tilde{\mathcal{A}}_{T,\kappa_i}}{Bk_1}\right\rceil$, in which

Theorems & Definitions (29)

Definition 1: Memory Budget (Li2022Worst)
Definition 2: Online Learnability
Lemma 1
Theorem 1
Theorem 2: Algorithm-dependent Bound
Definition 3
Lemma 2
Theorem 3
Theorem 4: Lower Bound
Theorem 5: Algorithm-dependent Bound
...and 19 more
