Online Learning under Haphazard Input Conditions: A Comprehensive Review and Analysis

Rohit Agarwal; Arijit Das; Alexander Horsch; Krishna Agarwal; Dilip K. Prasad

TL;DR

The paper surveys online learning under haphazard inputs, where the input feature space varies over time, and proposes a comprehensive taxonomy of models, datasets, metrics, and benchmarks, including open-source code and carbon-footprint reporting. It introduces robust evaluation metrics (AUROC, AUPRC, balanced accuracy) to address imbalanced data and benchmarks nine model families (naive Bayes, linear classifiers, decision stumps, and deep learning) on 20 binary datasets, both real and synthetic. The analysis highlights that complex models like OVFM and Aux-Drop perform best overall, while simpler methods excel on real data; it also emphasizes online-learning constraints, dataset preparation, and architectural adaptations to handle missing, sudden, obsolete, and unknown features. Additionally, the work advocates for sustainable research practices by quantifying carbon footprints and providing reproducible, open-source benchmarking resources, while outlining future directions in datasets, architectures, and applications for haphazard-input online learning.
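To illustrate why plain accuracy can mislead on imbalanced data, and why a metric such as balanced accuracy is reported alongside it, here is a minimal sketch in plain Python. The label stream, the always-negative predictor, and the function names are hypothetical and chosen only for illustration; they are not taken from the paper's benchmark code.

```python
def accuracy(y_true, y_pred):
    """Fraction of instances whose predicted label matches the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of the per-class recalls for binary labels in {0, 1}.

    Assumes both classes occur at least once in y_true.
    """
    pos = sum(1 for t in y_true if t == 1)
    neg = len(y_true) - pos
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return 0.5 * (tp / pos + tn / neg)


# Hypothetical imbalanced stream: 95 negatives followed by 5 positives.
y_true = [0] * 95 + [1] * 5
# A trivial model that always predicts the majority class.
y_pred = [0] * 100

acc = accuracy(y_true, y_pred)
bal_acc = balanced_accuracy(y_true, y_pred)
print(f"accuracy={acc:.2f}, balanced accuracy={bal_acc:.2f}")
```

In this toy case the always-negative predictor scores high on accuracy yet only chance level on balanced accuracy, because it never recovers a positive instance.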

Abstract

The domain of online learning has experienced multifaceted expansion owing to its prevalence in real-life applications. Nonetheless, this progression operates under the assumption that the input feature space of the streaming data remains constant. In this survey paper, we address the topic of online learning in the context of haphazard inputs, explicitly foregoing such an assumption. We discuss, classify, evaluate, and compare the methodologies that are adept at modeling haphazard inputs, additionally providing the corresponding code implementations and their carbon footprint. Moreover, we classify the datasets related to the field of haphazard inputs and introduce evaluation metrics specifically designed for datasets exhibiting imbalance. The code of each methodology can be found at https://github.com/Rohit102497/HaphazardInputsReview

Paper Structure (61 sections, 13 equations, 9 figures, 15 tables)

Introduction
Notable Contributions of This Survey
Outline of This Article
Haphazard Inputs
Datasets
Dataset Creation
crowdsense
spamassassin
imdb
diabetes_us
Metrics
Number of Errors
Accuracy
AUROC
AUPRC
...and 46 more sections

Figures (9)

Figure 1: (a) The working of an online learning method. (b) The fixed input feature space of traditional online learning versus the variable input feature space of haphazard inputs.
Figure 2: PRISMA (Page et al., 2021) flowchart of our systematic review.
Figure 3: An example of haphazard inputs showcasing their characteristics.
Figure 4: Timeline of all the models capable of handling haphazard inputs.
Figure 5: Decision stump and possible ideas. The central figure is adapted from the DynFo article (Schreckenberger et al., 2022).
...and 4 more figures
