DFDT: Dynamic Fast Decision Tree for IoT Data Stream Mining on Edge Devices

Afonso Lourenço, João Rodrigo, João Gama, Goreti Marreiros

TL;DR

The paper targets memory constraints in edge-based IoT data stream mining. It proposes DFDT, a memory-aware extension of the Very Fast Decision Tree (VFDT) that modulates tree growth according to leaf activity. DFDT combines activity-aware pre-pruning, adaptive grace periods, and adaptive tie thresholds: low-activity leaves are deactivated, moderately active leaves split under stricter conditions, and highly active leaves grow faster through a skipping mechanism. An ablation study identifies three variants (Low, Medium, High) with different accuracy-memory-runtime trade-offs, all compatible with ensemble frameworks. Empirical results on diverse streams show that the DFDT variants are Pareto-efficient, with DFDT High delivering the highest accuracy and DFDT Low the best resource efficiency.

Abstract

The Internet of Things generates massive data streams, with edge computing emerging as a key enabler for online IoT applications and 5G networks. Edge solutions facilitate real-time machine learning inference, but also require continuous adaptation to concept drifts. While extensions of the Very Fast Decision Tree (VFDT) remain state-of-the-art for tabular stream mining, their unregulated growth limits efficiency, particularly in ensemble settings where post-pruning at the individual tree level is seldom applied. This paper presents DFDT, a novel memory-constrained algorithm for online learning. DFDT employs activity-aware pre-pruning, dynamically adjusting splitting criteria based on leaf node activity: low-activity nodes are deactivated to conserve resources, moderately active nodes split under stricter conditions, and highly active nodes leverage a skipping mechanism for accelerated growth. Additionally, adaptive grace periods and tie thresholds allow DFDT to modulate splitting decisions based on observed data variability, enhancing the accuracy-memory-runtime trade-off while minimizing the need for hyperparameter tuning. An ablation study reveals three DFDT variants suited to different resource profiles. Fully compatible with existing ensemble frameworks, DFDT provides a drop-in alternative to standard VFDT-based learners.
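To make the three activity regimes in the abstract concrete, here is a minimal Python sketch. Only the Hoeffding bound and the tie-threshold test are standard VFDT machinery. Everything else is an assumption made for illustration and is not taken from the paper: the activity measure (a leaf's share of the instances seen so far), the `low` and `high` cut-offs, the choice to model "stricter conditions" as disabling the tie-break, and the choice to model "skipping" as bypassing the Hoeffding test. All function and parameter names are hypothetical.

```python
import math
from enum import Enum


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Standard Hoeffding bound: epsilon = sqrt(R^2 * ln(1/delta) / (2n))."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def vfdt_should_split(best_gain: float, second_gain: float, n: int,
                      value_range: float = 1.0, delta: float = 1e-7,
                      tie_threshold: float = 0.05) -> bool:
    """Classic VFDT test: split on a clear winner, or when epsilon is small enough to call a tie."""
    eps = hoeffding_bound(value_range, delta, n)
    return (best_gain - second_gain) > eps or eps < tie_threshold


class Regime(Enum):
    DEACTIVATE = "deactivate"  # low activity: stop tracking split statistics
    STRICT = "strict"          # moderate activity: split only under stricter conditions
    SKIP = "skip"              # high activity: accelerated growth


def leaf_regime(leaf_count: int, total_count: int,
                low: float = 0.01, high: float = 0.20) -> Regime:
    """Assumed activity measure: fraction of all instances routed to this leaf.

    The cut-offs `low` and `high` are placeholders, not values from the paper.
    """
    activity = leaf_count / total_count
    if activity < low:
        return Regime.DEACTIVATE
    if activity < high:
        return Regime.STRICT
    return Regime.SKIP


def activity_aware_should_split(best_gain: float, second_gain: float, n: int,
                                regime: Regime, value_range: float = 1.0,
                                delta: float = 1e-7) -> bool:
    """Illustrative split policy per regime (assumed, not the paper's exact rules)."""
    if regime is Regime.DEACTIVATE:
        return False
    if regime is Regime.STRICT:
        # Assumption: "stricter" means no tie-breaking, a clear winner is required.
        eps = hoeffding_bound(value_range, delta, n)
        return (best_gain - second_gain) > eps
    # Assumption: "skipping" bypasses the Hoeffding test whenever a useful split exists.
    return best_gain > 0.0


if __name__ == "__main__":
    regime = leaf_regime(leaf_count=100, total_count=1000)
    print(regime, activity_aware_should_split(0.5, 0.1, n=200, regime=regime))
```

With these placeholder cut-offs, a leaf that has received 100 of 1000 instances falls into the moderate regime and splits only when the gain gap exceeds the Hoeffding bound. The point of the sketch is the dispatch structure: the split test applied to a leaf depends on how much of the stream that leaf is seeing.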

Paper Structure

This paper contains 10 sections, 4 equations, 6 figures, 4 tables, 2 algorithms.

Figures (6)

  • Figure 1: Accuracy x Memory
  • Figure 2: Main and interaction effects of DFDT components
  • Figure 3: Accuracy x Runtime
  • Figure 4: Nemenyi test - Accuracy
  • Figure 5: Nemenyi test - Memory
  • ...and 1 more figure