Semi-Supervised Online Learning on the Edge by Transforming Knowledge from Teacher Models

Jiabin Xue

TL;DR

The paper tackles the problem of labeling future, unseen data in Online Edge ML by proposing Knowledge Transformation (KT), a hybrid method that combines Knowledge Distillation, Active Learning, and causal reasoning to generate pseudo-labels on edge devices. KT uses a pre-trained teacher to produce labels for the student’s online learning through a task-level knowledge mapping, without forcing the student to imitate the teacher on the same data. In simulations with a less stable and a more stable teacher, the stable teacher lets the student approach its expected maximum performance, while the unstable teacher degrades pseudo-label quality and leaves a larger performance gap. The work demonstrates KT’s feasibility and its potential applicability when the teacher’s task is generic or ground-truth labels for the student are expensive, offering a path toward practical continuous learning on resource-constrained edge devices.

Abstract

Edge machine learning (Edge ML) enables training ML models using the vast data distributed across network edges. However, many existing approaches assume static models that are trained centrally and then deployed, which makes them ineffective against unseen data. To address this, Online Edge ML allows models to be trained directly on edge devices and updated continuously with new data. This paper explores a key challenge of Online Edge ML: "How to determine labels for truly future, unseen data points." We propose Knowledge Transformation (KT), a hybrid method combining Knowledge Distillation, Active Learning, and causal reasoning. In short, KT acts as the oracle in active learning by transforming knowledge from a teacher model to generate pseudo-labels for training a student model. To verify the validity of the method, we conducted simulation experiments with two setups: (1) using a less stable teacher model and (2) using a relatively more stable teacher model. Results indicate that when a stable teacher model is given, the student model can eventually reach its expected maximum performance. KT is potentially beneficial in scenarios that meet either or both of the following conditions: (1) the teacher's task is generic, so an existing pre-trained model might be adequate and there is no need to train the teacher from scratch; (2) labels for the student's task are difficult or expensive to acquire.
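
The pseudo-labeling loop described in the abstract can be illustrated with a minimal Python sketch. This is not the paper's implementation: the threshold teacher, the `KNOWLEDGE_MAP` rule, the logistic-regression student, and the synthetic data stream are all hypothetical stand-ins chosen only to show the shape of the idea. The teacher predicts on its own input, a task-level mapping (standing in for a causal rule such as $P \Rightarrow Q$) turns that prediction into a pseudo-label, and the student takes one online update on a different input.

```python
import math
import random


def teacher_predict(x_teacher):
    """Hypothetical stand-in for a pre-trained teacher: thresholds one reading."""
    return "P" if x_teacher > 0.5 else "not_P"


# Hypothetical task-level mapping from teacher labels to student labels
# (an assumed causal rule of the form P => Q).
KNOWLEDGE_MAP = {"P": 1, "not_P": 0}


def transform(teacher_label):
    """Turn a teacher prediction into a pseudo-label for the student's task."""
    return KNOWLEDGE_MAP[teacher_label]


class OnlineLogisticStudent:
    """Tiny online learner: logistic regression trained one sample at a time."""

    def __init__(self, n_features, lr=0.1):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr

    def predict_proba(self, x):
        z = self.b + sum(wi * xi for wi, xi in zip(self.w, x))
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, x, y):
        # One stochastic-gradient step on the log loss for a single sample.
        err = self.predict_proba(x) - y
        self.w = [wi - self.lr * err * xi for wi, xi in zip(self.w, x)]
        self.b -= self.lr * err


def run_stream(n_steps=2000, seed=0):
    """Simulate a stream where teacher and student see different inputs."""
    rng = random.Random(seed)
    student = OnlineLogisticStudent(n_features=2)
    for _ in range(n_steps):
        event = rng.random() < 0.5  # hidden state shared by both tasks
        # Teacher input (synthetic): a single noisy reading.
        x_teacher = rng.gauss(0.8 if event else 0.2, 0.1)
        # Student input (synthetic): two different noisy features.
        x_student = [
            rng.gauss(1.0 if event else -1.0, 0.5),
            rng.gauss(0.5 if event else -0.5, 0.5),
        ]
        # No ground-truth label is used: the student learns from the pseudo-label.
        pseudo_label = transform(teacher_predict(x_teacher))
        student.update(x_student, pseudo_label)
    return student


if __name__ == "__main__":
    trained = run_stream()
    print(trained.predict_proba([1.0, 0.5]), trained.predict_proba([-1.0, -0.5]))
```

In this toy setup the student never copies the teacher's outputs on shared inputs, so the quality of its training signal depends entirely on how reliable the teacher and the mapping are. That is consistent with the abstract's observation that a less stable teacher leads to worse student performance.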

Paper Structure

This paper contains 20 sections, 10 equations, 12 figures, 10 tables, and 1 algorithm.

Figures (12)

  • Figure 1: Traditional approaches vs Current Edge ML approaches
  • Figure 2: Overview of the proposed method. KT assumes that a causal relationship exists between the teacher's task and the student's task (e.g., $P \Rightarrow Q$); it does not require a statistical relationship between the input data distributions of the two models.
  • Figure 3: Knowledge Distillation (left) and Active Learning (right) [32, 35]
  • Figure 4: KT - Concept Design
  • Figure 5: KT - Use Case Illustration
  • ...and 7 more figures