Split Knowledge Distillation for Large Models in IoT: Architecture, Challenges, and Solutions
Zuguang Li, Wen Wu, Shaohua Wu, Qiaohua Lin, Yaping Sun, Hui Wang
TL;DR
This paper addresses the challenge of deploying large language models in IoT environments by combining knowledge distillation with split learning to create compact, privacy-preserving models suitable for edge devices. In the proposed split knowledge distillation framework, an edge server holds the full teacher and student models, while lightweight IoT devices run only the embedding modules, so raw data never leaves the devices. Cut-layer selection and resource usage are optimized to minimize energy consumption and latency. A case study distills an 8B-parameter LLaMA-based model into a 1B-parameter model across heterogeneous devices and reports gains in training speed and energy efficiency under varying channel conditions. The work offers a practical pathway for deploying intelligent, privacy-conscious AI in energy- and latency-constrained IoT ecosystems.
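To make the cut-layer trade-off concrete, the sketch below scores each candidate cut point under a simple cost model. The model is an assumption and is not taken from the paper. In it, the device runs the first `cut` layers and uploads the activations produced at the cut. Computation energy is taken as kappa * cycles * f^2 and transmission energy as transmit power times upload time. `DeviceProfile`, `cut_layer_cost`, and `select_cut_layer` are hypothetical names chosen for illustration. The TL;DR fixes the device-side portion to the embedding modules, but the sketch treats the cut as a free variable to show how such a choice could be scored. It ignores server-side computation and gradient downlink.

```python
from dataclasses import dataclass


@dataclass
class DeviceProfile:
    # Hypothetical per-device parameters; not values from the paper.
    cpu_hz: float      # device CPU frequency f (cycles/s)
    kappa: float       # effective switched capacitance (energy per cycle = kappa * f^2)
    tx_power_w: float  # uplink transmit power (W)
    uplink_bps: float  # achievable uplink rate (bit/s)


def cut_layer_cost(cut, layer_cycles, act_bits, dev, weight=0.5):
    """Weighted latency/energy of running layers [0, cut) on the device and
    uploading the activations produced by layer cut-1 (assumed cost model)."""
    cycles = sum(layer_cycles[:cut])
    comp_time = cycles / dev.cpu_hz
    comp_energy = dev.kappa * cycles * dev.cpu_hz ** 2
    tx_time = act_bits[cut - 1] / dev.uplink_bps
    tx_energy = dev.tx_power_w * tx_time
    latency = comp_time + tx_time
    energy = comp_energy + tx_energy
    # Simplification: seconds and joules are mixed without normalization.
    return weight * latency + (1 - weight) * energy


def select_cut_layer(layer_cycles, act_bits, dev, weight=0.5):
    """Exhaustive search over cuts 1..L-1, keeping at least one layer per side."""
    candidates = range(1, len(layer_cycles))
    return min(candidates,
               key=lambda c: cut_layer_cost(c, layer_cycles, act_bits, dev, weight))


if __name__ == "__main__":
    dev = DeviceProfile(cpu_hz=1e9, kappa=1e-28, tx_power_w=0.2, uplink_bps=2e7)
    # A large first-layer activation pushes the best cut one layer deeper.
    print(select_cut_layer([1e8, 1e8, 4e9, 4e9], [1e9, 8e6, 8e6, 8e6], dev))
```

Exhaustive search is adequate here because the number of candidate cuts equals the layer count. A practical version would normalize latency and energy before weighting them, and would re-run the search as channel conditions (`uplink_bps`) change.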
Abstract
Large models (LMs) have immense potential in Internet of Things (IoT) systems, enabling applications such as intelligent voice assistants, predictive maintenance, and healthcare monitoring. However, training LMs on edge servers raises data privacy concerns, while deploying them directly on IoT devices is constrained by limited computational and memory resources. We analyze the key challenges of training LMs in IoT systems, including energy constraints, latency requirements, and device heterogeneity, and propose potential solutions such as dynamic resource management, adaptive model partitioning, and clustered collaborative training. Furthermore, we propose a split knowledge distillation framework that efficiently distills LMs into smaller versions deployable on IoT devices while keeping raw data local. The framework integrates knowledge distillation and split learning to minimize energy consumption while meeting strict model-training latency requirements. A case study is presented to evaluate the feasibility and performance of the proposed framework.
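The abstract does not state which distillation objective the framework uses. As an illustration only, the sketch below implements the standard temperature-scaled distillation loss of Hinton et al.: a KL divergence between softened teacher and student distributions, scaled by T^2, blended with ordinary cross-entropy on the ground-truth label. The function names, `temperature`, and `alpha` are assumptions and not the paper's notation. In the described architecture, such a loss would be computed wherever both teacher and student outputs are available, which the TL;DR places on the edge server.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax of a list of logits at a given temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(student_logits, teacher_logits, label, temperature=2.0, alpha=0.5):
    """Standard (assumed) KD loss:
    alpha * T^2 * KL(p_teacher || p_student) + (1 - alpha) * CE(student, label)."""
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    # KL divergence between softened distributions; skip zero-probability terms.
    kl = sum(pt * (math.log(pt) - math.log(ps)) for pt, ps in zip(p_t, p_s) if pt > 0)
    # Hard-label cross-entropy on the student's unsoftened distribution.
    ce = -math.log(softmax(student_logits)[label])
    return alpha * temperature ** 2 * kl + (1 - alpha) * ce


if __name__ == "__main__":
    # The student disagrees with the teacher, so the KL term adds to the loss.
    print(kd_loss([1.0, 0.0], [0.0, 1.0], label=0))
```

The T^2 factor keeps the soft-target gradients on roughly the same scale as the hard-label term when the temperature changes. A production version would operate on batched tensors, for example with PyTorch's `kl_div`.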
