Tiny-WiFo: A Lightweight Wireless Foundation Model for Channel Prediction via Multi-Component Adaptive Knowledge Distillation
Haotian Zhang, Shijian Gao, Xiang Cheng
TL;DR
Wireless foundation models enable powerful channel prediction, but their size and latency make them impractical for edge deployment. The authors introduce Multi-Component Adaptive Knowledge Distillation (MCAKD), which combines Cross-Attention-Based Knowledge Selection (CA-KS) with an Autonomous Learning-Passive Learning (AL-PL) strategy, to compress WiFo into Tiny-WiFo, a model with only 5.5M parameters that preserves most of the teacher's capabilities. MCAKD transfers three kinds of teacher knowledge (attention weights, embedding-layer representations, and hidden states), uses CA-KS to select which teacher features to transfer, and balances imitation with self-learning to reduce reliance on the costly teacher. On 18 CSI datasets, Tiny-WiFo retains over 98% of WiFo's performance with a 1.6 ms inference time and keeps WiFo's zero-shot generalization, making real-time deployment on edge devices viable.
Abstract
The massive scale of Wireless Foundation Models (FMs) hinders their real-time deployment on edge devices. This letter moves beyond standard knowledge distillation by introducing a novel Multi-Component Adaptive Knowledge Distillation (MCAKD) framework. Key innovations include a Cross-Attention-Based Knowledge Selection (CA-KS) module that selectively identifies critical features from the teacher model, and an Autonomous Learning-Passive Learning (AL-PL) strategy that balances knowledge transfer with independent learning to achieve high training efficiency at a manageable computational cost. When applied to the WiFo FM, the distilled Tiny-WiFo model, with only 5.5M parameters, achieves a 1.6 ms inference time while retaining over 98% of WiFo's performance and its crucial zero-shot generalization capability, making real-time FM deployment viable.
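The abstract does not give the loss functions or the training schedule, so the following is only a minimal sketch of how a three-component distillation loss with cross-attention-style selection and an alternating AL-PL schedule might be put together. Everything concrete in it is an assumption made for illustration rather than the authors' implementation: the function names, the use of mean squared error for each component, the weights `lambdas`, the dot-product form of the selection step, and the fixed `period` that decides when the teacher is consulted.

```python
import math


def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def select_teacher_feature(student_query, teacher_features):
    """Assumed CA-KS form: the student vector acts as the query and the
    teacher's per-layer features act as keys and values."""
    scale = math.sqrt(len(student_query))
    scores = [
        sum(q * k for q, k in zip(student_query, feat)) / scale
        for feat in teacher_features
    ]
    weights = softmax(scores)
    dim = len(teacher_features[0])
    # Attention-weighted mix of teacher features, one value per dimension.
    return [
        sum(w * feat[i] for w, feat in zip(weights, teacher_features))
        for i in range(dim)
    ]


def is_passive_step(step, period=4):
    """Hypothetical AL-PL schedule: one teacher-guided (passive) step
    every `period` steps; all other steps are autonomous."""
    return step % period == 0


def training_loss(step, task_loss, student, teacher, lambdas=(1.0, 1.0, 1.0)):
    """Task loss alone on autonomous steps; task loss plus the three
    distillation terms on passive steps."""
    if not is_passive_step(step):
        return task_loss  # autonomous learning: teacher outputs are not used
    target_hidden = select_teacher_feature(
        student["hidden"], teacher["hidden_layers"]
    )
    distill = (
        lambdas[0] * mse(student["attention"], teacher["attention"])
        + lambdas[1] * mse(student["embedding"], teacher["embedding"])
        + lambdas[2] * mse(student["hidden"], target_hidden)
    )
    return task_loss + distill
```

In this sketch the teacher's outputs are needed only on passive steps, which is one plausible reading of how balancing knowledge transfer with independent learning could keep the computational cost manageable.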
