GPIoT: Tailoring Small Language Models for IoT Program Synthesis and Development
Leming Shen, Qiang Yang, Xinyu Huang, Zijing Ma, Yuanqing Zheng
TL;DR
GPIoT tackles IoT-specific code synthesis by fine-tuning three local small language models (TDSLM, RTSLM, CGSLM) on IoT-focused datasets, enabling privacy-preserving, end-to-end IoT program generation without depending on cloud LLMs. It introduces IoT-oriented data augmentation and a Parameter-Efficient Co-Tuning (PECT) framework with a multi-path LoRA pipeline and projection layers to align task decomposition and code generation. A dedicated benchmark, IoTBench, and extensive experiments across heartbeat detection, human activity recognition (HAR), and multimodal HAR show substantial improvements in task accuracy (+64.7%), memory efficiency (under 310 MB), and user satisfaction compared with state-of-the-art baselines. The work demonstrates the practical viability of IoT-tailored local SLMs for program synthesis, with strong potential for privacy-preserving, resource-aware AI development in IoT settings.
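For readers unfamiliar with LoRA, the parameter-efficient idea mentioned above can be sketched roughly as follows. This is a generic, minimal illustration of a single low-rank adapter on a frozen linear layer, not GPIoT's PECT framework: the multi-path pipeline and projection layers are not reproduced, and the class name `LoRALinear`, the helper `matvec`, and all sizes and hyperparameters (`r`, `alpha`, 64 dimensions) are assumptions chosen for illustration.

```python
import random


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


class LoRALinear:
    """Frozen weight W plus a low-rank update (alpha / r) * B @ A.

    Hypothetical illustration only; not the authors' implementation.
    """

    def __init__(self, d_in, d_out, r=2, alpha=4.0, seed=0):
        rng = random.Random(seed)
        # Pretrained weight: kept frozen during fine-tuning.
        self.W = [[rng.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]
        # Low-rank factors: the only parameters that would be trained.
        self.A = [[rng.gauss(0, 0.01) for _ in range(d_in)] for _ in range(r)]
        # B starts at zero, so the adapter initially leaves the output unchanged.
        self.B = [[0.0] * r for _ in range(d_out)]
        self.scale = alpha / r

    def forward(self, x):
        base = matvec(self.W, x)                    # frozen path
        delta = matvec(self.B, matvec(self.A, x))   # low-rank path
        return [b + self.scale * d for b, d in zip(base, delta)]

    def trainable_params(self):
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])

    def frozen_params(self):
        return len(self.W) * len(self.W[0])


layer = LoRALinear(d_in=64, d_out=64, r=2)
x = [1.0] * 64
y = layer.forward(x)
print(layer.trainable_params(), "trainable vs", layer.frozen_params(), "frozen")
```

In this toy setting the adapter adds 256 trainable parameters next to 4,096 frozen ones, which is the kind of saving that makes fine-tuning small models on local hardware practical.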
Abstract
Code Large Language Models (LLMs) enhance software development efficiency by automatically generating code and documentation in response to user requirements. However, code LLMs cannot synthesize specialized programs when tasked with IoT applications that require domain knowledge. While Retrieval-Augmented Generation (RAG) offers a promising solution by fetching relevant domain knowledge, it requires powerful cloud LLMs (e.g., GPT-4) to process user requirements and retrieved contents, which raises significant privacy concerns. This approach also suffers from unstable networks and prohibitive LLM query costs. Moreover, it is challenging to ensure the correctness and relevance of the fetched contents. To address these issues, we propose GPIoT, a code generation system for IoT applications that fine-tunes locally deployable Small Language Models (SLMs) on IoT-specialized datasets. SLMs have smaller model sizes, allowing efficient local deployment and execution that mitigate privacy concerns and network uncertainty. Furthermore, fine-tuning the SLMs on our IoT-specialized datasets can substantially improve their ability to synthesize IoT-related programs. To evaluate GPIoT's capability in synthesizing programs for IoT applications, we develop a benchmark, IoTBench. Extensive experiments and user trials demonstrate the effectiveness of GPIoT in generating IoT-specialized code: it outperforms state-of-the-art code LLMs with an average task accuracy improvement of 64.7% and significant improvements in user satisfaction.
