Skilled AI Agents for Embedded and IoT Systems Development

Yiming Li; Yuhan Cheng; Mingchen Ma; Yihang Zou; Ningyuan Yang; Wei Cheng; Hai "Helen" Li; Yiran Chen; Tingjun Chen

Abstract

Large language models (LLMs) and agentic systems have shown promise for automated software development, but applying them to hardware-in-the-loop (HIL) embedded and Internet-of-Things (IoT) systems remains challenging because software logic is tightly coupled to physical hardware behavior. Code that compiles successfully may still fail when deployed on real devices because of timing constraints, peripheral initialization requirements, or hardware-specific behaviors. To address this challenge, we introduce a skills-based agentic framework for HIL embedded development together with IoT-SkillsBench, a benchmark designed to systematically evaluate AI agents in real embedded programming environments. IoT-SkillsBench spans three representative embedded platforms, 23 peripherals, and 42 tasks across three difficulty levels, where each task is evaluated under three agent configurations (no-skills, LLM-generated skills, and human-expert skills) and validated through real hardware execution. Across 378 hardware-validated experiments, we show that concise human-expert skills with structured expert knowledge enable near-perfect success rates across platforms.
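The figure of 378 experiments is consistent with a full grid in which each of the 42 tasks is run on each of the three platforms under each of the three skills configurations (3 x 42 x 3 = 378). The abstract does not spell out this breakdown, so the grid below is an assumption, and the platform labels are placeholders rather than the names used in the paper. A minimal sketch of the arithmetic:

```python
from itertools import product

# Placeholder labels: the abstract gives only the counts, not the platform names.
PLATFORMS = ["platform_a", "platform_b", "platform_c"]
# Configuration names follow the abstract's three agent configurations.
CONFIGS = ["no-skills", "llm-generated-skills", "human-expert-skills"]
NUM_TASKS = 42


def experiment_grid(platforms=PLATFORMS, configs=CONFIGS, num_tasks=NUM_TASKS):
    """Enumerate (platform, task_id, config) triples, one per experiment.

    Assumes a full grid: every task runs on every platform under every config.
    """
    return list(product(platforms, range(num_tasks), configs))


grid = experiment_grid()
print(len(grid))  # 378
```

If some tasks apply only to a subset of platforms, the grid would not be full and the count would have to be derived from the per-platform task lists instead.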

Paper Structure (13 sections, 3 figures, 3 tables)


Introduction
IoT-SkillsBench: Platforms, Tasks, and Skills
Agent Setup and Evaluation Protocol
Experimental Results
Conclusions
Acknowledgments
List of Peripherals
Prompt for LLM-based Skills Generation
Example Tasks Across Difficulty Levels
Level 1: Basic Peripheral Control
Level 2: Protocol-Level Communication
Level 3: System-Level Integration
Human-Expert Skills Examples

Figures (3)

Figure 1: Overview of the skills-based agentic framework for embedded and IoT systems development and the IoT-SkillsBench benchmark.
Figure 2: Pass rates comparison (Pass@1 and Pass@5) across platforms, skills configurations, and task difficulty levels.
Figure 3: Average per-task token usage across platforms, skills configurations, and task difficulty levels.
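The Pass@1 and Pass@5 metrics named in the Figure 2 caption are not defined in this section. One common reading, assumed here, is that a task counts as passed at k if at least one of its first k attempts succeeds on hardware. The function name and the sample outcomes below are hypothetical and are not taken from the paper's results. A minimal sketch under that assumption:

```python
def pass_at_k(attempts_per_task, k):
    """Fraction of tasks with at least one passing run among the first k attempts.

    attempts_per_task: one list of booleans per task, in attempt order.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    if not attempts_per_task:
        raise ValueError("need at least one task")
    solved = sum(1 for attempts in attempts_per_task if any(attempts[:k]))
    return solved / len(attempts_per_task)


# Hypothetical outcomes for four tasks with five attempts each (illustrative only).
results = [
    [True, True, True, True, True],       # passes on the first attempt
    [False, False, True, False, False],   # passes on the third attempt
    [False, False, False, False, False],  # never passes
    [False, True, False, False, False],   # passes on the second attempt
]

print(pass_at_k(results, 1))  # 0.25
print(pass_at_k(results, 5))  # 0.75
```

Other definitions exist, such as the unbiased estimator computed from n samples per task, so the paper's own evaluation protocol should be checked before comparing numbers.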
