Agentic AI Home Energy Management System: A Large Language Model Framework for Residential Load Scheduling
Reda El Makroum, Sebastian Zwickl-Bernhard, Lukas Kranzl
TL;DR
The paper aims to reduce user interaction barriers in residential HEMS by introducing an agentic AI HEMS in which a large language model autonomously coordinates end-to-end multi-appliance load scheduling, from natural language input to device control. It proposes a hierarchical architecture with one orchestrator and three specialist agents using the ReAct pattern, integrates real-time price data and calendar-derived deadlines, and benchmarks against a MILP ground truth computed from real Austrian day-ahead prices. Across three open-source model backends, Llama-3.3-70B achieves 100% MILP-optimal multi-appliance scheduling, while Qwen-3-32B and GPT-OSS-120B fail to coordinate all three appliances, highlighting scale-related reasoning challenges. The study also shows that analytical query handling requires explicit workflow guidance, emphasizes open-source reproducibility, and discusses trade-offs in model choice, prompt engineering, security, and sustainability for practical deployment. The MILP benchmark minimizes total electricity cost over the appliance start times, $\min_{t_{WM},\, t_{DW},\, t_{EV}} \sum_{a \in A} \sum_{k=0}^{d_a - 1} C_{t_a + k}$, where $t_a$ is the start slot of appliance $a$, $d_a$ its run duration, and $C_t$ the price in slot $t$. Overall, the system demonstrates both the feasibility and the boundaries of agentic AI HEMS for residential demand response under real-world pricing.
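To make the MILP objective quoted above concrete, the following is a minimal sketch that evaluates it by exhaustive search rather than with a MILP solver. It assumes no coupling constraints between appliances (the summary states none), so each start time can be chosen independently. The function names, the optional `deadlines` argument, and the price values are hypothetical illustrations, not the authors' implementation or data.

```python
from typing import Dict, List, Optional, Tuple


def run_cost(prices: List[float], start: int, duration: int) -> float:
    """Inner sum of the objective: sum_{k=0}^{d_a-1} C_{t_a+k}."""
    return sum(prices[start:start + duration])


def best_start(prices: List[float], duration: int,
               deadline: Optional[int] = None) -> int:
    """Cheapest start slot whose run ends before slot index `deadline`.

    `deadline` is an assumed convention: the run must occupy only
    slots with index < deadline. None means the whole horizon is allowed.
    """
    horizon = len(prices) if deadline is None else min(deadline, len(prices))
    candidates = range(0, horizon - duration + 1)
    if not candidates:
        raise ValueError("no feasible start slot")
    # min() returns the earliest slot when several are equally cheap.
    return min(candidates, key=lambda t: run_cost(prices, t, duration))


def schedule(prices: List[float], durations: Dict[str, int],
             deadlines: Optional[Dict[str, int]] = None
             ) -> Tuple[Dict[str, int], float]:
    """Outer sum over appliances a in A, solved one appliance at a time."""
    deadlines = deadlines or {}
    starts = {a: best_start(prices, d, deadlines.get(a))
              for a, d in durations.items()}
    total = sum(run_cost(prices, starts[a], d) for a, d in durations.items())
    return starts, total


# Hypothetical prices per slot and run durations in slots.
PRICES = [5, 4, 1, 1, 3, 6, 2, 2]
DURATIONS = {"WM": 2, "DW": 1, "EV": 3}

starts, total = schedule(PRICES, DURATIONS)
print(starts, total)
```

Because the objective is a sum of independent per-appliance terms, the search costs only one pass over the horizon per appliance. Any constraint that links appliances, such as a household power limit, would break this separability and call for a proper MILP formulation.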
Abstract
The electricity sector transition requires substantial increases in residential demand response capacity, yet Home Energy Management Systems (HEMS) adoption remains limited by user interaction barriers requiring translation of everyday preferences into technical parameters. While large language models have been applied to energy systems as code generators and parameter extractors, no existing implementation deploys LLMs as autonomous coordinators managing the complete workflow from natural language input to multi-appliance scheduling. This paper presents an agentic AI HEMS where LLMs autonomously coordinate multi-appliance scheduling from natural language requests to device control, achieving optimal scheduling without example demonstrations. A hierarchical architecture combining one orchestrator with three specialist agents uses the ReAct pattern for iterative reasoning, enabling dynamic coordination without hardcoded workflows while integrating Google Calendar for context-aware deadline extraction. Evaluation across three open-source models using real Austrian day-ahead electricity prices reveals substantial capability differences. Llama-3.3-70B successfully coordinates all appliances across all scenarios to match cost-optimal benchmarks computed via mixed-integer linear programming, while other models achieve perfect single-appliance performance but struggle to coordinate all appliances simultaneously. Progressive prompt engineering experiments demonstrate that analytical query handling without explicit guidance remains unreliable despite models' general reasoning capabilities. We open-source the complete system including orchestration logic, agent prompts, tools, and web interfaces to enable reproducibility, extension, and future research.
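The ReAct pattern mentioned in the abstract alternates a reasoning step with a tool call and feeds each observation back into the next step. The sketch below illustrates that loop for a single agent only; it does not reproduce the orchestrator and specialist hierarchy. The `scripted_policy` function is a hard-coded stand-in for an LLM, and the tool names and price values are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Step = Tuple[str, str, str]  # (thought, action, action_input)

PRICES = [5, 4, 1, 1, 3, 6]  # hypothetical price per slot


def get_prices(_day: str) -> str:
    """Hypothetical tool: return the price series as text."""
    return ",".join(str(p) for p in PRICES)


def cheapest_slot(duration: str) -> str:
    """Hypothetical tool: cheapest start slot for a run of `duration` slots."""
    d = int(duration)
    start = min(range(len(PRICES) - d + 1),
                key=lambda t: sum(PRICES[t:t + d]))
    return str(start)


def scripted_policy(transcript: List[str]) -> Step:
    """Stand-in for the LLM: picks the next step from the transcript so far."""
    observations = [ln for ln in transcript if ln.startswith("Observation:")]
    if len(observations) == 0:
        return ("I need the prices first.", "get_prices", "today")
    if len(observations) == 1:
        return ("Now find the cheapest 2-slot window.", "cheapest_slot", "2")
    return ("I have the slot.", "finish", observations[-1].split(": ", 1)[1])


def react_loop(policy: Callable[[List[str]], Step],
               tools: Dict[str, Callable[[str], str]],
               request: str, max_steps: int = 5) -> Tuple[str, List[str]]:
    """Alternate Thought, Action, Observation until the policy finishes."""
    transcript = [f"Request: {request}"]
    for _ in range(max_steps):
        thought, action, arg = policy(transcript)
        transcript.append(f"Thought: {thought}")
        if action == "finish":
            transcript.append(f"Answer: {arg}")
            return arg, transcript
        tool = tools.get(action)
        observation = tool(arg) if tool else f"unknown tool {action}"
        transcript.append(f"Action: {action}({arg})")
        transcript.append(f"Observation: {observation}")
    raise RuntimeError("step budget exhausted")


TOOLS = {"get_prices": get_prices, "cheapest_slot": cheapest_slot}
answer, transcript = react_loop(scripted_policy, TOOLS,
                                "Run the washing machine cheaply")
print("\n".join(transcript))
```

In a real system the policy would be a model call that parses the transcript and emits the next action, which is why the loop carries a step budget and handles unknown tool names instead of trusting the policy's output.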
