Agentic AI Home Energy Management System: A Large Language Model Framework for Residential Load Scheduling

Reda El Makroum, Sebastian Zwickl-Bernhard, Lukas Kranzl

TL;DR

The paper aims to lower user interaction barriers in residential HEMS by introducing an agentic AI HEMS where a large language model autonomously coordinates end-to-end multi-appliance load scheduling from natural language input to device control. It introduces a hierarchical architecture with one orchestrator and three specialist agents using the ReAct pattern, integrates real-time price data and calendar-derived deadlines, and benchmarks against MILP ground truth using real Austrian day-ahead prices. Across three open-source model backends, Llama-3.3-70B achieves 100% MILP-optimal multi-appliance scheduling, while Qwen-3-32B and GPT-OSS-120B fail to coordinate all three appliances, highlighting scale-related reasoning challenges. The study also shows that analytical query handling requires explicit workflow guidance, emphasizes open-source reproducibility, and discusses trade-offs in model choice, prompt engineering, security, and sustainability for practical deployment. The MILP benchmark minimizes total electricity cost over the appliance start times: $\min_{t_{WM},\, t_{DW},\, t_{EV}} \sum_{a \in A} \sum_{k=0}^{d_a-1} C_{t_a+k}$. Overall, the system demonstrates the feasibility and the boundaries of agentic AI HEMS for residential demand response under real-world pricing.
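The objective in the TL;DR can be illustrated with a minimal sketch. It reads t_a as the start slot of appliance a, d_a as its run length in slots, and C_t as the price in slot t; this reading of the symbols, the per-appliance deadline argument, the function names, and the toy prices are all assumptions made for illustration and are not taken from the paper. Because the objective as written has no term coupling the appliances, the sketch searches each appliance independently by brute force, which is not the same as solving a MILP with shared constraints.

```python
from typing import Dict, List, Tuple


def window_cost(prices: List[float], start: int, duration: int) -> float:
    """Inner sum of the objective: C_{start+k} for k = 0 .. duration-1."""
    return sum(prices[start:start + duration])


def best_start(prices: List[float], duration: int, deadline: int) -> Tuple[int, float]:
    """Return (start, cost) of the cheapest run that finishes by slot `deadline`.

    `deadline` is a hypothetical constraint added for illustration.
    """
    last_start = min(deadline, len(prices)) - duration
    if last_start < 0:
        raise ValueError("appliance cannot finish before the deadline")
    candidates = [(window_cost(prices, s, duration), s) for s in range(last_start + 1)]
    cost, start = min(candidates)  # ties resolve to the earliest start
    return start, cost


def schedule(
    prices: List[float], appliances: Dict[str, Tuple[int, int]]
) -> Tuple[Dict[str, Tuple[int, float]], float]:
    """Minimise the summed cost; appliances maps name -> (duration, deadline)."""
    plan = {name: best_start(prices, d, dl) for name, (d, dl) in appliances.items()}
    total = sum(cost for _, cost in plan.values())
    return plan, total


# Toy price series and appliance parameters (invented, not Austrian market data).
toy_prices = [30, 20, 10, 10, 20, 40, 50, 50]
toy_appliances = {"WM": (2, 8), "DW": (1, 8), "EV": (3, 6)}
plan, total = schedule(toy_prices, toy_appliances)
print(plan, total)  # {'WM': (2, 20), 'DW': (2, 10), 'EV': (1, 40)} 70
```

Exhaustive search is practical here only because a day of 15-minute slots gives fewer than 100 candidate start times per appliance; any constraint that links appliances, such as a household power cap, would call for a joint formulation instead.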

Abstract

The electricity sector transition requires substantial increases in residential demand response capacity, yet Home Energy Management Systems (HEMS) adoption remains limited by user interaction barriers requiring translation of everyday preferences into technical parameters. While large language models have been applied to energy systems as code generators and parameter extractors, no existing implementation deploys LLMs as autonomous coordinators managing the complete workflow from natural language input to multi-appliance scheduling. This paper presents an agentic AI HEMS where LLMs autonomously coordinate multi-appliance scheduling from natural language requests to device control, achieving optimal scheduling without example demonstrations. A hierarchical architecture combining one orchestrator with three specialist agents uses the ReAct pattern for iterative reasoning, enabling dynamic coordination without hardcoded workflows while integrating Google Calendar for context-aware deadline extraction. Evaluation across three open-source models using real Austrian day-ahead electricity prices reveals substantial capability differences. Llama-3.3-70B successfully coordinates all appliances across all scenarios to match cost-optimal benchmarks computed via mixed-integer linear programming, while other models achieve perfect single-appliance performance but struggle to coordinate all appliances simultaneously. Progressive prompt engineering experiments demonstrate that analytical query handling without explicit guidance remains unreliable despite models' general reasoning capabilities. We open-source the complete system including orchestration logic, agent prompts, tools, and web interfaces to enable reproducibility, extension, and future research.
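The control flow described in the abstract, one orchestrator delegating to three specialists through a ReAct-style reason, act, observe loop, can be sketched as follows. This is a minimal illustration and not the authors' implementation: the LLM is replaced by a scripted stand-in, every name (`react_loop`, `scripted_orchestrator`, the tool names) is hypothetical, and the specialists return fixed placeholder start times borrowed from the Figure 2 caption.

```python
from typing import Any, Callable, Dict, List, Tuple

History = List[Tuple[str, Any]]
Action = Tuple[str, Dict[str, Any]]  # (tool name, keyword arguments)
Policy = Callable[[History], Tuple[str, Action]]


def react_loop(
    policy: Policy,
    tools: Dict[str, Callable[..., Any]],
    request: str,
    max_steps: int = 8,
) -> Tuple[Dict[str, Any], History]:
    """Alternate thought -> action -> observation until the policy finishes."""
    history: History = [("request", request)]
    for _ in range(max_steps):
        thought, (tool, args) = policy(history)
        history.append(("thought", thought))
        if tool == "finish":
            return args, history
        observation = tools[tool](**args)
        history.append(("observation", observation))
    raise RuntimeError("no final answer within max_steps")


SPECIALISTS = ["washing_machine", "dishwasher", "ev_charger"]


def scripted_orchestrator(history: History) -> Tuple[str, Action]:
    """Stand-in for the LLM: call each specialist once, then finish."""
    done = [obs for kind, obs in history if kind == "observation"]
    if len(done) < len(SPECIALISTS):
        name = SPECIALISTS[len(done)]
        return f"delegate to {name}", (name, {})
    return "all specialists reported", ("finish", {"schedule": dict(done)})


# Placeholder specialists: each returns (appliance, start time) with no real logic.
tools = {
    "washing_machine": lambda: ("washing_machine", "02:30"),
    "dishwasher": lambda: ("dishwasher", "02:45"),
    "ev_charger": lambda: ("ev_charger", "00:15"),
}

result, trace = react_loop(scripted_orchestrator, tools, "schedule everything by 7am")
print(result)
```

In a real system the policy would be a model call that reads the history and chooses the next tool, which is what lets coordination vary per request instead of following a hardcoded sequence; the step limit guards against a policy that never finishes.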

Paper Structure

This paper contains 32 sections, 5 equations, 9 figures, 4 tables, 3 algorithms.

Figures (9)

  • Figure 1: Agentic AI HEMS Architecture: A central orchestrator agent coordinates three specialist load agents using the ReAct pattern, leveraging external APIs for price and calendar data, and committing optimized schedules to smart home devices.
  • Figure 2: MILP-optimal multi-appliance scheduling for 15 October 2025 using real Austrian day-ahead prices. The red shaded region indicates the most expensive 3-hour slot (6:30-9:30 AM), which serves as the validation benchmark for the analytical query evaluation. All three appliances are scheduled during the low-price overnight period, with the EV charger starting at 00:15, washing machine at 02:30, and dishwasher at 02:45, minimizing total electricity cost by avoiding the high-price morning peak.
  • Figure 3: Single-appliance scheduling performance metrics across three models. All models achieve 100% optimality with similar computational requirements. Llama-3.3 demonstrates the most efficient resource usage (13,122 tokens, 4.8s), while GPT-OSS requires moderately higher resources (15,792 tokens, 8.0s). Error bars represent standard deviation across five independent runs.
  • Figure 4: Multi-appliance scheduling performance metrics for Llama-3.3, the only model achieving 100% success across all scenarios. Token consumption increases by a factor of 2.5 compared to single-appliance coordination (32,883 vs 13,122 tokens), and execution time rises roughly threefold, from 4.8 to 14.7 seconds. Error bars represent standard deviation across five independent runs.
  • Figure 5: Model performance comparison across single and multi-appliance scheduling scenarios. All three models achieve 100% success for single-appliance coordination, but only Llama-3.3 maintains this performance in multi-appliance contexts. Qwen-3 successfully schedules washing machine and dishwasher but fails EV coordination, while GPT-OSS only attempts washing machine scheduling.
  • ...and 4 more figures
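The Figure 2 caption uses the most expensive 3-hour slot as the benchmark for the analytical query evaluation. One way such a benchmark could be computed is a sliding-window sum over the price series, sketched below. The 15-minute slot resolution is an assumption inferred from the start times in the caption, the function names are hypothetical, and the price series is synthetic, built so that its peak falls at 06:30 to 09:30 rather than taken from market data.

```python
from typing import List, Tuple


def most_expensive_window(prices: List[float], width: int) -> Tuple[int, float]:
    """Return (start slot, summed price) of the costliest run of `width` slots."""
    if width <= 0 or width > len(prices):
        raise ValueError("width must be between 1 and len(prices)")
    current = sum(prices[:width])
    best_start, best_sum = 0, current
    for start in range(1, len(prices) - width + 1):
        # Slide by one slot: add the entering price, drop the leaving one.
        current += prices[start + width - 1] - prices[start - 1]
        if current > best_sum:  # strict '>' keeps the earliest window on ties
            best_start, best_sum = start, current
    return best_start, best_sum


def slot_to_clock(slot: int, minutes_per_slot: int = 15) -> str:
    """Convert a slot index to HH:MM, assuming slot 0 starts at midnight."""
    minutes = slot * minutes_per_slot
    return f"{minutes // 60:02d}:{minutes % 60:02d}"


# Synthetic day: flat price of 10 with a morning peak of 50 in slots 26..37.
synthetic = [10] * 96
for slot in range(26, 38):
    synthetic[slot] = 50

width = 12  # 3 hours at 15-minute resolution
start, total = most_expensive_window(synthetic, width)
print(slot_to_clock(start), slot_to_clock(start + width), total)  # 06:30 09:30 600
```

The incremental update keeps the scan linear in the number of slots, although for a single day of 96 slots recomputing each window sum directly would be just as practical.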