Knowledge Distillation from Large Language Models for Household Energy Modeling
Mohannad Takrouri, Nicolás M. Cuadrado, Martin Takáč
TL;DR
This work tackles data scarcity in ML-based household energy modeling by using large language models to synthesize culturally nuanced, climate-aware household data across six countries. It introduces a four-stage framework (family-structure generation, weather-range definition, detailed weather data generation, and consumption-pattern modeling) and evaluates five LLMs, with an optional path that substitutes external weather data to ensure physical consistency. Results show that LLMs can produce realistic, context-rich datasets, though both the choice of model and the source of weather data affect realism and reliability; where available, external weather data often yields smoother, more plausible inputs. The framework offers a scalable, privacy-preserving approach for scenario-based energy optimization, policy analysis, and urban planning across diverse cultural and climatic contexts.
Abstract
Machine learning (ML) is increasingly vital for smart-grid research, yet restricted access to realistic, diverse data, often due to privacy concerns, slows progress and fuels doubts within the energy sector about adopting ML-based strategies. We propose integrating Large Language Models (LLMs) into energy modeling to generate realistic, culturally sensitive, and behavior-specific data for household energy usage across diverse geographies. In this study, we employ and compare five different LLMs to systematically produce family structures, weather patterns, and daily consumption profiles for households in six distinct countries. A four-stage methodology synthesizes contextual daily data, including culturally nuanced activities, realistic weather ranges, HVAC operations, and distinct "energy signatures" that capture unique consumption footprints. Additionally, we explore an alternative strategy in which external weather datasets are integrated directly, bypassing the intermediate weather modeling stages while ensuring physically consistent data inputs. The resulting dataset provides insights into how cultural, climatic, and behavioral factors converge to shape carbon emissions, offering a cost-effective avenue for scenario-based energy optimization. This approach underscores how prompt engineering, combined with knowledge distillation, can advance sustainable energy research and climate mitigation efforts. Source code is available at https://github.com/Singularity-AI-Lab/LLM-Energy-Knowledge-Distillation.
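
To make the control flow of the four-stage methodology concrete, the following is a minimal Python sketch under stated assumptions, not the authors' implementation. The names `generate_household`, `HouseholdRecord`, and `LLM`, the prompt wording, and the country used in the usage note are all hypothetical; the only things taken from the text are the order of the four stages and the optional external-weather path that bypasses the two weather stages.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical interface: any function that maps a prompt string to a text reply.
LLM = Callable[[str], str]


@dataclass
class HouseholdRecord:
    """Illustrative container for the output of one pipeline run."""
    country: str
    family: str
    weather_range: Optional[str]  # None when external weather data is used
    daily_weather: str
    consumption: str


def generate_household(
    llm: LLM, country: str, external_weather: Optional[str] = None
) -> HouseholdRecord:
    # Stage 1: family-structure generation.
    family = llm(f"Describe a plausible household in {country}.")

    if external_weather is None:
        # Stage 2: weather-range definition.
        weather_range = llm(f"Give realistic seasonal weather ranges for {country}.")
        # Stage 3: detailed weather data generation within those ranges.
        daily_weather = llm(f"Generate daily weather within: {weather_range}")
    else:
        # Alternative path: skip stages 2 and 3 and use supplied weather data.
        weather_range = None
        daily_weather = external_weather

    # Stage 4: consumption-pattern modeling conditioned on family and weather.
    consumption = llm(
        f"Household: {family}\nWeather: {daily_weather}\n"
        "Produce a daily energy consumption profile."
    )
    return HouseholdRecord(country, family, weather_range, daily_weather, consumption)
```

In this sketch, a call such as `generate_household(my_llm, "Brazil")` (an arbitrary placeholder country) issues four LLM calls, while passing `external_weather=...` reduces that to two, since the weather stages are replaced by the supplied data.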
