Knowledge Distillation from Large Language Models for Household Energy Modeling

Mohannad Takrouri, Nicolás M. Cuadrado, Martin Takáč

TL;DR

This work tackles data scarcity in ML-based household energy modeling by using large language models to synthesize culturally nuanced, climate-aware household data across six countries. It introduces a four-stage framework (family-structure generation, weather-range definition, detailed weather data generation, and consumption-pattern modeling) and evaluates five LLMs, with an optional external weather data path to ensure physical consistency. Results show that LLMs can produce realistic, context-rich datasets, though the choice of model and data source affects realism and reliability; where available, external weather data often yields smoother, more plausible inputs. The framework offers a scalable, privacy-preserving approach for scenario-based energy optimization, policy analysis, and urban planning across diverse cultural and climatic contexts.

Abstract

Machine learning (ML) is increasingly vital for smart-grid research, yet restricted access to realistic, diverse data, often due to privacy concerns, slows progress and fuels doubts within the energy sector about adopting ML-based strategies. We propose integrating Large Language Models (LLMs) in energy modeling to generate realistic, culturally sensitive, and behavior-specific data for household energy usage across diverse geographies. In this study, we employ and compare five different LLMs to systematically produce family structures, weather patterns, and daily consumption profiles for households in six distinct countries. A four-stage methodology synthesizes contextual daily data, including culturally nuanced activities, realistic weather ranges, HVAC operations, and distinct 'energy signatures' that capture unique consumption footprints. Additionally, we explore an alternative strategy where external weather datasets can be directly integrated, bypassing intermediate weather modeling stages while ensuring physically consistent data inputs. The resulting dataset provides insights into how cultural, climatic, and behavioral factors converge to shape carbon emissions, offering a cost-effective avenue for scenario-based energy optimization. This approach underscores how prompt engineering, combined with knowledge distillation, can advance sustainable energy research and climate mitigation efforts. Source code is available at https://github.com/Singularity-AI-Lab/LLM-Energy-Knowledge-Distillation.
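To make the data flow between the four stages concrete, here is a minimal sketch in Python. It is not the authors' implementation (see the linked repository for that): the function names, the prompt wording, the JSON reply format, and the `stub_llm` stand-in are all assumptions made for illustration. The only things taken from the abstract are the order of the stages and the fact that external weather data can bypass the intermediate weather modeling stages.

```python
import json
from typing import Callable, Optional

# Hypothetical interface: any function mapping a prompt string to the model's text reply.
LLM = Callable[[str], str]


def ask_json(llm: LLM, prompt: str) -> dict:
    """Send a prompt and parse the reply as JSON (assumes the model answers in JSON)."""
    return json.loads(llm(prompt))


def run_pipeline(llm: LLM, country: str, season: str,
                 external_weather: Optional[dict] = None) -> dict:
    """Chain the four stages; each stage's output feeds the next stage's prompt."""
    # Stage 1: family-structure generation.
    family = ask_json(llm, f"Stage 1: describe a household in {country} as JSON.")

    if external_weather is None:
        # Stage 2: weather-range definition.
        ranges = ask_json(llm, f"Stage 2: give {season} weather ranges for {country} as JSON.")
        # Stage 3: detailed weather data generation within those ranges.
        weather = ask_json(llm, f"Stage 3: generate hourly weather within {json.dumps(ranges)}.")
    else:
        # Alternative path: external weather data replaces Stages 2 and 3.
        ranges, weather = None, external_weather

    # Stage 4: consumption-pattern modeling from the family and the weather.
    consumption = ask_json(
        llm,
        f"Stage 4: given family {json.dumps(family)} and weather {json.dumps(weather)}, "
        "return hourly consumption as JSON.",
    )
    return {"family": family, "weather_ranges": ranges,
            "weather": weather, "consumption": consumption}


def stub_llm(prompt: str) -> str:
    """Offline stand-in for a real model: returns canned (made-up) JSON per stage."""
    canned = {
        "Stage 1": {"members": ["parent", "child"]},
        "Stage 2": {"temp_c": [10, 20]},
        "Stage 3": {"hourly_temp_c": [12.0, 13.5]},
        "Stage 4": {"hourly_kwh": [0.4, 0.6]},
    }
    return json.dumps(canned[prompt[:7]])  # the first 7 characters are "Stage N"


if __name__ == "__main__":
    print(run_pipeline(stub_llm, "USA", "Autumn"))
```

Passing a dictionary as `external_weather` skips the two weather stages, so only the Stage 1 and Stage 4 prompts reach the model. Swapping `stub_llm` for a real client only requires a function with the same string-in, string-out shape.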

Paper Structure

This paper contains 26 sections, 11 figures, and 7 tables.

Figures (11)

  • Figure 1: Overview of our framework with detail on the prompting strategy for each stage.
  • Figure 2: Hourly energy consumption patterns for a single-parent family in the USA across (a) Autumn and (b) Spring, showing weekday and weekend patterns and using the weather data from Stage 2 and Stage 3, separated by a dashed line to highlight differences in energy consumption for all family members and HVAC actions.
  • Figure 3: Seasonal weather parameter ranges across the six selected countries, generated in Stage 2 using the DeepSeek-R1 model.
  • Figure 4: Hourly weather values for UAE (Summer) using external TMY data via pvlib.
  • Figure 5: Hourly weather values for UAE (Summer) generated by DeepSeek-R1. While occasionally incomplete, the model’s richer reasoning can pinpoint anomalies or inconsistencies.
  • ...and 6 more figures