Evolving LLM-Derived Control Policies for Residential EV Charging and Vehicle-to-Grid Energy Optimization

Vishesh Purnananda; Benjamin John Wruck; Mingyu Guo

TL;DR

This paper tackles the interpretability barrier of data-driven V2G control by using Large Language Models to synthesize explicit, auditable Python policies. These policies are evolved through a six-stage, simulation-driven loop in the EV2Gym-Residential environment, enabling profit optimization while respecting state-of-charge (SoC) and safety constraints. The authors compare four prompting strategies and show that a Hybrid approach yields 118% of a human baseline's profit with concise, readable code, while a Runtime LLM agent achieves up to 190% but at substantially higher cost and latency. The work demonstrates that code-as-policies, grounded in high-fidelity simulation and regulatory-minded guardrails, can deliver transparent, deployable residential V2G controllers with practical impact on grid reliability and consumer trust.
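As a minimal sketch of what "respecting SoC and safety constraints" can mean at the action level, the function below clamps a proposed power setpoint to a feasible range before it would reach the charger. Only the ±7 kW limit comes from the paper (Figure 2 caption); the battery capacity, SoC bounds, 15-minute step, and the names `ChargerLimits` and `apply_guardrails` are hypothetical, and charging losses are ignored.

```python
from dataclasses import dataclass


@dataclass
class ChargerLimits:
    max_power_kw: float = 7.0   # ±7 kW charger rating (Figure 2 caption)
    capacity_kwh: float = 60.0  # hypothetical battery capacity
    soc_min: float = 0.2        # hypothetical lower SoC bound (fraction of capacity)
    soc_max: float = 0.9        # hypothetical upper SoC bound
    dt_hours: float = 0.25      # hypothetical 15-minute control step


def apply_guardrails(power_kw: float, soc: float, lim: ChargerLimits) -> float:
    """Clamp a proposed setpoint (+ = charge, - = discharge) to the feasible range.

    Charging and discharging losses are ignored, so the SoC check is approximate.
    """
    # 1. Never exceed the charger rating in either direction.
    p = max(-lim.max_power_kw, min(lim.max_power_kw, power_kw))
    # 2. Limit power so one step cannot push SoC above soc_max or below soc_min.
    max_charge_kw = max(0.0, (lim.soc_max - soc) * lim.capacity_kwh / lim.dt_hours)
    max_discharge_kw = max(0.0, (soc - lim.soc_min) * lim.capacity_kwh / lim.dt_hours)
    return max(-max_discharge_kw, min(max_charge_kw, p))


lim = ChargerLimits()
print(apply_guardrails(10.0, 0.5, lim))  # clipped to the 7 kW rating
print(apply_guardrails(7.0, 0.89, lim))  # reduced so SoC does not exceed 0.9
```

One design choice here is to clamp rather than reject infeasible actions, so a generated policy cannot command an unsafe setpoint even when its own logic is wrong.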

Abstract

This research presents a novel application of Evolutionary Computation to residential electric vehicle (EV) energy management. While reinforcement learning (RL) achieves high performance in vehicle-to-grid (V2G) optimization, it typically produces opaque "black-box" neural networks that are difficult for consumers and regulators to audit. To address this interpretability gap, we propose a program search framework that uses Large Language Models (LLMs) as intelligent mutation operators within an iterative prompt-evaluation-repair loop. With the high-fidelity EV2Gym simulation environment serving as the fitness function, the system refines candidate programs over successive cycles to synthesize executable Python policies that balance profit maximization, user comfort, and physical safety constraints. We benchmark four prompting strategies (Imitation, Reasoning, Hybrid, and Runtime) and evaluate their ability to discover adaptive control logic. Results demonstrate that the Hybrid strategy produces concise, human-readable heuristics that achieve 118% of the baseline profit, discovering complex behaviors such as anticipatory arbitrage and hysteresis without explicit programming. This work establishes LLM-driven Evolutionary Computation as a practical approach for generating EV charging control policies that are transparent, inspectable, and suitable for real residential deployment.
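The prompt-evaluation-repair loop can be sketched as an elitist (1+1) search whose mutation operator is a stub. This is an illustration under stated assumptions, not the authors' implementation: `llm_mutate` jitters numeric thresholds instead of prompting an LLM to rewrite Python source, `simulate_profit` is a toy arbitrage model rather than an EV2Gym rollout, and all names, prices, and constants are hypothetical.

```python
import random
from typing import Optional

# Hypothetical simplification: a candidate "program" is reduced to two named thresholds.
Policy = dict[str, float]

CAPACITY_KWH = 60.0                            # hypothetical battery size
STEP_KWH = 1.75                                # 7 kW for a hypothetical 15-minute step
SOC_MIN, SOC_MAX, SOC_TARGET = 0.2, 0.9, 0.8   # hypothetical bounds
PENALTY_PER_KWH = 1.0                          # hypothetical comfort penalty


def simulate_profit(policy: Policy, prices: list[float]) -> float:
    """Toy fitness standing in for an EV2Gym rollout (NOT the EV2Gym API)."""
    soc, profit = 0.5, 0.0
    delta = STEP_KWH / CAPACITY_KWH
    for price in prices:
        if price <= policy["buy_below"] and soc + delta <= SOC_MAX:
            soc += delta
            profit -= price * STEP_KWH
        elif price >= policy["sell_above"] and soc - delta >= SOC_MIN:
            soc -= delta
            profit += price * STEP_KWH
    # Crude user-comfort term: penalize ending below the departure target.
    shortfall_kwh = max(0.0, SOC_TARGET - soc) * CAPACITY_KWH
    return profit - PENALTY_PER_KWH * shortfall_kwh


def validate(policy: Policy) -> Optional[str]:
    """Static check run before simulation; returns an error message or None."""
    if policy["buy_below"] >= policy["sell_above"]:
        return "buy_below must be lower than sell_above"
    return None


def llm_mutate(parent: Policy, feedback: str, rng: random.Random) -> Policy:
    """Stub for the LLM mutation operator.

    A real system would prompt an LLM with the parent program and `feedback`
    and parse the returned code; here `feedback` is unused and thresholds are jittered.
    """
    return {k: v + rng.gauss(0.0, 0.02) for k, v in parent.items()}


def evolve(seed: Policy, prices: list[float], iterations: int = 10,
           rng_seed: int = 0) -> tuple[Policy, float]:
    rng = random.Random(rng_seed)
    best, best_fit = seed, simulate_profit(seed, prices)
    feedback = f"profit={best_fit:.2f}"
    for _ in range(iterations):
        child = llm_mutate(best, feedback, rng)
        error = validate(child)
        if error is not None:
            # Repair path: report the error as feedback instead of evaluating.
            feedback = f"invalid candidate: {error}"
            continue
        fit = simulate_profit(child, prices)
        feedback = f"profit={fit:.2f}, best so far={best_fit:.2f}"
        if fit > best_fit:  # elitist selection: keep the child only if it is better
            best, best_fit = child, fit
    return best, best_fit


# Synthetic day of 96 quarter-hour prices with an evening peak (steps 68-83).
prices = [0.30 if 68 <= t < 84 else 0.10 for t in range(96)]
best, fit = evolve({"buy_below": 0.12, "sell_above": 0.25}, prices, iterations=20)
print(best, round(fit, 2))
```

In a genuine code-as-policy setting, the generated source would also have to run in a sandbox, with syntax or runtime errors feeding the repair step the same way `validate` does here.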

Paper Structure (37 sections, 6 figures, 2 tables)

Introduction
  Problem Motivation and The Interpretability Gap
  LLM-driven Policy Synthesis
  Contributions and Research Question
Related Work
  Large Language Models as Evolutionary Search Agents
  Programming by Example (PBE) and LLM Grounding
  Challenges in V2G Reinforcement Learning
  Automated Optimization Modeling
  Interpretable and Symbolic Control
Additional Background for V2G Program Setting
  Strategic V2G Control and Battery Longevity
  Standardized Benchmarking with EV2Gym
  Regulatory Audits and Explainable Policies
Methodology
...and 22 more sections

Figures (6)

Figure 1: LLM policy-generation and evaluation loop. The six stages form a closed language-simulation optimisation loop.
Figure 2: The text-based prompt structure used for the Hybrid strategy. It establishes the observation schema, enforces the ±7 kW charger constraints verbally, and provides exemplar state-action pairs and iterative feedback to guide the model's reasoning.
Figure 3: Baseline Heuristic Controller performance over 1500 steps. Note the high-frequency switching between charging (red) and discharging (green) and the jagged SoC curve (blue), indicating reactive behavior.
Figure 4: Example evolved policy. Note the explicit Python thresholds for price, SoC, and TTD, which provide a fully auditable alternative to neural networks (xie2025reinforcement; romera2024mathematical).
Figure 5: Reward evolution and behavioural fit across 10 iterations of the Hybrid strategy. Note that maximum profit occurs when fit to the baseline slightly decreases, indicating the discovery of novel arbitrage logic.
...and 1 more figure
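The captions for Figures 3 to 5 contrast the baseline's high-frequency switching with evolved policies built from explicit price, SoC, and TTD thresholds. The sketch below shows one way such a policy can encode hysteresis: separate on and off price thresholds create a dead band, so prices hovering around a single value do not toggle the charger every step. All threshold values and the `HysteresisPolicy` interface are hypothetical, TTD is assumed here to be counted in simulation steps, and only the ±7 kW limit comes from the paper.

```python
from dataclasses import dataclass

MAX_POWER_KW = 7.0  # charger limit stated in the Figure 2 caption


@dataclass
class HysteresisPolicy:
    # All thresholds are hypothetical placeholders, not values from the paper.
    charge_on_price: float = 0.12      # enter charging below this price
    charge_off_price: float = 0.15     # leave charging above this price
    discharge_on_price: float = 0.30   # enter discharging above this price
    discharge_off_price: float = 0.25  # leave discharging below this price
    soc_floor: float = 0.3
    soc_target: float = 0.8
    urgent_ttd_steps: int = 8          # near departure: comfort overrides price
    mode: str = "idle"                 # remembered state: "charge", "discharge" or "idle"

    def act(self, price: float, soc: float, ttd_steps: int) -> float:
        """Return a power setpoint in kW (+ = charge, - = discharge)."""
        urgent = ttd_steps <= self.urgent_ttd_steps
        # Rule 1: near departure with SoC below target, charge at full power.
        if urgent and soc < self.soc_target:
            self.mode = "charge"
            return MAX_POWER_KW
        can_charge = soc < self.soc_target
        can_discharge = soc > self.soc_floor and not urgent
        # Rule 2: leave the current mode only when its *off* threshold is crossed.
        if self.mode == "charge" and (price > self.charge_off_price or not can_charge):
            self.mode = "idle"
        elif self.mode == "discharge" and (price < self.discharge_off_price or not can_discharge):
            self.mode = "idle"
        # Rule 3: from idle, enter a mode only when its *on* threshold is crossed.
        if self.mode == "idle":
            if price < self.charge_on_price and can_charge:
                self.mode = "charge"
            elif price > self.discharge_on_price and can_discharge:
                self.mode = "discharge"
        return {"charge": MAX_POWER_KW, "discharge": -MAX_POWER_KW, "idle": 0.0}[self.mode]


policy = HysteresisPolicy()
print([policy.act(p, soc=0.5, ttd_steps=40) for p in [0.11, 0.14, 0.13, 0.14, 0.16]])
```

With the defaults, the price sequence 0.11, 0.14, 0.13, 0.14, 0.16 at 50% SoC gives four charging steps followed by idle, whereas a single "charge below 0.135" rule would alternate between charging and idling on the same prices.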
