Fine-Tuning LLMs to Generate Economical and Reliable Actions for the Power Grid
Mohamad Chehade, Hao Zhu
TL;DR
This work addresses the need for rapid, reliable corrective actions during Public Safety Power Shutoff (PSPS) events by adapting an instruction-tuned LLM into a verifiable switching assistant. It introduces a multi-stage pipeline: supervised fine-tuning to imitate a DC-OPF MILP oracle within an open-only, budget-constrained grammar, followed by direct preference optimization (DPO) using AC-derived voltage-quality preferences, and finally best-of-$N$ inference to select high-quality, feasible plans. On IEEE 118-bus PSPS scenarios, these stages together yield improved DC objective values over zero-shot generation, AC power-flow failures reduced from 50\% to single digits, and better voltage profiles; code and data-generation scripts are released for reproducibility. The approach demonstrates how operator-facing LLMs can integrate with existing grid analysis tools to provide verifiable, voltage-aware switching recommendations under practical constraints.
Abstract
Public Safety Power Shutoffs (PSPS) force rapid topology changes that can render standard operating points infeasible, requiring operators to quickly identify corrective transmission switching actions that reduce load shedding while maintaining acceptable voltage behavior. We present a verifiable, multi-stage adaptation pipeline that fine-tunes an instruction-tuned large language model (LLM) to generate \emph{open-only} corrective switching plans from compact PSPS scenario summaries under an explicit switching budget. First, supervised fine-tuning distills a DC-OPF MILP oracle into a constrained action grammar that enables reliable parsing and feasibility checks. Second, direct preference optimization refines the policy using AC-evaluated preference pairs ranked by a voltage-penalty metric, injecting voltage-awareness beyond DC imitation. Finally, best-of-$N$ selection adds an inference-time step that chooses the best feasible candidate under the target metric. On IEEE 118-bus PSPS scenarios, fine-tuning substantially improves DC objective values versus zero-shot generation, reduces the AC power-flow failure rate from 50\% to single digits, and improves voltage-penalty outcomes on the common-success set. Code and data-generation scripts are released to support reproducibility.
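The verification and best-of-$N$ steps described above can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration rather than the paper's actual format or code: the `OPEN: 12-14, 30-38` plan syntax, the helper names `parse_plan` and `best_of_n`, and the `score` callback, which stands in for an AC power-flow evaluation that returns a voltage penalty (lower is better) or `None` when the power flow does not converge.

```python
import re
from typing import Callable, List, Optional, Set, Tuple

Line = Tuple[int, int]  # a transmission line as (from_bus, to_bus), from_bus <= to_bus

# Hypothetical open-only grammar: "OPEN: 12-14, 30-38" or "OPEN: NONE".
_PLAN_RE = re.compile(r"^OPEN:\s*(NONE|\d+-\d+(?:\s*,\s*\d+-\d+)*)\s*$")


def parse_plan(text: str, switchable: Set[Line], budget: int) -> Optional[List[Line]]:
    """Parse one generated plan; return None if it violates the grammar,
    repeats a line, exceeds the switching budget, or names an unknown line."""
    match = _PLAN_RE.match(text.strip())
    if match is None:
        return None
    body = match.group(1)
    if body == "NONE":
        return []
    lines: List[Line] = []
    for token in body.split(","):
        a, b = (int(x) for x in token.strip().split("-"))
        lines.append((min(a, b), max(a, b)))
    if len(set(lines)) != len(lines) or len(lines) > budget:
        return None
    if any(line not in switchable for line in lines):
        return None
    return lines


def best_of_n(
    candidates: List[str],
    switchable: Set[Line],
    budget: int,
    score: Callable[[List[Line]], Optional[float]],
) -> Tuple[Optional[List[Line]], float]:
    """Return the valid candidate with the lowest score, skipping candidates
    that fail parsing or whose evaluation fails (score returns None)."""
    best_plan: Optional[List[Line]] = None
    best_score = float("inf")
    for text in candidates:
        plan = parse_plan(text, switchable, budget)
        if plan is None:
            continue
        value = score(plan)
        if value is not None and value < best_score:
            best_plan, best_score = plan, value
    return best_plan, best_score
```

In use, `candidates` would be the $N$ sampled model outputs for one scenario and `score` would wrap whichever AC solver is available. Keeping the parser strict means that malformed or over-budget plans are rejected before any power-flow computation is spent on them.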
