Fine-Tuning LLMs to Generate Economical and Reliable Actions for the Power Grid

Mohamad Chehade, Hao Zhu

TL;DR

This work addresses the need for rapid, reliable corrective actions during Public Safety Power Shutoff (PSPS) events by adapting a strong instruction-tuned LLM into a verifiable switching assistant. It introduces a multi-stage pipeline: supervised fine-tuning (SFT) to imitate a DC-OPF MILP oracle within an open-only, budget-constrained grammar, followed by direct preference optimization (DPO) using AC-derived voltage-quality preferences, and finally best-of-$N$ inference to select high-quality, feasible plans. Together, these stages yield improved DC objectives, substantially fewer AC power-flow failures, and better voltage profiles on IEEE 118-bus PSPS scenarios, with code and data-generation scripts released for reproducibility. The approach demonstrates how operator-facing LLMs can integrate with existing grid analysis tools to provide verifiable, voltage-aware switching recommendations under practical constraints.
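For reference, the DPO stage can be read against the standard DPO objective, written below in its usual form. The mapping to this work is an assumption rather than something stated in the summary: $x$ is the PSPS scenario summary, $y^{+}$ and $y^{-}$ are the preferred and rejected switching plans of an AC-evaluated pair (the plan with the lower voltage penalty being preferred), $\pi_{\mathrm{ref}}$ is a frozen reference policy (plausibly the SFT model), and $\beta > 0$ is a scaling hyperparameter. The paper's exact variant may differ.

$$
\mathcal{L}_{\mathrm{DPO}}(\theta) = -\,\mathbb{E}_{(x,\,y^{+},\,y^{-})}\left[\log \sigma\!\left(\beta\left[\log\frac{\pi_\theta(y^{+}\mid x)}{\pi_{\mathrm{ref}}(y^{+}\mid x)} - \log\frac{\pi_\theta(y^{-}\mid x)}{\pi_{\mathrm{ref}}(y^{-}\mid x)}\right]\right)\right]
$$

Minimizing this loss raises the likelihood of the preferred plan relative to the rejected one, measured against the reference policy, which is how a voltage-quality ranking could shape the policy without an explicit reward model.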

Abstract

Public Safety Power Shutoffs (PSPS) force rapid topology changes that can render standard operating points infeasible, requiring operators to quickly identify corrective transmission switching actions that reduce load shedding while maintaining acceptable voltage behavior. We present a verifiable, multi-stage adaptation pipeline that fine-tunes an instruction-tuned large language model (LLM) to generate open-only corrective switching plans from compact PSPS scenario summaries under an explicit switching budget. First, supervised fine-tuning (SFT) distills a DC-OPF MILP oracle into a constrained action grammar that enables reliable parsing and feasibility checks. Second, direct preference optimization (DPO) refines the policy using AC-evaluated preference pairs ranked by a voltage-penalty metric, injecting voltage-awareness beyond DC imitation. Finally, best-of-$N$ selection adds an inference-time step that chooses the best feasible candidate under the target metric. On IEEE 118-bus PSPS scenarios, fine-tuning substantially improves DC objective values versus zero-shot generation, reduces the AC power-flow failure rate from 50% to single digits, and improves voltage-penalty outcomes on the common-success set. Code and data-generation scripts are released to support reproducibility.
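The parse-then-select idea described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the plan syntax `open(<id>); open(<id>)`, the names `parse_plan` and `best_of_n`, and the convention that a lower score is better are all hypothetical. The `score` callable stands in for whatever grid analysis tool evaluates a plan and is assumed to return `None` when evaluation fails (for example, when a power flow does not converge).

```python
import re
from typing import Callable, Optional

# Hypothetical action syntax: each action must look exactly like "open(<line id>)".
ACTION_RE = re.compile(r"open\((\d+)\)")


def parse_plan(text: str, switchable: set[int], budget: int) -> Optional[frozenset[int]]:
    """Parse a generated plan; return None if it violates the assumed grammar."""
    text = text.strip().lower()
    if text in ("", "none"):
        return frozenset()  # "do nothing" is a valid plan
    lines = []
    for token in text.split(";"):
        match = ACTION_RE.fullmatch(token.strip())
        if match is None:
            return None  # anything other than open(<id>) is rejected (open-only)
        lines.append(int(match.group(1)))
    plan = frozenset(lines)
    if len(plan) != len(lines):
        return None  # the same line was listed twice
    if len(plan) > budget or not plan <= switchable:
        return None  # over the switching budget, or not a switchable line
    return plan


def best_of_n(
    candidates: list[str],
    switchable: set[int],
    budget: int,
    score: Callable[[frozenset[int]], Optional[float]],
) -> tuple[Optional[frozenset[int]], Optional[float]]:
    """Return the valid candidate with the lowest score, or (None, None)."""
    best_plan, best_score = None, None
    for text in candidates:
        plan = parse_plan(text, switchable, budget)
        if plan is None:
            continue  # unparseable or infeasible under the grammar
        value = score(plan)
        if value is None:
            continue  # evaluation failed for this plan
        if best_score is None or value < best_score:
            best_plan, best_score = plan, value
    return best_plan, best_score


if __name__ == "__main__":
    # Toy scores standing in for a real evaluator (illustrative numbers only).
    toy_scores = {frozenset(): 5.0, frozenset({3}): 4.0, frozenset({3, 7}): 2.5}
    samples = ["open(3)", "close(7)", "open(3); open(7)", "none"]
    print(best_of_n(samples, {3, 7, 12}, 2, toy_scores.get))
```

Because every candidate passes through the same parser before it is scored, a malformed or over-budget generation can never be selected, which is the sense in which a constrained grammar makes the output verifiable.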

Paper Structure (22 sections, 8 equations, 5 figures)

Figures (5)

  • Figure 1: Multi-stage adaptation pipeline for PSPS corrective switching.
  • Figure 2: Training curves from the SFT and DPO fine-tuning jobs, showing convergence.
  • Figure 3: Distribution of DC objective $J_{\mathrm{DC}}$ across all four compared policies (zero-shot, SFT, DPO, and NN).
  • Figure 4: AC power-flow failure rate across compared policies. Fine-tuned policies drastically reduce AC failures relative to the zero-shot baseline.
  • Figure 5: Voltage penalty $V_{\mathrm{pen}}$ distribution on the common-success set, i.e., scenarios where all compared policies (SFT, DPO, NN) have achieved AC convergence. This ensures an apples-to-apples comparison of voltage quality.
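The two evaluation notions used in Figures 4 and 5 (the AC power-flow failure rate and the common-success set) can be illustrated with a short Python sketch. The data layout, a mapping from policy name to per-scenario convergence flags, and the function names are assumptions made for illustration; the scenario ids and flags in the usage example are made up.

```python
def ac_failure_rate(converged: dict[str, bool]) -> float:
    """Fraction of scenarios in which AC power flow did not converge."""
    if not converged:
        raise ValueError("no scenarios provided")
    return sum(not ok for ok in converged.values()) / len(converged)


def common_success_set(results: dict[str, dict[str, bool]]) -> set[str]:
    """Scenario ids for which every compared policy achieved AC convergence."""
    per_policy = [
        {scenario for scenario, ok in flags.items() if ok}
        for flags in results.values()
    ]
    return set.intersection(*per_policy) if per_policy else set()


if __name__ == "__main__":
    # Made-up convergence flags for three policies on three scenarios.
    results = {
        "SFT": {"s1": True, "s2": True, "s3": False},
        "DPO": {"s1": True, "s2": False, "s3": True},
        "NN": {"s1": True, "s2": True, "s3": True},
    }
    print(common_success_set(results))
    print({name: ac_failure_rate(flags) for name, flags in results.items()})
```

Restricting the voltage-penalty comparison to this intersection avoids crediting a policy simply for failing on the hard scenarios, since a scenario without a converged AC solution has no voltage profile to score.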