Electric Arc Furnaces Scheduling under Electricity Price Volatility with Reinforcement Learning

Ruonan Pi, Zhiyuan Fan, Bolun Xu

TL;DR

The paper tackles the challenge of scheduling electric arc furnaces under volatile electricity prices by combining a rolling-horizon MILP baseline with a solver-free Q-learning policy that operates on day-ahead price signals. It introduces a two-stage EAF power model with start-up and stock dynamics, couples multiple furnaces under a feeder capacity constraint, and benchmarks performance using NYISO data. The key contributions are a tractable rolling-horizon MILP for interpretable upper bounds and a data-driven RL dispatcher that attains about 90% of the MILP profit in both single- and multi-unit settings, with significantly lower online computation and robust performance under unit heterogeneity. The results demonstrate effective coordination across furnaces, near-optimal profit, and actionable design insights for grid-responsive industrial loads and future extensions to uncertainty and full melt-shop coordination.

Abstract

This paper proposes a reinforcement learning-based framework for optimizing the operation of electric arc furnaces (EAFs) under volatile electricity prices. We formulate the deterministic version of the EAF scheduling problem as a mixed-integer linear program (MILP), and then develop a Q-learning algorithm to perform real-time control of multiple EAF units under real-time price volatility and shared feeding capacity constraints. We design a custom reward function for the Q-learning algorithm to smooth the start-up penalties of the EAFs. Using real data from EAF designs and electricity prices in New York State, we benchmark our algorithm against a baseline rule-based controller and a MILP benchmark that assumes perfect price forecasts. The results show that our reinforcement learning algorithm achieves around 90% of the profit of the perfect-forecast MILP benchmark in various single-unit and multi-unit cases under a non-anticipatory control setting.
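
To make the Q-learning idea in the abstract concrete, the following is a minimal tabular sketch for a single furnace that chooses between idling and melting in each interval. It is an illustration under stated assumptions, not the paper's formulation: the price bins, energy use, revenue, start-up penalty, state definition (price level plus on/off status), and the function names `price_level`, `reward`, `train_q_table`, and `greedy_action` are all hypothetical, and the sketch omits stock dynamics, multiple units, and the feeder capacity constraint.

```python
import random

# Hypothetical toy parameters (assumptions, not values from the paper).
PRICE_BINS = [20.0, 40.0, 80.0]   # $/MWh thresholds, giving 4 price levels
MELT_ENERGY_MWH = 5.0             # energy consumed per interval when melting
REVENUE_PER_MELT = 250.0          # product value earned per melting interval
STARTUP_PENALTY = 30.0            # cost charged on an idle -> melt switch
ACTIONS = (0, 1)                  # 0 = idle, 1 = melt


def price_level(price):
    """Map a price to a discrete level in 0..len(PRICE_BINS)."""
    return sum(price >= b for b in PRICE_BINS)


def reward(price, prev_on, action):
    """One-interval profit, with a start-up penalty when switching on."""
    if action == 0:
        return 0.0
    profit = REVENUE_PER_MELT - price * MELT_ENERGY_MWH
    if not prev_on:
        profit -= STARTUP_PENALTY
    return profit


def train_q_table(prices, episodes=200, alpha=0.1, gamma=0.95, eps=0.1, seed=0):
    """Tabular Q-learning over states (price level, furnace on/off)."""
    rng = random.Random(seed)
    n_levels = len(PRICE_BINS) + 1
    q = {(lvl, on): [0.0, 0.0] for lvl in range(n_levels) for on in (0, 1)}
    for _ in range(episodes):
        on = 0  # each episode starts with the furnace idle
        for t in range(len(prices) - 1):
            s = (price_level(prices[t]), on)
            # Epsilon-greedy exploration over the two actions.
            if rng.random() < eps:
                a = rng.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda x: q[s][x])
            r = reward(prices[t], on, a)
            s_next = (price_level(prices[t + 1]), a)
            # Standard Q-learning update toward r + gamma * max_a' Q(s', a').
            q[s][a] += alpha * (r + gamma * max(q[s_next]) - q[s][a])
            on = a
    return q


def greedy_action(q, price, on):
    """Pick the highest-valued action using only the current price and status."""
    s = (price_level(price), on)
    return max(ACTIONS, key=lambda a: q[s][a])
```

As a usage example, training on a synthetic series such as `[10.0, 10.0, 10.0, 200.0, 200.0] * 40` and then calling `greedy_action(q, price, on)` gives a policy that acts on the current price and furnace status only, which mirrors the non-anticipatory setting described above. The start-up penalty inside `reward` is the simplest stand-in for the shaped reward the abstract mentions; the paper's actual reward design is not reproduced here.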

Paper Structure

This paper contains 39 sections, 13 equations, 6 figures, 7 tables, 1 algorithm.

Figures (6)

  • Figure 1: Electric arc furnace (EAF) batch cycle. The physical process consists of charging, melting, slag removal, and tapping. For optimization, we aggregate these into a high-power melting stage and a base-power stage (charging/slag/tapping).
  • Figure 2: Simplified two-stage power profile for an EAF cycle. High-power (melting) alternates with base-power (charging/slag/tapping).
  • Figure 3: Two-stage abstraction of the EAF cycle. Schematic material dynamics over time. The melting throughput $k$ (blue) is activated during the high-power stage, while the production/tapping throughput $r$ (green) is scheduled when the furnace is on. Dashed vertical lines mark stage-switching instants; axis annotations indicate nominal durations/scales.
  • Figure 4: Cumulative profit under a coupled feeder capacity $P_{\max}$ for MILP (clairvoyant), Q-learning (no foresight), and a heuristic baseline.
  • Figure 5: Comparison of per-unit furnace power trajectories under (a) MILP, (b) Q-learning, and (c) baseline scheduling over a representative 200$\times$5-min interval. Panels (a)--(c) show stacked power by unit, and panel (d) shows the corresponding electricity price. The MILP and Q-learning policies largely concentrate melting in low-price intervals and reduce load during price spikes, whereas the baseline follows a rigid, price-agnostic pattern that often maintains high power during expensive periods.
  • ...and 1 more figure