Electric Arc Furnaces Scheduling under Electricity Price Volatility with Reinforcement Learning
Ruonan Pi, Zhiyuan Fan, Bolun Xu
TL;DR
The paper tackles the challenge of scheduling electric arc furnaces under volatile electricity prices by combining a rolling-horizon MILP baseline with a solver-free Q-learning policy that operates on day-ahead price signals. It introduces a two-stage EAF power model with start-up and stock dynamics, couples multiple furnaces under a feeder capacity constraint, and benchmarks performance using NYISO data. The key contributions are a tractable rolling-horizon MILP that provides interpretable upper bounds and a data-driven RL dispatcher that attains about 90% of the MILP profit in both single- and multi-unit settings, with significantly lower online computation and robust performance under unit heterogeneity. The results show effective coordination across furnaces and profit within about 10% of the MILP benchmark, and offer design insights for grid-responsive industrial loads; extensions to uncertainty and full melt-shop coordination are left for future work.
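To make the rolling-horizon idea concrete, the sketch below replans a single furnace every hour over the next `window` prices and commits only the first decision. It is a minimal sketch under strong assumptions: the furnace is a plain on/off unit with no stock dynamics, and the constants `VALUE`, `POWER`, and `STARTUP` are hypothetical. Because this toy model has only two statuses, an exact backward dynamic program replaces the paper's MILP; the function names `hour_profit`, `plan_window`, and `rolling_horizon` are illustrative, not the authors' code.

```python
# Hypothetical single-furnace constants (illustrative, not from the paper).
VALUE = 3000.0   # value of one hour of melting, $
POWER = 60.0     # furnace draw while on, MW
STARTUP = 500.0  # start-up penalty, $


def hour_profit(price, action, prev_on):
    """Profit of one hour given the action and the previous on/off status."""
    if action == 0:
        return 0.0
    return VALUE - price * POWER - (STARTUP if prev_on == 0 else 0.0)


def plan_window(prices, prev_on):
    """Exact backward DP over one window (a stand-in for the paper's MILP)."""
    n = len(prices)
    best = [[0.0, 0.0] for _ in range(n + 1)]  # best[t][s]: max profit from hour t on, given status s
    choice = [[0, 0] for _ in range(n)]        # optimal action at (t, s)
    for t in range(n - 1, -1, -1):
        for s in (0, 1):
            off = best[t + 1][0]
            on = hour_profit(prices[t], 1, s) + best[t + 1][1]
            best[t][s], choice[t][s] = (on, 1) if on > off else (off, 0)
    plan, s = [], prev_on
    for t in range(n):  # roll the optimal decisions forward from the starting status
        s = choice[t][s]
        plan.append(s)
    return plan, best[0][prev_on]


def rolling_horizon(prices, window):
    """Re-plan each hour over the next `window` prices; commit only the first action."""
    prev_on, actions, profit = 0, [], 0.0
    for t in range(len(prices)):
        plan, _ = plan_window(prices[t:t + window], prev_on)
        a = plan[0]
        profit += hour_profit(prices[t], a, prev_on)
        actions.append(a)
        prev_on = a
    return actions, profit
```

For the toy prices `[20, 55, 20]` ($/MWh), a one-hour window switches the furnace off during the 55 $/MWh hour and pays the start-up penalty again, earning 2600, whereas a three-hour window keeps it on and earns 2800. Dividing a controller's profit by the full-horizon optimum gives the kind of profit ratio the paper reports.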
Abstract
This paper proposes a reinforcement learning-based framework for optimizing the operation of electric arc furnaces (EAFs) under volatile electricity prices. We formulate the deterministic version of the EAF scheduling problem as a mixed-integer linear program (MILP), and then develop a Q-learning algorithm for real-time control of multiple EAF units subject to real-time price volatility and shared feeder capacity constraints. We design a custom reward function for the Q-learning algorithm that smooths the start-up penalties of the EAFs. Using real EAF design data and New York State electricity prices, we benchmark our algorithm against a rule-based baseline controller and a MILP benchmark that assumes perfect price forecasts. The results show that, under non-anticipatory control, our reinforcement learning algorithm achieves around 90% of the profit of the perfect-foresight MILP benchmark across various single-unit and multi-unit cases.
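The non-anticipatory Q-learning controller can be illustrated with a small tabular sketch for one furnace. The modeling choices here are assumptions, not the paper's: the state is (hour, price bin, previous on/off status), the action is off/on, and the reward is a hypothetical hourly melt value minus energy cost minus a flat start-up penalty. The paper's smoothed start-up reward and its multi-unit feeder coupling are not reproduced, and all constants and function names (`price_bin`, `reward`, `train`, `run_policy`) are hypothetical. The agent sees only the current hour's price when it acts; the next hour's price enters only the learning update.

```python
import random
from collections import defaultdict

# Hypothetical single-furnace constants (illustrative, not from the paper).
VALUE = 3000.0        # value of one hour of melting, $
POWER = 60.0          # furnace draw while on, MW
STARTUP = 500.0       # flat start-up penalty, $ (the paper's smoothing is not reproduced)
EDGES = [30.0, 70.0]  # price-bin edges, $/MWh


def price_bin(price):
    """Map a price to a bin index, 0 being the cheapest bin."""
    for i, edge in enumerate(EDGES):
        if price < edge:
            return i
    return len(EDGES)


def reward(price, action, prev_on):
    """Hourly profit: melt value minus energy cost, minus a penalty on start-up."""
    if action == 0:
        return 0.0
    penalty = STARTUP if prev_on == 0 else 0.0
    return VALUE - price * POWER - penalty


def state(hour, price, prev_on):
    """Observation available before acting: hour, current price bin, last status."""
    return (hour, price_bin(price), prev_on)


def greedy(values):
    """Pick the higher-valued action; ties go to 'off' (action 0)."""
    return 0 if values[0] >= values[1] else 1


def train(days, episodes=2000, alpha=0.1, gamma=0.95, eps=0.2, seed=0):
    """Tabular epsilon-greedy Q-learning over hourly off/on decisions."""
    rng = random.Random(seed)
    q = defaultdict(lambda: [0.0, 0.0])  # q[state] -> [Q(off), Q(on)]
    for ep in range(episodes):
        prices = days[ep % len(days)]  # cycle through the training days
        prev_on = 0
        for h, p in enumerate(prices):
            s = state(h, p, prev_on)
            a = rng.randrange(2) if rng.random() < eps else greedy(q[s])
            r = reward(p, a, prev_on)
            if h + 1 < len(prices):
                # Bootstrap from the next hour's state; its price is revealed only then.
                target = r + gamma * max(q[state(h + 1, prices[h + 1], a)])
            else:
                target = r  # end of day: no continuation value
            q[s][a] += alpha * (target - q[s][a])
            prev_on = a
    return q


def run_policy(q, prices):
    """Act greedily hour by hour, using only the current price."""
    prev_on, actions, profit = 0, [], 0.0
    for h, p in enumerate(prices):
        a = greedy(q[state(h, p, prev_on)])
        profit += reward(p, a, prev_on)
        actions.append(a)
        prev_on = a
    return actions, profit
```

Trained with `train([day])` on the toy day `[20, 20, 20, 100, 100, 100, 20, 20]`, the greedy policy from `run_policy` should run through the cheap hours and shut down for the three-hour spike, since staying on at 100 $/MWh loses far more per hour than the 500 start-up penalty. Including the hour in the state avoids aliasing different times of day onto one table entry; a richer state would be needed for stock levels or several furnaces.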
