M$^2$OE$^2$-GL: A Family of Probabilistic Load Forecasters That Scales to Massive Customers

Haoran Li, Zhe Cheng, Muhao Guo, Yang Weng, Yannan Sun, Victor Tran, John Chainaranont

TL;DR

The paper tackles scalable probabilistic load forecasting for thousands of heterogeneous customer groups by extending the M$^2$OE$^2$ backbone to a global-to-local framework. It pretrains a single global model $f_{\boldsymbol{\theta_0}}$ on all data, then uses lightweight LoRA-based adapters, one per group $g$, to tailor the output heads, forming a family of group-specific forecasters. This approach keeps the per-group footprint compact while achieving substantial accuracy gains: 30–50% improvements over the base model on real feeder data. For utilities, the practical benefit is scalable, uncertainty-aware forecasting across massive numbers of loads with reduced storage and compute requirements.

Abstract

Probabilistic load forecasting is widely studied and underpins power system planning, operation, and risk-aware decision making. Deep learning forecasters have shown strong ability to capture complex temporal and contextual patterns, achieving substantial accuracy gains. However, at the scale of thousands or even hundreds of thousands of loads in large distribution feeders, a deployment dilemma emerges: training and maintaining one model per customer is computationally and storage intensive, while using a single global model ignores distributional shifts across customer types, locations, and phases. Prior work typically focuses on single-load forecasters, global models across multiple loads, or adaptive/personalized models for relatively small settings, and rarely addresses the combined challenges of heterogeneity and scalability in large feeders. We propose M$^2$OE$^2$-GL, a global-to-local extension of the M$^2$OE$^2$ probabilistic forecaster. We first pretrain a single global M$^2$OE$^2$ base model across all feeder loads, then apply lightweight fine-tuning to derive a compact family of group-specific forecasters. Evaluated on realistic utility data, M$^2$OE$^2$-GL yields substantial error reductions while remaining scalable to very large numbers of loads.
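The global-to-local idea can be illustrated with a minimal sketch of a LoRA-style output head: one shared weight matrix stays frozen, and each group stores only a small low-rank pair. This is not the authors' implementation. The dimensions, group names, the zero initialization of `B`, and the hand-set "fine-tuned" value are all assumptions chosen for illustration, and the head here is a plain linear map rather than the M$^2$OE$^2$ architecture.

```python
import random

def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]

def lora_head(x, W0, A, B, scale=1.0):
    """Compute y = W0 x + scale * B (A x).

    W0 is the shared, frozen base weight; (A, B) is one group's adapter.
    """
    base = matvec(W0, x)
    delta = matvec(B, matvec(A, x))
    return [b + scale * d for b, d in zip(base, delta)]

random.seed(0)
d_in, d_out, rank = 8, 2, 2  # hypothetical sizes, not taken from the paper

# Shared base head, standing in for the pretrained global model's output layer.
W0 = [[random.gauss(0, 0.1) for _ in range(d_in)] for _ in range(d_out)]

def new_adapter():
    """Common LoRA initialization: A random, B zero, so the adapter starts as a no-op."""
    A = [[random.gauss(0, 0.1) for _ in range(d_in)] for _ in range(rank)]
    B = [[0.0] * rank for _ in range(d_out)]
    return A, B

# One adapter per customer group (group names are illustrative).
adapters = {"residential": new_adapter(), "small_commercial": new_adapter()}

x = [random.gauss(0, 1) for _ in range(d_in)]
y_global = matvec(W0, x)

# Before any fine-tuning, every group reproduces the global prediction.
y_res = lora_head(x, W0, *adapters["residential"])

# Stand-in for fine-tuning: change one entry of the residential adapter only.
A_res, B_res = adapters["residential"]
B_res[0][0] = 0.5
y_res_tuned = lora_head(x, W0, A_res, B_res)
y_com = lora_head(x, W0, *adapters["small_commercial"])

print("global:           ", y_global)
print("residential tuned:", y_res_tuned)
print("small commercial: ", y_com)
```

Only the residential output moves after the update; the shared weights and the other group's forecast are untouched, which is what lets a single base model serve many groups.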

Paper Structure

This paper contains 15 sections, 11 equations, 3 figures, 2 tables.

Figures (3)

  • Figure 1: Transformer loads from our industrial partner: residential (green), small commercial (orange and red), and large commercial (blue). The left panel shows the raw time series over one year; the middle and right panels show 2-D embeddings of daily time series obtained with PCA and t-SNE, respectively.
  • Figure 2: Load forecasting results for M$^2$OE$^2$ and baseline models on two test trajectories. The plots compare the ground-truth load (blue line) against the mean prediction of our fine-tuned M$^2$OE$^2$-GL (Ours, red line), our pre-trained M$^2$OE$^2$-base (red dotted line), and other baseline models (RNN, LSTM, CNNGRU). The shaded red area represents the $\pm 1\sigma$ uncertainty bounds, shown for M$^2$OE$^2$-GL only.
  • Figure 3: Sensitivity of fine-tuning performance to the LoRA rank $r$. LoRA models with $r$ varying from 1 to 10 are compared against a full-rank ('Full') fine-tuning model on our two metrics.
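
To give a sense of why the rank in Figure 3 matters for storage, the following back-of-the-envelope sketch counts the parameters stored per group for a single linear layer under a rank-$r$ LoRA update versus full fine-tuning. The layer size (`D_IN`, `D_OUT`) is a hypothetical choice, not a value from the paper, and biases are ignored.

```python
def lora_param_count(d_in, d_out, rank):
    """Parameters in a rank-r update B (d_out x r) times A (r x d_in)."""
    return rank * (d_in + d_out)

def full_param_count(d_in, d_out):
    """Parameters in a full d_out x d_in weight matrix."""
    return d_in * d_out

D_IN, D_OUT = 256, 48  # hypothetical layer size, for illustration only

full = full_param_count(D_IN, D_OUT)
rows = [(r, lora_param_count(D_IN, D_OUT, r)) for r in range(1, 11)]

for r, n in rows:
    print(f"r={r:2d}  adapter params={n:5d}  fraction of full={n / full:.3f}")
print(f"full fine-tuning params={full}")
```

Under these assumed sizes the adapter grows linearly with $r$ and stays well below the full matrix across the whole range of ranks in the sensitivity test.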