Unlocking Multi-Task Electric Energy System Intelligence: Data Scaling Laws and Performance with Limited Fine-Tuning

Shaohuai Liu, Lin Dong, Chao Tian, Le Xie

TL;DR

This work addresses the challenge of creating power-system foundation models that generalize across tasks and unseen operational scenarios. It develops a data-centric approach, demonstrating that scenario-generalization performance scales approximately as a power law with the amount of fine-tuning data, and that multi-task training preserves these gains with limited interference. The study further shows that small models can achieve strong results and that parameter scaling yields limited benefits in this domain, highlighting data quality and task design as the primary drivers of performance. Overall, the findings suggest data-efficient pathways to deploying robust, multi-task, cross-timescale AI for power systems, even with synthetic data and a single-topology focus.

Abstract

Data scaling has revolutionized research fields such as natural language processing, computer vision, and robotics control, giving foundation models remarkable multi-task and generalization capabilities. In this paper, we investigate whether similar data scaling laws exist for foundation models in power systems, and whether appropriate data scaling can yield multi-task, cross-timescale capabilities that can be deployed in unseen operational scenarios. To this end, we conducted a comprehensive empirical study of data scaling by fine-tuning open-source foundation models on labeled data collected from diverse operational tasks and scenarios. We study how a foundation model's scenario-generalization performance evolves with the number of training tasks, scenarios, and demonstrations. Our study involved collecting more than 450k demonstrations and running independent tests under a rigorous evaluation framework. Our findings reveal several key insights. First, the generalization performance of a fine-tuned foundation model follows an approximate power-law relationship with the number of demonstrations and scenarios. Second, the fine-tuned model demonstrates strong multi-task capabilities: as the number of demonstrations increases, multi-task training yields performance improvements similar to those of single-task training, without interference among tasks. Lastly, models with small parameter sizes can also perform strongly, and model performance does not scale significantly with parameter size. These findings underscore the feasibility of developing multi-task foundation models tailored for power systems, demonstrating that while larger datasets and models generally improve performance, extreme scaling is unnecessary to achieve satisfactory outcomes.
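An approximate power law of the form error ≈ a · n^b becomes a straight line on log-log axes, so it can be estimated with ordinary least squares on log-transformed data, and the Pearson correlation r in log space indicates how well the law holds. The sketch below is a minimal illustration under stated assumptions, not the authors' fitting code: `fit_power_law` is a hypothetical helper, the metric is assumed to be an error that decreases as the data size n grows, and the sample points are synthetic.

```python
import math


def fit_power_law(sizes, errors):
    """Fit errors ~= a * sizes**b by least squares on log-log axes.

    Hypothetical helper. Returns (a, b, r): the prefactor a, the exponent b,
    and the Pearson correlation r between log(size) and log(error).
    |r| close to 1 means a straight line on log-log axes fits the points well.
    """
    if len(sizes) != len(errors) or len(sizes) < 2:
        raise ValueError("need at least two paired (size, error) points")
    if min(sizes) <= 0 or min(errors) <= 0:
        raise ValueError("a power-law fit needs strictly positive values")

    # In log space, y = a * x**b becomes log y = log a + b * log x.
    xs = [math.log(s) for s in sizes]
    ys = [math.log(e) for e in errors]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Centered sums of squares and cross-products.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("sizes and errors must each take more than one value")

    b = sxy / sxx                      # slope on log-log axes = exponent
    a = math.exp(mean_y - b * mean_x)  # intercept mapped back to the prefactor
    r = sxy / math.sqrt(sxx * syy)     # Pearson correlation in log space
    return a, b, r


if __name__ == "__main__":
    # Synthetic points that follow error = 2 * n**-0.5 exactly (not data from the paper).
    sizes = [1_000, 2_000, 4_000, 8_000, 16_000]
    errors = [2.0 * n ** -0.5 for n in sizes]
    a, b, r = fit_power_law(sizes, errors)
    print(f"error ~= {a:.3f} * n^({b:.3f}), r = {r:.3f}")
```

On these synthetic points the fit recovers a ≈ 2, b ≈ -0.5 and r ≈ -1. For a metric where higher is better (such as accuracy), the fitted exponent and r would instead be positive.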

Paper Structure

This paper contains 13 sections, 10 equations, 4 figures, 4 tables.

Figures (4)

  • Figure 1: Overview of the research. The training data were either collected from real-world historical data or generated with well-recognized power system simulation tools, such as PowerWorld and PYPOWER. A hybrid dataset covering typical power system operation tasks was then formed and converted into question-and-answer (Q&A) pairs. After a specific transformation of the floating-point numbers in the question data, the model was fine-tuned to generate the expected answers, which were finally converted back to the appropriate form through the same transformation algorithm.
  • Figure 2: Single-task training vs. three-task training. Orange lines show curves under single-task training; blue lines show curves under three-task training. The closer the two lines are, especially in the convergence stage, the stronger the multi-task ability, demonstrating the task scalability of foundation models in power systems.
  • Figure 3: Power-law relationship. Dashed lines represent power-law regressions, with the fitted equations given in the legend. All axes use logarithmic scales. The correlation coefficient r indicates how strongly a power law holds between the scenario-generalization ability in each task and the training data size.
  • Figure 4: Verification on three new tasks. To verify the effectiveness and scalability of the proposed method, we extend the number of simultaneously trained tasks from 3 to 6. Only the results for the three newly added tasks are shown, since the model's performance on the three original tasks is unchanged. As shown in (A), the model maintains the same multi-task performance relative to single-task training as the number of simultaneously trained tasks increases. In addition, as depicted in (B), model performance on the three new tasks still follows a power-law relationship with the data size.
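Figure 1 mentions a reversible transformation of the floating-point numbers in the Q&A data, but the exact algorithm is not described here. As a purely hypothetical illustration, one common choice is to round every number to a fixed precision and spell it out character by character with an explicit sign, so that a tokenizer is likely to see individual digits; the inverse rejoins the characters to recover the value. `PRECISION`, `encode_float`, `decode_float` and `encode_text` are names invented for this sketch and are not the authors' method.

```python
import re

# Hypothetical setting: number of decimal places kept for every value.
PRECISION = 4

# Matches decimal numbers such as 1.25 or -0.031; integers (e.g. bus IDs) are left alone.
_DECIMAL = re.compile(r"-?\d+\.\d+")


def encode_float(value: float, precision: int = PRECISION) -> str:
    """Round to a fixed precision and spell the number out one character at a time."""
    rounded = round(value, precision)
    sign = "-" if rounded < 0 else "+"
    body = f"{abs(rounded):.{precision}f}"  # e.g. 1.25 -> "1.2500"
    return " ".join([sign, *body])           # e.g. "+ 1 . 2 5 0 0"


def decode_float(tokens: str) -> float:
    """Invert encode_float: rejoin the characters and reapply the sign."""
    sign, *chars = tokens.split()
    if sign not in ("+", "-"):
        raise ValueError(f"expected a leading sign token, got {sign!r}")
    value = float("".join(chars))
    return -value if sign == "-" else value


def encode_text(text: str, precision: int = PRECISION) -> str:
    """Apply encode_float to every decimal number inside a question string."""
    return _DECIMAL.sub(lambda m: encode_float(float(m.group()), precision), text)


if __name__ == "__main__":
    question = "Bus 3 load is 1.25 p.u. and angle is -0.5 rad."
    print(encode_text(question))
    print(decode_float(encode_float(-0.5)))
```

Because of the rounding step, the round trip is exact only up to `PRECISION` decimal places; a real pipeline would choose the precision to match the accuracy the task requires.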