Learn to Bid as a Price-Maker Wind Power Producer

Shobhit Singhal, Marta Fochesato, Liviu Aolaritei, Florian Dörfler

TL;DR

The paper tackles revenue optimization for price-maker wind power producers (WPPs) in short-term markets, where bids influence both dispatch and prices. It introduces a delayed-feedback Lipschitz contextual multi-armed bandit (CMAB) approach that uses contextual forecasts to learn bidding decisions, with a formal regret bound and a practical market-simulation framework built on Nord Pool and ENTSO-E data. Key contributions include reformulating the price-maker bidding problem as a context-dependent stochastic program suitable for CMAB, adapting the algorithm to delayed feedback, and demonstrating empirical gains over benchmarks in a German market setting. The work offers a practical, scalable method for WPPs to exploit price-maker effects and shows market participants how contextual information can improve bidding performance in real-time and day-ahead markets.

Abstract

Wind power producers (WPPs) participating in short-term power markets face significant imbalance costs due to their non-dispatchable and variable production. While some WPPs have a large enough market share to influence prices with their bidding decisions, existing optimal bidding methods rarely account for this aspect. Price-maker approaches typically model bidding as a bilevel optimization problem, but these methods require complex market models, estimating other participants' actions, and are computationally demanding. To address these challenges, we propose an online learning algorithm that leverages contextual information to optimize WPP bids in the price-maker setting. We formulate the strategic bidding problem as a contextual multi-armed bandit, ensuring provable regret minimization. The algorithm's performance is evaluated against various benchmark strategies using a numerical simulation of the German day-ahead and real-time markets.

Paper Structure

This paper contains 15 sections, 1 theorem, 29 equations, 10 figures, 1 table, 2 algorithms.

Key Result

Theorem 1

Consider the CMAB problem with stochastic payoffs and delayed feedback. Algorithm \ref{alg:cbanditpseudodelay} achieves vanishing average regret, with a bound that depends on $W$, the maximum delay (or batch size), and $d_c$, the r-zooming dimension.
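
To make the delayed-feedback setting concrete, the following is a minimal sketch of a contextual bandit that only learns from a reward once a fixed number of rounds has passed. It is not the paper's algorithm: a fixed uniform grid over contexts and bids with a UCB1-style index stands in for the adaptive ball-based discretization, and the class `DelayedGridUCB`, the function `toy_revenue`, the normalization of contexts and rewards to $[0, 1]$, and all parameter values are assumptions made for illustration only.

```python
import math
import random
from collections import deque


class DelayedGridUCB:
    """UCB1-style index over a fixed (context bin, bid) grid with delayed rewards.

    Assumption: a uniform grid replaces the adaptive discretization, and
    rewards lie in [0, 1] so the UCB1 confidence radius is meaningful.
    """

    def __init__(self, n_context_bins, bids, delay, rng=None):
        self.n_bins = n_context_bins
        self.bids = list(bids)
        self.delay = delay  # rounds before a reward is revealed (plays the role of W)
        self.rng = rng or random.Random(0)
        self.counts = [[0] * len(self.bids) for _ in range(n_context_bins)]
        self.means = [[0.0] * len(self.bids) for _ in range(n_context_bins)]
        self.pending = deque()  # entries: (reveal_round, bin, arm, reward)
        self.t = 0

    def _bin(self, context):
        # Contexts are assumed to be normalized to [0, 1].
        return min(int(context * self.n_bins), self.n_bins - 1)

    def select(self, context):
        """Advance one round and return (context bin, arm index)."""
        self.t += 1
        b = self._bin(context)
        unseen = [a for a, n in enumerate(self.counts[b]) if n == 0]
        if unseen:
            # No revealed feedback yet for these arms: explore one at random.
            return b, self.rng.choice(unseen)

        def index(a):
            n = self.counts[b][a]
            return self.means[b][a] + math.sqrt(2.0 * math.log(self.t) / n)

        return b, max(range(len(self.bids)), key=index)

    def record(self, b, arm, reward):
        """Queue a reward; it becomes visible `delay` rounds from now."""
        self.pending.append((self.t + self.delay, b, arm, reward))

    def flush(self):
        """Fold every reward whose delay has elapsed into the running means."""
        while self.pending and self.pending[0][0] <= self.t:
            _, b, a, r = self.pending.popleft()
            self.counts[b][a] += 1
            self.means[b][a] += (r - self.means[b][a]) / self.counts[b][a]


def toy_revenue(context, bid, rng):
    """Hypothetical reward in [0, 1] that peaks when the bid matches the context."""
    noisy = 1.0 - abs(bid - context) + rng.gauss(0.0, 0.05)
    return max(0.0, min(1.0, noisy))


def simulate(rounds=2000, delay=24, seed=0):
    rng = random.Random(seed)
    agent = DelayedGridUCB(4, [0.0, 0.25, 0.5, 0.75, 1.0], delay, rng)
    total = 0.0
    for _ in range(rounds):
        context = rng.random()
        b, arm = agent.select(context)
        reward = toy_revenue(context, agent.bids[arm], rng)
        agent.record(b, arm, reward)
        agent.flush()
        total += reward
    return agent, total / rounds
```

Calling `simulate()` returns the trained agent and its average reward per round. Increasing `delay` postpones every update, which is the effect the dependence on $W$ in the theorem accounts for.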

Figures (10)

  • Figure 1: A price-maker WPP participating in the day-ahead and real-time markets. The day-ahead market clearing produces a dispatch schedule $p^{w}$ and the resulting imbalance $(g^{w}-p^{w})$ is settled in the real-time market, where $g^{w}$ denotes the realized WPP generation. $\lambda^{S}$ and $\lambda^{I}$ denote the day-ahead and real-time market prices, respectively. In the price-maker setting, the day-ahead bid affects both the dispatch volume and the clearing price. Likewise, the day-ahead dispatch affects the imbalance volume, and thus, the real-time market price.
  • Figure 2: Potential improvement in WPP revenue from incorporating contextual information into the bidding strategy, compared to a context-blind approach, for the proposed algorithm in Section \ref{sec:algorithm}. The results are based on historical German market data, with the simulation details provided in Section \ref{sec:simsetup}.
  • Figure 3: Schematic \ref{fig:model_bilevel} refers to the bilevel formulation \ref{eq:toy_bilevel}, where the upper level optimizes the WPP's revenue, and the lower level represents the day-ahead and real-time market clearing \ref{eq:daclearing}, \ref{eq:realtimeclearing}. The lower level receives full information about the market and wind power generation together with the WPP's bid, and returns the market and generation outcome. Schematic \ref{fig:model_sp} refers to the stochastic program with decision-dependent uncertainty \ref{eq:spddu}, where the WPP optimizes the expected revenue, whose distribution is parameterized by the WPP's bid and the observed context.
  • Figure 4: Illustration of Algorithm \ref{alg:cbanditpseudodelay} in a two-dimensional bid-context space. Circles represent balls, with lighter shades indicating more observed samples and thus closer to satisfying the activation rule. When context $x_{t}$ arrives, balls C and D are relevant. If C has a higher index value than D, a bid (red point) is sampled from D on the dashed line. Since D meets the activation condition, a new ball F is activated. The blue curve shows the context arrival distribution, guiding finer discretization in dense regions.
  • Figure 5:
  • ...and 5 more figures
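
The two-settlement structure described in the Figure 1 caption can be sketched as a short revenue calculation. The function below assumes the simplest reading of that caption: the dispatched volume $p^{w}$ is paid at the day-ahead price $\lambda^{S}$ and the whole imbalance $(g^{w}-p^{w})$ is settled at the single real-time price $\lambda^{I}$, with no extra penalty term. The function name, argument names, and numbers are hypothetical, and prices are passed in as fixed inputs, so the price-maker effect of the bid on $\lambda^{S}$ and $\lambda^{I}$ is not modeled here.

```python
def two_settlement_revenue(dispatch, generation, price_da, price_rt):
    """Day-ahead income plus real-time settlement of the imbalance.

    Assumption: single-price imbalance settlement, i.e.
    revenue = price_da * dispatch + price_rt * (generation - dispatch).
    """
    imbalance = generation - dispatch
    return price_da * dispatch + price_rt * imbalance


# Shortfall: 100 MWh dispatched, 90 MWh produced, so 10 MWh is bought back.
shortfall = two_settlement_revenue(
    dispatch=100.0, generation=90.0, price_da=50.0, price_rt=70.0
)

# Surplus: 100 MWh dispatched, 110 MWh produced, so 10 MWh is sold in real time.
surplus = two_settlement_revenue(
    dispatch=100.0, generation=110.0, price_da=50.0, price_rt=30.0
)
```

In both cases the imbalance is priced unfavorably relative to the day-ahead price, which is the imbalance cost the abstract refers to. In the price-maker setting both prices would additionally shift with the bid.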

Theorems & Definitions (1)

  • Theorem 1: Regret bound