Learning Prosumer Behavior in Energy Communities: Integrating Bilevel Programming and Online Learning
Bennevis Crowley, Jalal Kazempour, Lesia Mitridati, Mahnoosh Alizadeh
TL;DR
The paper addresses dynamic pricing for demand response when prosumer behavior is uncertain. It introduces BiPS, a framework that integrates bilevel price setting with Thompson sampling to learn individual prosumer signatures online. Each prosumer is modeled as a mixture of asset-level signatures whose weights are learned over time. The bilevel problem is reformulated as a mixed-integer linear program (MILP) via strong duality and KKT conditions and solved daily to set prices that respect capacity limits while minimizing community costs. Empirical results with 25 prosumers and 10 signatures show rapid learning: most weights are learned within 5 days, with full convergence within 100 days, near-zero regret, and adherence to capacity constraints. Non-stationarity analyses demonstrate the need to reset priors when the underlying weights change. The approach provides a practical, data-efficient method for enabling grid services in energy communities through price-based demand response while preserving prosumer privacy and reducing reliance on extensive pre-existing datasets.
Abstract
Dynamic pricing through bilevel programming is widely used for demand response but often assumes perfect knowledge of prosumer behavior, which is unrealistic in practical applications. This paper presents a novel framework that integrates bilevel programming with online learning, specifically Thompson sampling, to overcome this limitation. The approach dynamically sets optimal prices while simultaneously learning prosumer behaviors through observed responses, eliminating the need for extensive pre-existing datasets. Applied to an energy community providing capacity limitation services to a distribution system operator, the framework allows the community manager to infer individual prosumer characteristics, including usage patterns for photovoltaic systems, electric vehicles, home batteries, and heat pumps. Numerical simulations with 25 prosumers, each represented by 10 potential signatures, demonstrate rapid learning with low regret, with most prosumer characteristics learned within five days and full convergence achieved in 100 days.
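The learn-while-pricing loop described above can be illustrated with a minimal Thompson sampling sketch. This is not the authors' implementation. It assumes, purely for illustration, that a single prosumer acts each day according to one of a few known signatures, that the community manager can tell which signature was active from the observed response, and that a Dirichlet posterior over the signature weights is used. The price-setting step, which the paper solves as a MILP, is replaced here by a hypothetical lookup over three candidate prices. The names `pick_price`, `run_thompson`, `responses`, `prices`, and `cap` are all hypothetical.

```python
import numpy as np


def pick_price(sampled_weights, responses, prices, cap):
    """Stand-in for the price-setting problem (assumption: a simple lookup,
    not the MILP from the paper). Returns the index of the cheapest price
    whose expected load under the sampled weights respects the cap."""
    expected_load = sampled_weights @ responses  # one expected load per price
    feasible = np.flatnonzero(expected_load <= cap)
    if feasible.size > 0:
        return int(feasible[np.argmin(prices[feasible])])
    # No candidate price meets the cap: fall back to the lowest expected load.
    return int(np.argmin(expected_load))


def run_thompson(true_weights, responses, prices, cap, days, seed=0):
    """Sample weights from the posterior, price against the sample,
    observe the active signature, and update the posterior."""
    rng = np.random.default_rng(seed)
    true_weights = np.asarray(true_weights, dtype=float)
    k = true_weights.size
    alpha = np.ones(k)  # uniform Dirichlet prior over signature weights
    chosen = []
    for _ in range(days):
        sampled = rng.dirichlet(alpha)            # Thompson sample
        chosen.append(pick_price(sampled, responses, prices, cap))
        observed = rng.choice(k, p=true_weights)  # simulated prosumer response
        alpha[observed] += 1.0                    # conjugate posterior update
    return alpha, chosen


# Hypothetical data: 3 signatures (rows) and 3 candidate prices (columns).
true_weights = [0.6, 0.3, 0.1]
responses = np.array([[5.0, 4.0, 2.0],
                      [6.0, 3.0, 1.0],
                      [8.0, 7.0, 3.0]])
prices = np.array([1.0, 2.0, 3.0])

alpha, chosen = run_thompson(true_weights, responses, prices, cap=4.5, days=2000)
posterior_mean = alpha / alpha.sum()
print(posterior_mean)
```

Under the true weights in this toy setup, the expected loads at the three prices are 5.6, 4.0, and 1.8, so the second price is the cheapest one that respects the cap of 4.5. As the posterior concentrates, the sampled weights should lead the manager to that price increasingly often. Resetting `alpha` to the prior would be one simple way to handle a change in the underlying weights, in the spirit of the non-stationarity analysis mentioned above.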
