Peer-to-Peer Energy Trading in Dairy Farms using Multi-Agent Reinforcement Learning
Mian Ibad Ali Shah, Marcos Eduardo Cruz Victorio, Maeve Duffy, Enda Barrett, Karl Mason
TL;DR
The paper tackles high, tariff-driven energy costs in rural dairy farms by deploying multi-agent reinforcement learning (MARL) to enable distributed P2P energy trading. It combines PPO and DQN with a price advisor and a double-auction market to learn dairy-farm bidding strategies under realistic load, generation, and battery constraints. Key contributions include a fully MARL-based MAPDES framework, SDR-based price formation, and an ablation analysis showing that the price advisor and dairy-specific constraints are critical to performance, with DQN delivering the strongest cost reductions and revenue gains while also reducing peak demand. The results demonstrate significant economic and grid-support benefits of P2P trading in dairy communities, with robust cross-country generalization (Ireland to Finland) and clear guidance for scalable, privacy-preserving market design in rural energy systems.
Abstract
The integration of renewable energy resources in rural areas, such as dairy farming communities, enables decentralized energy management through Peer-to-Peer (P2P) energy trading. This research highlights the role of P2P trading in efficient energy distribution and its synergy with advanced optimization techniques. While traditional rule-based methods perform well under stable conditions, they struggle in dynamic environments. To address this, Multi-Agent Reinforcement Learning (MARL), specifically Proximal Policy Optimization (PPO) and Deep Q-Networks (DQN), is combined with community/distributed P2P trading mechanisms. By incorporating auction-based market clearing, a price advisor agent, and load and battery management, the approach achieves significant improvements. Results show that, compared to baseline models, DQN reduces electricity costs by 14.2% in Ireland and 5.16% in Finland, while increasing electricity revenue by 7.24% and 12.73%, respectively. PPO achieves the lowest peak hour demand, reducing it by 55.5% in Ireland, while DQN reduces peak hour demand by 50.0% in Ireland and 27.02% in Finland. These improvements are attributed to both the MARL algorithms and P2P energy trading, which together reduce electricity cost and peak hour demand and increase electricity selling revenue. This study highlights the complementary strengths of DQN, PPO, and P2P trading in achieving efficient, adaptable, and sustainable energy management in rural communities.
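To make the idea of auction-based market clearing concrete, the following is a minimal sketch of a generic double auction, not the paper's mechanism. The `Order` type, the `clear_double_auction` function, the farm names, and the midpoint settlement rule are all illustrative assumptions; the SDR-based price formation and the price advisor agent described in this work are not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class Order:
    agent: str    # hypothetical farm identifier
    price: float  # offered price, e.g. EUR/kWh
    qty: float    # energy quantity, kWh


def clear_double_auction(bids, asks):
    """Match highest bids with lowest asks.

    Assumption: each trade settles at the midpoint of the matched
    bid and ask prices (a common textbook rule, chosen for illustration).
    Returns a list of (buyer, seller, quantity, price) tuples.
    """
    # Copy the orders so the caller's objects are not mutated.
    bids = sorted((Order(o.agent, o.price, o.qty) for o in bids),
                  key=lambda o: -o.price)
    asks = sorted((Order(o.agent, o.price, o.qty) for o in asks),
                  key=lambda o: o.price)
    trades = []
    i = j = 0
    # Keep matching while the best remaining bid covers the best remaining ask.
    while i < len(bids) and j < len(asks) and bids[i].price >= asks[j].price:
        qty = min(bids[i].qty, asks[j].qty)
        price = (bids[i].price + asks[j].price) / 2
        trades.append((bids[i].agent, asks[j].agent, qty, price))
        bids[i].qty -= qty
        asks[j].qty -= qty
        if bids[i].qty <= 1e-9:
            i += 1  # buyer fully served
        if asks[j].qty <= 1e-9:
            j += 1  # seller fully sold
    return trades


# Example with made-up orders: two buying farms and two selling farms.
bids = [Order("farm_a", 0.20, 5), Order("farm_b", 0.12, 3)]
asks = [Order("farm_c", 0.10, 4), Order("farm_d", 0.18, 4)]
for buyer, seller, qty, price in clear_double_auction(bids, asks):
    print(f"{buyer} buys {qty} kWh from {seller} at {price:.3f}")
```

In a MARL setting such as the one described above, the bid and ask prices would come from each agent's learned policy rather than being fixed, and any unmatched demand or surplus would typically be settled with the utility grid at the retail or feed-in tariff.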
