Personalized Dynamic Pricing Policy for Electric Vehicles: Reinforcement learning approach

Sangjun Bae; Balazs Kulcsar; Sebastien Gros

TL;DR

This study proposes a personalized dynamic pricing policy (PeDP) that uses reinforcement learning (RL) to maximize the revenue of a fast electric vehicle charging station (fast-EVCS). It evaluates the performance of the proposed PeDP and analyzes how the information available to the station affects the policy.

Abstract

With the increasing number of fast electric vehicle charging stations (fast-EVCSs) and the popularization of information technology, electricity price competition between fast-EVCSs is widely expected, and the use of public and/or privacy-preserved information will play a crucial role in it. Self-interested electric vehicle (EV) users, on the other hand, select a fast-EVCS for charging so as to maximize their utility based on electricity price, estimated waiting time, and their state of charge. While existing studies have largely focused on finding equilibrium prices, this study proposes a personalized dynamic pricing policy (PeDP) for a fast-EVCS to maximize revenue using a reinforcement learning (RL) approach. We first propose a simulation environment with multiple competing fast-EVCSs that models the selfish behavior of EV users using a game-based charging station selection model with a monetary utility function. In this environment, we propose a Q-learning-based PeDP to maximize the revenue of a fast-EVCS. Through numerical simulations based on the environment: (1) we identify the importance of waiting time in the EV charging market by comparing the classic Bertrand competition model with the proposed PeDP for fast-EVCSs (from the system perspective); (2) we evaluate the performance of the proposed PeDP and analyze the effects of the information on the policy (from the service provider perspective); and (3) we show that privacy-preserved information sharing can be misused by an artificial intelligence-based PeDP in certain situations in the EV charging market (from the customer perspective).
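To make the Q-learning idea in the abstract concrete, the following is a minimal tabular sketch, not the authors' implementation. Everything in it is an assumption for illustration: the state is a coarse SOC level of the arriving EV, the action is one of four hypothetical price levels, the reward is the resulting electricity bill, and the acceptance rule in `MAX_ACCEPTED` is a toy stand-in for the paper's game-based station selection model.

```python
import random

# All numbers below are hypothetical and chosen only for illustration.
PRICES = [0.2, 0.3, 0.4, 0.5]            # candidate prices per kWh (actions)
SOC_LEVELS = [0, 1, 2]                   # state: 0 = low, 1 = medium, 2 = high SOC
MAX_ACCEPTED = {0: 0.5, 1: 0.4, 2: 0.3}  # toy rule: highest price each state accepts
ENERGY_KWH = {0: 40.0, 1: 25.0, 2: 10.0} # energy bought if the offer is accepted


def step(state, action, rng):
    """Return (reward, next_state) for offering PRICES[action] in `state`."""
    price = PRICES[action]
    # Reward is the electricity bill, or zero if the EV user rejects the offer.
    reward = price * ENERGY_KWH[state] if price <= MAX_ACCEPTED[state] else 0.0
    next_state = rng.choice(SOC_LEVELS)  # the next arriving EV is drawn at random
    return reward, next_state


def train(steps=20000, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning with an epsilon-greedy behavior policy."""
    rng = random.Random(seed)
    actions = range(len(PRICES))
    q = {(s, a): 0.0 for s in SOC_LEVELS for a in actions}
    state = rng.choice(SOC_LEVELS)
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(len(PRICES))               # explore
        else:
            action = max(actions, key=lambda a: q[(state, a)])  # exploit
        reward, next_state = step(state, action, rng)
        best_next = max(q[(next_state, a)] for a in actions)
        # Standard Q-learning temporal-difference update.
        q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
        state = next_state
    return q


def greedy_policy(q):
    """Map each state to the price with the highest learned Q-value."""
    return {s: PRICES[max(range(len(PRICES)), key=lambda a: q[(s, a)])]
            for s in SOC_LEVELS}


if __name__ == "__main__":
    print(greedy_policy(train()))
```

In this toy setting the next state does not depend on the chosen price, so the greedy policy reduces to picking the highest price each SOC level still accepts. The environment described in the abstract is richer, since competing stations and waiting times shape the user's choice.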

Paper Structure (17 sections, 3 theorems, 9 equations, 13 figures, 4 tables, 3 algorithms)

Introduction
Decision-making for pricing problem
Simulation environment
System Model
Charging Station Selection Game (CSSG)
A monetary utility function
Algorithm for CSSG
Personalized Dynamic Pricing Policy Using RL
Personalized Dynamic Pricing Problem
Reinforcement Learning Approach
Convergence
Numerical simulations
Case 1: from the system perspective
Case 2: from the fast-EVCS perspective
Case 3: from the EV user perspective
...and 2 more sections

Key Result

Lemma 1

If an instance $(\mathcal{B},\mathcal{E},\mathcal{P})$ in which the current partition is Nash stable holds SPAO, the new instance $(\hat{\mathcal{B}},\mathcal{E},\hat{\mathcal{P}})$ that has a new EV $b_q \notin \mathcal{B}$ holding SPAO with regard to every EVCS $e_j \in \mathcal{E}$ also, (1) sati

Figures (13)

Figure 1: Conceptual image for our system model (highway scenario). Each fast-EVCS adjusts its charging price based on public (green box) and/or privacy-preserved (red box) information. Each selfish EV user selects a fast-EVCS for charging based on the public and/or personalized (yellow box) information.
Figure 2: Battery SOC obtained versus charging time spent for a Li-ion battery. Typically, the last phase of the charging curve shows that the available battery SOC depends nonlinearly on the charging time spent [Wang2016].
Figure 3: The reinforcement learning framework of PeDP: a fast-EVCS offers personalized electricity prices (action) in the environment, which are translated into electricity bills (reward) and the public/privacy-preserved information (state), which are fed back to the fast-EVCS.
Figure 4: Flowchart for applying the Q-learning algorithm to the personalized dynamic pricing problem (training stage).
Figure 5: System model for case 1 and case 2. Two fast-EVCSs are located at the same place [$e_1 = 250$ km, $e_2 = 250$ km].
...and 8 more figures

Theorems & Definitions (6)

Definition 1: Deviated partition
Definition 2: Nash stable partition
Definition 3: SPAO condition
Lemma 1
Theorem 1: Existence of Algorithm \ref{algorithm:GRAPE}
Theorem 2: Convergence of Algorithm \ref{algorithm:GRAPE}
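
The notion of a Nash stable partition (Definition 2) can be illustrated with a small best-response sketch. This is a simplified stand-in, not the paper's CSSG algorithm or its monetary utility function. The cost model is an assumption: an EV pays the station price times its energy demand plus a fixed waiting cost for every other EV at the same station. All names (`cost`, `is_nash_stable`, `best_response_dynamics`) and numbers are hypothetical.

```python
def cost(ev, station, assignment, prices, demand_kwh, wait_cost):
    """Assumed monetary cost for `ev` at `station`: energy bill plus queueing penalty."""
    others = sum(1 for other, s in assignment.items()
                 if s == station and other != ev)
    return prices[station] * demand_kwh[ev] + wait_cost * others


def is_nash_stable(assignment, prices, demand_kwh, wait_cost, tol=1e-12):
    """True if no single EV can strictly lower its cost by switching stations."""
    for ev, current in assignment.items():
        current_cost = cost(ev, current, assignment, prices, demand_kwh, wait_cost)
        for station in prices:
            alt = cost(ev, station, assignment, prices, demand_kwh, wait_cost)
            if alt < current_cost - tol:
                return False
    return True


def best_response_dynamics(evs, prices, demand_kwh, wait_cost,
                           max_rounds=1000, tol=1e-12):
    """Let EVs switch to their cheapest station in turn until nobody wants to move."""
    stations = list(prices)
    assignment = {ev: stations[0] for ev in evs}  # start with everyone at one station
    for _ in range(max_rounds):
        changed = False
        for ev in evs:
            best = min(stations, key=lambda s: cost(
                ev, s, assignment, prices, demand_kwh, wait_cost))
            best_cost = cost(ev, best, assignment, prices, demand_kwh, wait_cost)
            cur_cost = cost(ev, assignment[ev], assignment, prices,
                            demand_kwh, wait_cost)
            if best_cost < cur_cost - tol:
                assignment[ev] = best
                changed = True
        if not changed:
            return assignment
    raise RuntimeError("no stable partition found within max_rounds")


if __name__ == "__main__":
    evs = ["b1", "b2", "b3", "b4"]
    prices = {"e1": 0.30, "e2": 0.35}         # hypothetical prices per kWh
    demand = {ev: 20.0 for ev in evs}         # hypothetical energy demand in kWh
    result = best_response_dynamics(evs, prices, demand, wait_cost=0.5)
    print(result, is_nash_stable(result, prices, demand, wait_cost=0.5))
```

With these numbers one EV moves to the more expensive station `e2` to avoid the queue at `e1`, after which no EV benefits from deviating. Under this assumed additive cost, each unilateral improvement lowers a common potential, so the loop terminates; the paper's own existence and convergence results are the ones stated in Theorems 1 and 2.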
