Personalized Dynamic Pricing Policy for Electric Vehicles: Reinforcement learning approach

Sangjun Bae, Balazs Kulcsar, Sebastien Gros

TL;DR

This study proposes a personalized dynamic pricing policy (PeDP) that lets a fast electric vehicle charging station (fast-EVCS) maximize its revenue using reinforcement learning (RL). It evaluates the performance of the proposed PeDP and analyzes how the available information affects the policy.

Abstract

With the increasing number of fast-electric vehicle charging stations (fast-EVCSs) and the popularization of information technology, electricity price competition between fast-EVCSs is widely expected, and the use of public and/or privacy-preserved information will play a crucial role in it. Self-interested electric vehicle (EV) users, on the other hand, select a fast-EVCS for charging so as to maximize their utility, based on electricity price, estimated waiting time, and their state of charge. While existing studies have largely focused on finding equilibrium prices, this study proposes a personalized dynamic pricing policy (PeDP) for a fast-EVCS to maximize revenue using a reinforcement learning (RL) approach. We first propose a simulation environment with multiple competing fast-EVCSs, which models the selfish behavior of EV users through a game-based charging station selection model with a monetary utility function. Within this environment, we propose a Q-learning-based PeDP to maximize the fast-EVCS's revenue. Through numerical simulations based on the environment: (1) we identify the importance of waiting time in the EV charging market by comparing the classic Bertrand competition model with the proposed PeDP for fast-EVCSs (from the system perspective); (2) we evaluate the performance of the proposed PeDP and analyze the effects of the information on the policy (from the service provider perspective); and (3) we show that, in certain situations in the EV charging market, privacy-preserved information sharing can be misused by an artificial intelligence-based PeDP (from the customer perspective).

Paper Structure

This paper contains 17 sections, 3 theorems, 9 equations, 13 figures, 4 tables, 3 algorithms.

Key Result

Lemma 1

If an instance $(\mathcal{B},\mathcal{E},\mathcal{P})$ in which the current partition is Nash stable holds SPAO, the new instance $(\hat{\mathcal{B}},\mathcal{E},\hat{\mathcal{P}})$ that has a new EV $b_q \notin \mathcal{B}$ holding SPAO with regard to every EVCS $e_j \in \mathcal{E}$ also, (1) sati

Figures (13)

  • Figure 1: Conceptual image for our system model (highway scenario). Each fast-EVCS adjusts its charging price based on public (green box) and/or privacy-preserved (red box) information. Each selfish EV user selects a fast-EVCS for charging based on the public and/or personalized (yellow box) information.
  • Figure 2: Battery SOC versus charging time for a Li-ion battery. Typically, the last phase of the charging curve shows that the available battery SOC depends nonlinearly on the charging time (Wang2016).
  • Figure 3: The reinforcement learning framework of PeDP: a fast-EVCS offers personalized electricity prices (action) in the environment, which are interpreted into electricity bills (reward) and the public/privacy-information (state), which are fed back into the fast-EVCS.
  • Figure 4: Flowchart for applying the Q-learning algorithm to the personalized dynamic pricing problem (training stage)
  • Figure 5: System model for case 1 and case 2. Two fast-EVCSs are located at the same place [$e_1 = 250$km, $e_2 = 250$km]
  • ...and 8 more figures
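
The loop described in the captions of Figures 3 and 4 (price as action, electricity bill as reward, observed information as state) can be illustrated with a minimal tabular Q-learning sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the three price levels, the queue-length state, the toy acceptance probability in `step`, the 20 kWh charge, and the hyperparameters.

```python
import random

# Hypothetical discretisation, not from the paper.
PRICES = [0.30, 0.40, 0.50]   # candidate prices per kWh (actions)
STATES = [0, 1, 2]            # coarse queue length at the station (state)
ENERGY_KWH = 20.0             # assumed energy sold per accepted charge


def step(state, action, rng):
    """Toy environment: an EV accepts less often when price or queue is high."""
    price = PRICES[action]
    accept_prob = max(0.0, 1.0 - 1.5 * price - 0.1 * state)
    accepted = rng.random() < accept_prob
    reward = price * ENERGY_KWH if accepted else 0.0  # electricity bill
    # The queue grows when an EV joins and shrinks otherwise.
    next_state = min(state + 1, 2) if accepted else max(state - 1, 0)
    return next_state, reward


def train(episodes=2000, horizon=20, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Epsilon-greedy tabular Q-learning over (state, price index) pairs."""
    rng = random.Random(seed)
    actions = range(len(PRICES))
    q = {(s, a): 0.0 for s in STATES for a in actions}
    for _ in range(episodes):
        state = 0
        for _ in range(horizon):
            if rng.random() < epsilon:
                action = rng.randrange(len(PRICES))          # explore
            else:
                action = max(actions, key=lambda a: q[(state, a)])  # exploit
            next_state, reward = step(state, action, rng)
            best_next = max(q[(next_state, a)] for a in actions)
            # Standard Q-learning temporal-difference update.
            q[(state, action)] += alpha * (
                reward + gamma * best_next - q[(state, action)]
            )
            state = next_state
    return q


def greedy_policy(q):
    """Map each state to the price with the highest learned Q-value."""
    actions = range(len(PRICES))
    return {s: PRICES[max(actions, key=lambda a: q[(s, a)])] for s in STATES}
```

Calling `greedy_policy(train())` returns one price per queue-length state. In the paper's setting the state would also carry the public and privacy-preserved information, and the environment would be the multi-station simulation rather than this toy acceptance model.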

Theorems & Definitions (6)

  • Definition 1: Deviated partition
  • Definition 2: Nash stable partition
  • Definition 3: SPAO condition
  • Lemma 1
  • Theorem 1: Existence of Algorithm \ref{algorithm:GRAPE}
  • Theorem 2: Convergence of Algorithm \ref{algorithm:GRAPE}
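
As a rough illustration of what a Nash stable partition means in a station selection game, the sketch below checks whether any single EV could lower its own cost by unilaterally switching stations. It assumes the usual game-theoretic reading of Nash stability, since the page does not reproduce Definition 2. The cost function (price times energy plus a per-EV waiting penalty) and all names are hypothetical stand-ins for the paper's monetary utility function.

```python
def cost(price, evs_ahead, wait_cost_per_ev=1.0, energy_kwh=20.0):
    """Hypothetical monetary cost: electricity bill plus a waiting penalty."""
    return price * energy_kwh + wait_cost_per_ev * evs_ahead


def is_nash_stable(assignment, prices):
    """Return True if no EV can lower its cost by switching stations alone.

    assignment: dict mapping EV id -> station id (the partition)
    prices:     dict mapping station id -> price per kWh
    """
    counts = {station: 0 for station in prices}
    for station in assignment.values():
        counts[station] += 1

    for ev, current in assignment.items():
        # At its current station the EV shares with everyone else there.
        current_cost = cost(prices[current], counts[current] - 1)
        for other in prices:
            if other == current:
                continue
            # After switching, the EV would share with all EVs already there.
            if cost(prices[other], counts[other]) < current_cost:
                return False
    return True


prices = {"e1": 0.40, "e2": 0.40}
balanced = {"b1": "e1", "b2": "e1", "b3": "e2", "b4": "e2"}
crowded = {"b1": "e1", "b2": "e1", "b3": "e1", "b4": "e1"}
```

With equal prices, `balanced` is stable because switching only lengthens the queue an EV faces, while `crowded` is not, because any EV gains by moving to the empty station.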