Learning to Control DC Motor for Micromobility in Real Time with Reinforcement Learning

Bibek Poudel, Thomas Watson, Weizi Li

TL;DR

This work learns to steer a DC motor via sample-efficient reinforcement learning and builds a simulator to experiment with a wide range of parameters and learning strategies.

Abstract

Autonomous micromobility has attracted growing attention from researchers and practitioners in recent years. A key component of many micro-transport vehicles is the DC motor, a complex dynamical system that is continuous and non-linear. Learning to quickly control the DC motor in the presence of disturbances and uncertainties is desirable for applications that require robustness and stability. Techniques for this task usually rely on a mathematical system model, which is often insufficient to anticipate the effects of time-varying and interrelated sources of non-linearity. While some model-free approaches have succeeded at the task, they rely on massive interactions with the system and are trained on specialized hardware in order to fit a highly parameterized controller. In this work, we learn to steer a DC motor via sample-efficient reinforcement learning. Using data collected from hardware interactions in the real world, we additionally build a simulator to experiment with a wide range of parameters and learning strategies. With the best parameters found, we learn an effective control policy in 1 minute and 53 seconds in simulation and in 10 minutes and 35 seconds on a physical system.

Paper Structure

This paper contains 11 sections, 5 equations, 3 figures, 1 algorithm.

Figures (3)

  • Figure 1: Schematic diagram of our control task. Steering commands are input to a PC, converted to voltage signals (interfaced through an Arduino), and applied to the motor. The feedback signals to the PC are obtained from the rotary encoder.
  • Figure 2: LEFT: Position trajectories from the same $100$ arbitrary initializations in hardware and in simulation, where $\pm0.05$ is the success range. In hardware, $21$ trajectories initialized between $0.25$ and $0.5$ fail due to high velocity, while trajectories initialized within $\pm0.25$ have a higher chance of success. In simulation, due to approximation errors, $21$ trajectories "stall" in their initialized states, i.e., they neither fail nor succeed before the end of $250$ timesteps. RIGHT: The results of our control task in simulation and on hardware (the latter partially using the best parameters and settings determined in simulation, e.g., size of hint-to-goal $=2\%$, network reset frequency $=50$). In simulation, the results are averaged over five random neural network weight initializations; the training loss per episode on average decreases after the first reset at $50$ episodes and reaches the vicinity of $0.2$ at episode $300$, indicating a high probability of successful trajectory completion, while on hardware the loss per episode converges to $0.4$. In both cases, the training loss per episode is shown as a moving average with a window size of $30$. On average, training in simulation for $300$ episodes takes $1$ minute and $53$ seconds on a MacBook Air with an M1 processor and $8$GB RAM. In contrast, training on hardware with real-world interactions for $150$ episodes completes in $10$ minutes and $35$ seconds on a MacBook Pro with an Intel Core i7 processor and $16$GB RAM.
  • Figure 3: Experiments on various parameterizations and learning strategies in simulation. Each result is averaged over five different random neural network weight initializations during training and $10$ episodes of testing. In each initialization, the NFQ algorithm is trained for $290$ episodes and tested for $10$ episodes. All experiments (training and testing) are performed on a MacBook Air with an M1 processor and $8$GB RAM. a) Average loss per episode decreases (increasing the probability of success) as the number of parameters increases from $39$ to $91$; further increasing the number of parameters incurs higher losses. b) Setting the size of artificially induced transitions in the training set to $2\%$ yields the lowest loss. c) Among various exploration strategies, linearly decaying the probability of exploration incurs the least loss. d) Resetting the weights of the neural network every $50$ episodes converges to a lower training loss. e) Sampling the steering wheel position initializations closer to the goal position $(0)$ from a Gaussian distribution $\mathcal{N}(0, 0.02)$ leads to the lowest loss.
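
Several of the settings named in the captions above (linearly decaying exploration, network resets every $50$ episodes, Gaussian position initialization, and the moving average with window $30$) can be illustrated with a minimal Python sketch. This is not the authors' implementation: all function names are hypothetical, the exploration endpoints of $1.0$ and $0.0$ are assumed, $0.02$ in $\mathcal{N}(0, 0.02)$ is treated as a standard deviation (the caption does not say whether it is a variance), and the moving average is assumed to be trailing, with shorter windows at the start.

```python
import random


def linear_epsilon(episode, total_episodes, eps_start=1.0, eps_end=0.0):
    """Linearly decay the exploration probability over training.

    eps_start and eps_end are assumed values, not taken from the paper.
    """
    if total_episodes <= 1:
        return eps_end
    frac = min(episode, total_episodes - 1) / (total_episodes - 1)
    return eps_start + frac * (eps_end - eps_start)


def should_reset(episode, reset_every=50):
    """Return True at episodes 50, 100, ... (reset frequency from the caption)."""
    return episode > 0 and episode % reset_every == 0


def sample_initial_position(rng, sigma=0.02):
    """Draw an initial steering position near the goal position 0.

    Assumption: sigma is a standard deviation.
    """
    return rng.gauss(0.0, sigma)


def moving_average(values, window=30):
    """Trailing moving average; early entries average the values seen so far."""
    smoothed = []
    running = 0.0
    for i, value in enumerate(values):
        running += value
        if i >= window:
            running -= values[i - window]  # drop the value leaving the window
        smoothed.append(running / min(i + 1, window))
    return smoothed


if __name__ == "__main__":
    rng = random.Random(0)
    for episode in (0, 50, 150, 299):
        print(
            episode,
            round(linear_epsilon(episode, 300), 3),
            should_reset(episode),
            round(sample_initial_position(rng), 4),
        )
```

In a training loop, `should_reset` would gate a re-initialization of the network weights, and `moving_average` would be applied to the recorded per-episode losses before plotting.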