Robust Iterative Value Conversion: Deep Reinforcement Learning for Neurochip-driven Edge Robots
Yuki Kadokawa, Tomohito Kodera, Yoshihisa Tsurumine, Shinya Nishimura, Takamitsu Matsubara
TL;DR
The paper addresses the challenge of training DRL policies that run as SNNs on neurochips in edge robots, where converting the policy from an FPNN to an SNN at every update causes disruptive errors. It introduces Robust Iterative Value Conversion (RIVC), which combines quantization-aware learning (matching the SNN bit-width $k$, typically 4 or 8) with a gap-increasing operator that uses parameters $\boldsymbol{\beta}$ and $\boldsymbol{\nu}$ to widen the action gap and resist conversion drift. Key contributions include a novel DRL framework for neurochip-based SNN policies, a conversion-robust policy-update mechanism, and empirical validation on real-robot tasks showing substantial energy and speed benefits (SNN policies on neurochips are ~15× more power-efficient and ~5× faster than on edge CPUs), while prior methods fail to train under conversion errors. These findings highlight the practical potential of SNN policies on neurochips for energy-constrained robotic applications, using frame-based vision and neurochips like Akida.
Abstract
A neurochip is a device that reproduces the signal processing mechanisms of brain neurons and computes Spiking Neural Networks (SNNs) at high speed with low power consumption. Neurochips are therefore attracting attention for edge robot applications, which suffer from limited battery capacity. This paper aims to achieve deep reinforcement learning (DRL) that acquires SNN policies suitable for neurochip implementation. Since DRL requires complex function approximation, we focus on conversion from a Floating Point NN (FPNN), which is one of the most feasible techniques for obtaining SNNs. However, a DRL learning cycle alternates between updating the FPNN policy and collecting learning samples with the SNN policy, so the FPNN must be converted to an SNN at every policy update. The accumulated conversion errors can significantly degrade the performance of the SNN policies. We propose Robust Iterative Value Conversion (RIVC), a DRL framework that both reduces conversion errors and is robust to those that remain. To reduce conversion errors, the FPNN is optimized with the same number of quantization bits as the SNN, so that its output is not significantly changed by quantization. To make learning robust to conversion errors, the quantized FPNN policy is updated to increase the gap between the probability of selecting the optimal action and the probabilities of the other actions, which prevents unexpected replacements of the policy's optimal actions. We verified RIVC's effectiveness on a neurochip-driven robot. The results showed that RIVC consumed 1/15 of the power and computed five times faster than an edge CPU (quad-core ARM Cortex-A72). The previous framework, which has no countermeasures against conversion errors, failed to train the policies. Videos from our experiments are available: https://youtu.be/Q5Z0-BvK1Tc.
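The interaction between the two ingredients described in the abstract can be illustrated with a small numerical sketch. It is not the authors' implementation: it treats conversion error as k-bit rounding applied directly to a vector of action values (a simplification of what happens inside a converted network), and it uses an advantage-learning-style one-step gap increase as a stand-in for RIVC's operator. The function names `quantize` and `widen_gap`, the coefficient `alpha`, the range `x_max`, and the example values are all assumptions made for illustration.

```python
import numpy as np

def quantize(x, k, x_max):
    """Uniform symmetric k-bit rounding (hypothetical helper).

    Values are snapped to integer multiples of x_max / (2**(k-1) - 1)
    and clipped to [-x_max, x_max].
    """
    levels = 2 ** (k - 1) - 1
    scale = x_max / levels
    return np.clip(np.round(x / scale), -levels, levels) * scale

def widen_gap(q, alpha):
    """Toy gap increase (assumed form, not the paper's operator).

    Each action value is lowered in proportion to its distance from the
    best action, so the best action is unchanged and the others move away.
    """
    return q - alpha * (q.max() - q)

# Illustrative action values: the two best actions are only 0.08 apart.
q = np.array([0.60, 0.52, 0.10])

# Without gap widening, 4-bit rounding puts both top actions on the same
# level, so the greedy action is no longer uniquely determined.
plain = quantize(q, k=4, x_max=1.0)

# With gap widening, the top actions land on different levels and the
# greedy action survives the rounding.
robust = quantize(widen_gap(q, alpha=0.9), k=4, x_max=1.0)

print("quantized, no gap widening:  ", plain)
print("quantized, with gap widening:", robust)
print("greedy action preserved:", int(np.argmax(robust)) == int(np.argmax(q)))
```

In this toy setting the step size at 4 bits is 1/7, so an action gap much smaller than that can vanish after rounding; widening the gap before rounding is what keeps the optimal action from being replaced.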
