Physical Reinforcement Learning
Sam Dillavou, Shruti Mishra
TL;DR
The paper tackles energy-efficient, fault-tolerant reinforcement learning using Contrastive Local Learning Networks (CLLNs), which are analog networks of self-adjusting resistors. It adapts Q-learning to run on a simulated CLLN. Environmental states are encoded as input voltages, and output voltages are read as action values (Q-values). Each conductance $G_i$ is updated by the local contrastive rule $\delta G_i = \alpha [ (\Delta V_i^F)^2 - (\Delta V_i^C)^2 ]$, where $\Delta V_i^F$ and $\Delta V_i^C$ are the voltage drops across edge $i$ in the free and clamped states. This rule performs approximate gradient descent on the contrastive power difference $\mathcal{P}^C-\mathcal{P}^F$. The agent reaches near-optimal performance in most trials on two tasks: a four-state, four-action MDP and a nine-state grid-navigation task. The work discusses which RL components map naturally onto CLLNs (e.g., policy and value functions) and which require additional hardware (e.g., a replay buffer or other memory). It also outlines how physical constraints shape learning in analog substrates. Finally, it highlights implications for energy efficiency and robustness to damage, and opportunities to tailor learning toward hardware-friendly objectives.
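The sketch below makes the update rule concrete. It assumes the simplest possible network: a single output node connected by *linear* resistors to input nodes held at fixed voltages. The function names (`free_output`, `power`, `contrastive_step`, `contrastive_power_gradient`), the nudge factor `eta`, the conductance floor `g_min`, and all numbers are illustrative assumptions, not taken from the paper. Real CLLNs use nonlinear elements and larger networks. The sketch applies the rule repeatedly to pull the free output toward a target. It then checks by finite differences that, for this linear toy, the rule equals gradient descent on $\mathcal{P}^C-\mathcal{P}^F$, with the clamped voltage held fixed.

```python
# Toy model (assumption): one output node wired by linear resistors G[i]
# to input nodes held at fixed voltages V_in[i].

def free_output(G, V_in):
    # Kirchhoff's current law at the output node:
    # sum_i G[i] * (V_in[i] - V_out) = 0  =>  V_out is the G-weighted mean of V_in
    return sum(g * v for g, v in zip(G, V_in)) / sum(G)


def power(G, V_in, V_out):
    # Dissipated power P = sum_i G[i] * (Delta V_i)^2 over every edge
    return sum(g * (V_out - v) ** 2 for g, v in zip(G, V_in))


def contrastive_step(G, V_in, target, alpha=0.5, eta=0.5, g_min=1e-3):
    """One learning step: free state, nudged (clamped) state, then the local rule."""
    v_free = free_output(G, V_in)
    v_clamp = v_free + eta * (target - v_free)  # output nudged toward the target
    new_G = []
    for g, v in zip(G, V_in):
        dV_free, dV_clamp = v_free - v, v_clamp - v
        # delta G_i = alpha * [(dV_i^F)^2 - (dV_i^C)^2], kept above a floor (assumption)
        new_G.append(max(g_min, g + alpha * (dV_free ** 2 - dV_clamp ** 2)))
    return new_G


def contrastive_power_gradient(G, V_in, v_clamp, i, h=1e-6):
    """Central finite difference of (P^C - P^F) w.r.t. G[i], clamped voltage held fixed."""
    def objective(G_):
        # P^F uses the re-equilibrated free output for the perturbed conductances
        return power(G_, V_in, v_clamp) - power(G_, V_in, free_output(G_, V_in))
    G_plus, G_minus = list(G), list(G)
    G_plus[i] += h
    G_minus[i] -= h
    return (objective(G_plus) - objective(G_minus)) / (2 * h)


V_in = [0.0, 1.0, 0.3]  # illustrative input voltages
target = 0.8            # illustrative target output voltage
G = [1.0, 1.0, 1.0]     # illustrative initial conductances

v_start = free_output(G, V_in)
for _ in range(500):
    G = contrastive_step(G, V_in, target)
v_end = free_output(G, V_in)
print(f"free output {v_start:.3f} -> {v_end:.4f}, target {target}")
```

The finite-difference check works for a specific reason. The free state minimizes dissipated power, so the derivative of $\mathcal{P}^F$ with respect to $G_i$ reduces to $(\Delta V_i^F)^2$: the node voltages' own contribution vanishes at equilibrium. This is why each edge needs only its own measured voltage drop to compute its update.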
Abstract
Digital computers are power-hungry and largely intolerant of damaged components. This makes them potentially poor tools for energy-limited autonomous agents in uncertain environments. Recently developed Contrastive Local Learning Networks (CLLNs), analog networks of self-adjusting nonlinear resistors, are inherently low-power and robust to physical damage, but were designed to perform supervised learning. In this work, we demonstrate success on two simple RL problems using Q-learning adapted for simulated CLLNs. Doing so makes explicit the components (beyond the network being trained) required to implement various tools in the RL toolbox. Some of these tools (policy and value functions) are more natural in this system than others (a replay buffer). We discuss assumptions such as physical safety, which digital hardware requires, CLLNs can forgo, and biological systems cannot rely on. We also highlight secondary goals that are important in biology and trainable in CLLNs but make little sense for digital computers.
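The following sketch shows one way a single Q-learning step could map onto such a network, and which pieces sit outside it. Everything here is a hypothetical illustration, not the authors' implementation. The assumed pieces are:

- the class `ToyCLLNQ`
- the one-hot voltage encoding with an extra grounded node
- one output node per action, wired by linear resistors to shared driven inputs
- the values of `alpha`, `eta`, and `gamma`

In this toy, the network supplies Q-values, and hence an ε-greedy policy, directly as output voltages. Computing the TD target, however, requires storing the reward and the outputs measured at the next state. That storage is memory that lives outside the network.

```python
import random


class ToyCLLNQ:
    """Hypothetical Q-function: one output node per action, each wired by tunable
    linear resistors to shared input nodes held at fixed voltages. Because the
    inputs are driven, the output nodes do not load each other."""

    def __init__(self, n_states, n_actions, alpha=0.5, eta=0.5, gamma=0.5, g_min=1e-3):
        self.n_states = n_states
        self.alpha, self.eta, self.gamma, self.g_min = alpha, eta, gamma, g_min
        n_inputs = n_states + 1  # one-hot state nodes plus a grounded 0 V node
        self.G = [[1.0] * n_inputs for _ in range(n_actions)]

    def encode(self, s):
        # Hypothetical encoding: active state's node at 1 V, all other nodes at 0 V
        return [1.0 if j == s else 0.0 for j in range(self.n_states)] + [0.0]

    def q_values(self, s):
        # Free state: each output settles at its conductance-weighted mean input voltage
        V_in = self.encode(s)
        return [sum(g * v for g, v in zip(G_a, V_in)) / sum(G_a) for G_a in self.G]

    def select_action(self, s, epsilon):
        # Epsilon-greedy policy read directly from the output voltages
        if random.random() < epsilon:
            return random.randrange(len(self.G))
        q = self.q_values(s)
        return max(range(len(q)), key=q.__getitem__)

    def update(self, s, a, r, s_next, done):
        # External bookkeeping (assumption): the TD target needs r and the outputs
        # measured at s_next, stored before the inputs are switched back to s.
        target = r if done else r + self.gamma * max(self.q_values(s_next))
        V_in = self.encode(s)
        v_free = self.q_values(s)[a]
        v_clamp = v_free + self.eta * (target - v_free)  # nudge only output a
        # Edges to other outputs see identical voltages in both phases, so their
        # update is zero and is skipped here.
        G_a = self.G[a]
        for j, v in enumerate(V_in):
            dV_free, dV_clamp = v_free - v, v_clamp - v
            G_a[j] = max(self.g_min, G_a[j] + self.alpha * (dV_free ** 2 - dV_clamp ** 2))
        return target


q = ToyCLLNQ(n_states=2, n_actions=2)
before = q.q_values(0)
target = q.update(s=0, a=0, r=0.5, s_next=1, done=False)
after = q.q_values(0)
print(before, after, target)
```

Two properties of this toy are worth noting. First, each Q-value is a conductance-weighted average of input voltages, so Q-values are confined to the input voltage range and rewards must be scaled to fit. Second, all edges from one output node share a common denominator, so updating $Q(0,0)$ also shifts $Q(1,0)$. This is a simple form of interference between states.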
