
Spiking Neural Networks for Continuous Control via End-to-End Model-Based Learning

Justus Huebotter, Pablo Lanillos, Marcel van Gerven, Serge Thill

TL;DR

This work demonstrates that fully spiking neural networks can be trained end-to-end for high-dimensional continuous control by integrating a predictive forward model with a goal-directed policy in a predictive-control framework. Using surrogate gradients and a carefully engineered architecture (including learnable time constants, adaptive thresholds, and latent-space compression), the Pred-Control SNN achieves control performance comparable to large non-spiking baselines while using markedly fewer parameters. Key findings show that prediction accuracy is not the sole determinant of control quality; an expressive yet trainable forward model that yields stable gradients suffices for effective planning and execution. The results establish SNNs as viable, scalable substrates for energy-efficient, high-DOF robotic control and provide design principles for robust end-to-end spiking controllers applicable to neuromorphic hardware.
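
The TL;DR credits surrogate gradients and adaptive thresholds for stable training. Below is a minimal sketch of both ideas. It assumes a discrete-time LIF neuron with reset by subtraction, a fast-sigmoid surrogate derivative (the SuperSpike-style choice), and an ALIF-style threshold that rises with each spike and then relaxes. This is not necessarily the paper's exact formulation, and all function and parameter names are hypothetical.

```python
def heaviside(v: float, threshold: float) -> float:
    """Forward pass: emit a spike (1.0) once the membrane reaches threshold."""
    return 1.0 if v >= threshold else 0.0


def fast_sigmoid_grad(v: float, threshold: float, slope: float = 10.0) -> float:
    """Surrogate for dS/dv: derivative of x / (1 + |x|), x = slope * (v - threshold).

    Stands in for the Heaviside derivative, which is zero almost everywhere.
    """
    x = slope * (v - threshold)
    return slope / (1.0 + abs(x)) ** 2


def run_adaptive_neuron(inputs, mem_decay=0.9, adapt_decay=0.95,
                        adapt_gain=0.5, base_threshold=1.0):
    """Simulate one LIF neuron whose threshold rises after each spike (hypothetical ALIF variant).

    Returns per-step spikes, thresholds, and surrogate gradients dS/dv.
    """
    v, a = 0.0, 0.0
    spikes, thresholds, grads = [], [], []
    for x in inputs:
        threshold = base_threshold + adapt_gain * a  # adaptive threshold
        v = mem_decay * v + x                        # leaky integration
        s = heaviside(v, threshold)
        grads.append(fast_sigmoid_grad(v, threshold))
        v -= s * threshold                           # reset by subtraction
        a = adapt_decay * a + s                      # adaptation jumps on a spike, then decays
        spikes.append(s)
        thresholds.append(threshold)
    return spikes, thresholds, grads


if __name__ == "__main__":
    adaptive, _, _ = run_adaptive_neuron([0.3] * 50)
    fixed, _, _ = run_adaptive_neuron([0.3] * 50, adapt_gain=0.0)
    print("spikes with adaptation:", sum(adaptive), "without:", sum(fixed))
```

With a constant input of 0.3, the adaptive neuron fires less often than the same neuron with `adapt_gain=0.0`, because each spike temporarily raises its threshold. In a trained network, the surrogate values would replace the step function's derivative during backpropagation while the forward pass keeps binary spikes.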

Abstract

Despite recent progress in training spiking neural networks (SNNs) for classification, their application to continuous motor control remains limited. Here, we demonstrate that fully spiking architectures can be trained end-to-end to control robotic arms with multiple degrees of freedom in continuous environments. Our predictive-control framework combines Leaky Integrate-and-Fire dynamics with surrogate gradients, jointly optimizing a forward model for dynamics prediction and a policy network for goal-directed action. We evaluate this approach on both a planar 2D reaching task and a simulated 6-DOF Franka Emika Panda robot with torque control. In direct comparison to non-spiking recurrent baselines trained under the same predictive-control pipeline, the proposed SNN achieves comparable task performance while using substantially fewer parameters. An extensive ablation study highlights the role of initialization, learnable time constants, adaptive thresholds, and latent-space compression as key contributors to stable training and effective control. Together, these findings establish spiking neural networks as a viable and scalable substrate for high-dimensional continuous control, while emphasizing the importance of principled architectural and training design.
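
The abstract describes a forward model and a policy that are optimized together, with the policy trained through the forward model's predictions. The toy sketch below is deliberately non-spiking and one-dimensional. It illustrates only that gradient path, under several assumptions:

- `true_env` is a hypothetical plant.
- A linear forward model predicts the state change `delta_s_hat`.
- A proportional gain `u = k * (target - s)` stands in for the policy network.
- The two phases run sequentially for simplicity.

The derivative of the rollout loss with respect to `k` is computed exactly by forward-mode differentiation through the autoregressive model rollout.

```python
import random


def true_env(s, u):
    """Hypothetical 1-D plant, unknown to the learner: s' = s + 0.5*u - 0.1*s."""
    return s + 0.5 * u - 0.1 * s


def fit_forward_model(transitions, lr=0.05, epochs=200):
    """Fit delta_s_hat = w_s * s + w_u * u by SGD on squared prediction error."""
    w_s, w_u = 0.0, 0.0
    for _ in range(epochs):
        for s, u, s_next in transitions:
            err = (w_s * s + w_u * u) - (s_next - s)
            w_s -= lr * err * s
            w_u -= lr * err * u
    return w_s, w_u


def rollout_loss_and_grad(k, s0, target, w_s, w_u, horizon=20):
    """Roll the learned model out autoregressively under u = k * (target - s).

    Returns the summed squared distance to target and its exact derivative
    with respect to k, accumulated by forward-mode differentiation.
    """
    s, ds_dk = s0, 0.0
    loss, dloss_dk = 0.0, 0.0
    for _ in range(horizon):
        u = k * (target - s)
        du_dk = (target - s) - k * ds_dk
        s = s + w_s * s + w_u * u                  # model prediction, not the real plant
        ds_dk = ds_dk + w_s * ds_dk + w_u * du_dk  # chain rule through the same update
        loss += (s - target) ** 2
        dloss_dk += 2.0 * (s - target) * ds_dk
    return loss, dloss_dk


def train(seed=0, policy_steps=300, policy_lr=0.05, clip=10.0):
    rng = random.Random(seed)
    # Phase 1: collect random-action transitions and fit the forward model.
    transitions = []
    for _ in range(200):
        s, u = rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)
        transitions.append((s, u, true_env(s, u)))
    w_s, w_u = fit_forward_model(transitions)
    # Phase 2: improve the policy gain by descending gradients of model rollouts.
    k = 0.0
    for _ in range(policy_steps):
        _, grad = rollout_loss_and_grad(k, 0.0, 1.0, w_s, w_u)
        k -= policy_lr * max(-clip, min(clip, grad))  # clipped gradient step
    return w_s, w_u, k


if __name__ == "__main__":
    print(train())
```

The policy gradient flows through the learned model rather than the real plant, so the quality of each policy update depends on how well the model captures the dynamics. Gradient clipping is an added assumption that keeps the first, steep updates from pushing the closed loop into instability.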

Paper Structure

This paper contains 54 sections, 27 equations, 29 figures, 5 tables, 5 algorithms.

Figures (29)

  • Figure 1: Pred-Control SNN architecture. The system consists of two spiking neural networks composed of LIF neurons with learnable parameters $\bm{\theta}$: a prediction network $\bm{\upsilon}$ (forward model), which receives the current robot state $\bm{s}_t$ and control signal $\bm{u}_t$ to predict the state change $\Delta \hat{\bm{s}}_t$, and a policy network $\bm{\pi}$ (inverse model), which takes in the current state $\bm{s}_t$ and target state $\bm{s}^*_t$ to compute a control output $\bm{u}_t$. During active control, only the policy network is used; during training, the prediction network is rolled out autoregressively to provide differentiable state estimates for optimizing the policy. Each network ends in a continuous readout layer decoding membrane voltages into output vectors. A schematic of the LIF neuron model is shown on the right.
  • Figure 2: Dynamics of the LIF neuron model and the influence of temporal parameters. Left: The membrane voltage response $U(t)$ to a single input spike varies in amplitude and duration depending on the membrane and synaptic time constants $\tau_{\text{mem}}$ and $\tau_{\text{syn}}$. Center: Full LIF temporal dynamics under three regimes of injected current $I_{\text{inj}}(t)$: constant input, silence, and high-frequency noise. Traces show filtered current $I(t)$, membrane potential $U(t)$, and spike activity $S(t)$. Right: Firing-rate response curves as a function of constant injected current amplitude. The orange (larger time constants) and light blue (smaller time constants) curves have a similar overall shape but differ in key respects. The fast light-blue neuron requires stronger input to spike but responds more rapidly once active, whereas the slower orange neuron integrates more gradually and produces longer activity traces. This tradeoff between response speed, firing magnitude, and trace duration affects not only neuronal responsiveness but also the effective temporal horizon over which surrogate gradients can propagate. Balancing these dynamics is a central challenge in designing and training spiking neural networks.
  • Figure 3: Main results on the 3D reaching task. Learning curves comparing a non-spiking recurrent baseline (small and large variants), a basic spiking controller, and the full Pred-Control SNN (large). Top row: task-level performance metrics (cumulative distance to target, success rate, time on target, and Prediction MSE). The Pred-Control SNN achieves task performance on par with the large non-spiking baseline despite using more than an order of magnitude fewer parameters, while clearly outperforming the basic SNN and the small non-spiking baseline. The non-spiking baselines converge slightly faster and achieve lower long-horizon prediction error, but this advantage does not translate into superior control performance. The Pred-Control SNN learns more gradually but exhibits smoother and more stable optimization dynamics. Curves show mean $\pm$ s.e.m. over 10 random seeds.
  • Figure 4: Pred-Control SNN behavior during task execution. Shown are 3D trajectories and Euclidean error traces (I) over the course of training, along with spiking activity (II) and voltage traces (III) from the prediction and policy networks. Top-row plots span 200 environment time steps per episode. The middle and bottom rows are plotted over internal spiking model time, where each environment step corresponds to 7 spiking substeps, giving 1400 model time steps per episode. This distinction reflects the multi-timescale structure of the controller and holds across all experiments. As training proceeds, the model exhibits progressively smoother control and more consistent activity. Spiking activity in the policy network becomes denser and more regular, while the prediction network shows the opposite trend. Mild oscillatory behavior near the target is visible in later training stages.
  • Figure 5: Effect of learning rates $\alpha_{\bm{\pi}}$ and $\alpha_{\bm{\upsilon}}$ on training in the 2D control task. We vary $\alpha_{\bm{\upsilon}}$ across columns and $\alpha_{\bm{\pi}}$ across line colors. Performance differences are clearest in the time spent on target (row 3), which peaks when both networks use $\alpha = 10^{-3}$. Training remains stable across all settings, with no signs of vanishing or exploding gradients, yet performance varies markedly with the choice of learning rate. The policy is sensitive to prediction quality, but the converse does not hold: policy failure does not impair model learning. Across the performance metrics, the best setting is $\alpha_{\bm{\pi}} = \alpha_{\bm{\upsilon}} = 10^{-3}$.
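
Figures 2 and 4 describe current-based LIF dynamics and a controller in which each environment step spans 7 spiking substeps. The dynamics comprise a filtered current $I(t)$, a membrane potential $U(t)$, and spikes $S(t)$. The sketch below assumes a discrete-time update with decay factors exp(-dt/tau), a membrane driven by the normalized filtered current, and a hard reset to zero. The paper's exact equations may differ, and all names are hypothetical.

```python
import math


def decay(tau: float, dt: float = 1.0) -> float:
    """Per-step decay factor exp(-dt / tau) for a time constant tau (in model steps)."""
    return math.exp(-dt / tau)


def simulate_lif(env_inputs, tau_syn, tau_mem, substeps=7, threshold=1.0):
    """Current-based LIF neuron; each environment-level input is held for
    `substeps` model steps. Returns the spike train at model resolution."""
    syn_decay, mem_decay = decay(tau_syn), decay(tau_mem)
    i_syn, u = 0.0, 0.0
    spikes = []
    for x in env_inputs:
        for _ in range(substeps):
            i_syn = syn_decay * i_syn + x                # low-pass filtered current I(t)
            u = mem_decay * u + (1.0 - mem_decay) * i_syn  # leaky membrane potential U(t)
            s = 1.0 if u >= threshold else 0.0           # spike S(t)
            u *= 1.0 - s                                 # hard reset to zero after a spike
            spikes.append(s)
    return spikes


def firing_rate(current, tau_syn, tau_mem, env_steps=200):
    """Spikes per model step under a constant current (one point of a rate curve)."""
    spikes = simulate_lif([current] * env_steps, tau_syn, tau_mem)
    return sum(spikes) / len(spikes)


if __name__ == "__main__":
    for current in (0.1, 0.2, 0.5, 1.0):
        fast = firing_rate(current, tau_syn=2.0, tau_mem=2.0)
        slow = firing_rate(current, tau_syn=10.0, tau_mem=10.0)
        print(f"I={current:.1f}  fast={fast:.3f}  slow={slow:.3f}")
```

In this normalized form, the filtered current settles near `current / (1 - exp(-1/tau_syn))`. A neuron with a small `tau_syn` therefore needs a stronger input before it can reach threshold, which matches the qualitative trend described for Figure 2 (right). With 200 environment inputs and 7 substeps, `simulate_lif` returns 1400 model steps, the same bookkeeping as in Figure 4. The time constants are fixed here; making `tau_syn` and `tau_mem` trainable would correspond to learnable time constants.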