Neural Circuit Architectural Priors for Quadruped Locomotion

Nikhil X. Bhattasali, Venkatesh Pattabiraman, Lerrel Pinto, Grace W. Lindsay

TL;DR

This work explores the advantages of a biologically inspired ANN architecture for quadruped locomotion, based on neural circuits in the limbs and spinal cord of mammals. The architecture achieves good initial performance and final performance comparable to MLPs, while using less data and orders of magnitude fewer parameters.

Abstract

Learning-based approaches to quadruped locomotion commonly adopt generic policy architectures like fully connected MLPs. As such architectures contain few inductive biases, it is common in practice to incorporate priors in the form of rewards, training curricula, imitation data, or trajectory generators. In nature, animals are born with priors in the form of their nervous system's architecture, which has been shaped by evolution to confer innate ability and efficient learning. For instance, a horse can walk within hours of birth and can quickly improve with practice. Such architectural priors can also be useful in ANN architectures for AI. In this work, we explore the advantages of a biologically inspired ANN architecture for quadruped locomotion based on neural circuits in the limbs and spinal cord of mammals. Our architecture achieves good initial performance and comparable final performance to MLPs, while using less data and orders of magnitude fewer parameters. Our architecture also exhibits better generalization to task variations, even admitting deployment on a physical robot without standard sim-to-real methods. This work shows that neural circuits can provide valuable architectural priors for locomotion and encourages future work in other sensorimotor skills.

Paper Structure

This paper contains 56 sections, 10 equations, 17 figures, 2 tables.

Figures (17)

  • Figure 1: Biological Locomotion. A, Quadruped mammals share a homologous organization of their musculoskeletal systems and neural circuits. Surprisingly, neural circuits in the limbs and spinal cord are sufficient to produce locomotion, while higher brain regions are important for initiating and regulating locomotion. B, Neural circuits for rhythm generation (RG) and brainstem command (BC), adapted from Danner2017ComputationalModelingSpinal. Each limb is controlled by flexor (F) and extensor (E) half-centers. Between limbs, half-centers communicate through connections that promote synchronization or alternation. Brainstem command signals modulate the half-centers and connection activations. C, Neural circuits for pattern formation (PF) and afferent feedback (AF), adapted from Kim2022ContributionAfferentFeedback. Within limbs, interneurons and motor neurons convert half-center states into specific muscle commands, while sensory feedback modulates the circuit at multiple levels.
  • Figure 2: Architecture Units. The architecture uses 2 neuron types, both of which output rate-coded activity and can receive excitatory or inhibitory synaptic input. For each type, we show the circuit schematic symbol (left), key hyperparameters (middle), and an example waveform response to input (right). A, The Basic unit is a typical neuron that activates proportionally in response to input once its internal voltage exceeds a threshold. This unit is used for most neurons in the architecture. B, The Oscillator unit is a special neuron that exhibits intrinsically bursting activity in the absence of inputs. It scales its intrinsic period in response to constant input, and it entrains its phase in response to periodic input. This unit is used for the flexor half-centers in the RG module.
  • Figure 3: Architecture Structure. A, The Quadruped robot has 4 limbs, each with 3 joints (hip, thigh, calf) and 1 foot pad. The ANN agent receives proprioceptive/pressure observations, and it produces joint position target actions that are converted to actuator torque commands by a low-level PD controller. B, The Quadruped NCAP architecture mirrors mammalian locomotion circuits. The RG module receives brainstem commands $c_t$ that set the speed and gait. Each limb has an AF module that uses leg observations $o_t$ to modulate RG oscillators, as well as a PF module that converts RG half-center states into leg actions $a_t$.
  • Figure 4: Performance and Data Efficiency. A, Performance curves across tasks. Solid lines are mean normalized returns across 10 training seeds, each tested for 5 episodes per epoch. Shaded areas are 95% bootstrapped confidence intervals. Maximum normalized episodic return is 1, and a policy that outputs all zeros achieves 0.5 by standing still. NCAP matches or exceeds the asymptotic performance of MLP, with superior data efficiency. NCAP also demonstrates better initial performance since it is an effective prior. B, Footfall plots across 3 training seeds on the Flat/Walk task. Colored segments encode the foot contact pressures during stance, while blank segments indicate the limb is in swing. NCAP exhibits qualitatively more naturalistic and consistent gaits than MLP, despite their quantitatively similar asymptotic performances.
  • Figure 5: Parameter Efficiency. A, Performance curves across MLP sizes on the Bumpy/Run task. Smaller MLPs achieve lower asymptotic performance and worse data efficiency. Therefore, having fewer parameters is insufficient to account for NCAP's advantages. B, Parameter count across architectures (log scale). NCAP(Test) has fewer parameters than MLP(4,4), and even NCAP(Train) with the overparameterization trick (described in the appendix) has orders of magnitude fewer parameters than MLPs at typical sizes.
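The two unit types described in the Figure 2 caption can be illustrated with a minimal Python sketch. The model equations are assumptions, not the paper's formulation. The Basic unit is taken to be a threshold-linear rate neuron whose voltage is excitatory minus inhibitory input. The Oscillator unit is replaced by a simple phase oscillator with two properties: its frequency grows with constant drive, and its phase is pulled toward an external phase through a Kuramoto-style coupling term. The class names, `count_bursts`, and all hyperparameter values are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class BasicUnit:
    """Threshold-linear rate neuron (hypothetical parameterization)."""

    threshold: float = 0.0  # voltage above which the unit starts firing
    gain: float = 1.0  # slope of the rate response above threshold

    def step(self, excitatory: float, inhibitory: float = 0.0) -> float:
        # Internal "voltage": excitatory drive minus inhibitory drive.
        voltage = excitatory - inhibitory
        # Rate-coded output: zero below threshold, proportional above it,
        # clipped to [0, 1] so it stays a valid normalized firing rate.
        return min(1.0, max(0.0, self.gain * (voltage - self.threshold)))


@dataclass
class OscillatorUnit:
    """Phase-oscillator stand-in for a bursting half-center (assumed model)."""

    period: float = 1.0  # intrinsic burst period in seconds
    rate_gain: float = 1.0  # how strongly constant input shortens the period
    coupling: float = 2.0  # strength of phase entrainment to periodic input
    phase: float = 0.0  # current phase in radians, kept in [0, 2*pi)

    def step(self, dt: float, drive: float = 0.0,
             ext_phase: Optional[float] = None) -> float:
        # Constant input scales the intrinsic frequency, i.e. shortens the period.
        omega = 2.0 * math.pi / self.period * (1.0 + self.rate_gain * drive)
        # Periodic input pulls the phase toward the external phase (entrainment).
        if ext_phase is not None:
            omega += self.coupling * math.sin(ext_phase - self.phase)
        self.phase = (self.phase + omega * dt) % (2.0 * math.pi)
        # Burst (rate 1) during the first half of each cycle, silent otherwise.
        return 1.0 if self.phase < math.pi else 0.0


def count_bursts(osc: OscillatorUnit, seconds: float, dt: float = 0.01,
                 drive: float = 0.0) -> int:
    """Count rising edges of the oscillator's output over a simulated window."""
    bursts, previous = 0, 0.0
    for _ in range(round(seconds / dt)):
        current = osc.step(dt, drive=drive)
        if current > previous:
            bursts += 1
        previous = current
    return bursts
```

With `drive=1.0`, the sketch oscillator bursts roughly twice as often as with `drive=0.0`, mirroring the period scaling described in the caption. Feeding one oscillator's phase into another as `ext_phase` makes the two phases converge, a toy version of entrainment.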
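According to the Figure 3 caption, the action pathway ends with a low-level PD controller that turns joint position targets into actuator torque commands. The sketch below assumes the textbook PD law with a target joint velocity of zero. The gains `kp` and `kd`, the optional torque limit `tau_max`, and the limb labels are placeholders, not values from the paper.

```python
from typing import List, Optional, Sequence

# 4 limbs x 3 joints, as in the Figure 3 caption; the limb labels are hypothetical.
LIMBS = ("front_left", "front_right", "rear_left", "rear_right")
JOINTS = ("hip", "thigh", "calf")
NUM_JOINTS = len(LIMBS) * len(JOINTS)  # 12 position targets per action


def pd_torques(q_target: Sequence[float], q: Sequence[float],
               qdot: Sequence[float], kp: float = 20.0, kd: float = 0.5,
               tau_max: Optional[float] = None) -> List[float]:
    """Convert joint position targets into torque commands with a PD law.

    tau = kp * (q_target - q) - kd * qdot, i.e. the target joint velocity
    is taken to be zero. Gains and the optional torque limit are placeholders.
    """
    if not (len(q_target) == len(q) == len(qdot)):
        raise ValueError("q_target, q and qdot must have the same length")
    torques = []
    for target, pos, vel in zip(q_target, q, qdot):
        # Proportional term pulls toward the target; derivative term damps motion.
        tau = kp * (target - pos) - kd * vel
        if tau_max is not None:
            # Saturate to the (assumed) symmetric actuator limit.
            tau = max(-tau_max, min(tau_max, tau))
        torques.append(tau)
    return torques
```

Each call maps one action, consisting of a position target for every hip, thigh, and calf joint, to one torque per joint.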
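The Figure 4 caption reports means across 10 training seeds with 95% bootstrapped confidence intervals. The sketch below shows one common way to compute such an interval: a percentile bootstrap of the mean using only the standard library. It assumes one value per seed, for example that seed's average normalized return over its 5 evaluation episodes. This may differ from the exact resampling procedure behind the figure. The function name, resample count, and random seed are illustrative.

```python
import random
import statistics
from typing import Sequence, Tuple


def bootstrap_mean_ci(values: Sequence[float], confidence: float = 0.95,
                      n_resamples: int = 10_000,
                      seed: int = 0) -> Tuple[float, float, float]:
    """Return (mean, lower, upper) from a percentile bootstrap of the mean."""
    if not values:
        raise ValueError("need at least one value")
    rng = random.Random(seed)
    n = len(values)
    # Resample with replacement and record the mean of each resample.
    means = sorted(statistics.fmean(rng.choices(values, k=n))
                   for _ in range(n_resamples))
    # Take the central `confidence` mass of the sorted resampled means.
    alpha = (1.0 - confidence) / 2.0
    lower = means[round(alpha * n_resamples)]
    upper = means[round((1.0 - alpha) * n_resamples) - 1]
    return statistics.fmean(values), lower, upper
```

Evaluated once per training epoch, the three returned numbers would give one point on a solid line and the edges of its shaded band.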
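The Figure 5 caption compares parameter counts across MLP sizes. In a fully connected MLP, each layer contributes `fan_in * fan_out` weights plus `fan_out` biases, so the total is easy to tabulate. The sketch below assumes a single policy network with biases on every layer. In the usage lines, the observation size of 10 and the (256, 256) hidden sizes are hypothetical. The 12 outputs follow from the 4 limbs x 3 joints in the Figure 3 caption.

```python
from typing import Sequence


def mlp_param_count(in_dim: int, hidden: Sequence[int], out_dim: int) -> int:
    """Weights plus biases of a fully connected MLP with the given layer widths."""
    widths = [in_dim, *hidden, out_dim]
    # Each consecutive pair of widths is one dense layer: weight matrix + bias.
    return sum(fan_in * fan_out + fan_out
               for fan_in, fan_out in zip(widths[:-1], widths[1:]))


if __name__ == "__main__":
    # Hypothetical observation size of 10; 12 outputs = 4 limbs x 3 joint targets.
    print(mlp_param_count(10, (4, 4), 12))      # 124
    print(mlp_param_count(10, (256, 256), 12))  # 71692
```

Even with a small hypothetical input, widening the hidden layers from 4 to 256 units raises the count by more than two orders of magnitude, which is why a log scale suits panel B.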