Neurosim: A Fast Simulator for Neuromorphic Robot Perception

Richeek Das; Pratik Chaudhari

TL;DR

This paper discusses the design philosophy behind Neurosim and Cortex and demonstrates how they can be used to train neuromorphic perception and control algorithms, e.g., using self-supervised learning on time-synchronized multi-modal data, and to test real-time implementations of these algorithms in closed loop.

Abstract

Neurosim is a fast, real-time, high-performance library for simulating sensors such as dynamic vision sensors, RGB cameras, depth sensors, and inertial sensors. It can also simulate agile dynamics of multi-rotor vehicles in complex and dynamic environments. Neurosim can achieve frame rates as high as ~2700 FPS on a desktop GPU. Neurosim integrates with a ZeroMQ-based communication library called Cortex to facilitate seamless integration with machine learning and robotics workflows. Cortex provides a high-throughput, low-latency message-passing system for Python and C++ applications, with native support for NumPy arrays and PyTorch tensors. This paper discusses the design philosophy behind Neurosim and Cortex. It demonstrates how they can be used to (i) train neuromorphic perception and control algorithms, e.g., using self-supervised learning on time-synchronized multi-modal data, and (ii) test real-time implementations of these algorithms in closed loop. Neurosim and Cortex are available at https://github.com/grasp-lyrl/neurosim.
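To make the idea of passing array payloads through a message-passing layer concrete, here is a minimal sketch of one common framing scheme: a small metadata header (dtype and shape) followed by the raw array bytes. This is an illustration only and is not Cortex's actual wire format or API; the functions `pack_array` and `unpack_array` and the header layout are hypothetical, and the standard-library `array` module stands in for a NumPy buffer.

```python
import json
import struct
from array import array


def pack_array(values, shape):
    """Frame float32 data as: 4-byte header length, JSON header, raw bytes.

    Hypothetical layout for illustration; not Cortex's wire format.
    """
    payload = array("f", values).tobytes()  # native-endian 32-bit floats
    header = json.dumps({"dtype": "float32", "shape": list(shape)}).encode("utf-8")
    return struct.pack("<I", len(header)) + header + payload


def unpack_array(message):
    """Inverse of pack_array: return (header dict, flat list of floats)."""
    (header_len,) = struct.unpack_from("<I", message, 0)
    header = json.loads(message[4:4 + header_len].decode("utf-8"))
    values = array("f")
    values.frombytes(message[4 + header_len:])
    return header, values.tolist()


message = pack_array([1.0, 2.5, -3.0, 0.5], shape=(2, 2))
header, values = unpack_array(message)
```

Keeping the header separate from the raw buffer means the receiver can reconstruct the array without parsing the payload itself, which is one reason this pattern is popular for large sensor messages.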

Paper Structure (10 sections, 2 equations, 6 figures)

This paper contains 10 sections, 2 equations, 6 figures.

Introduction
The Design of Neurosim
Neurosim simulates event cameras at multi-kilohertz rates.
Running closed-loop perception and control experiments at the extremes of the performance envelope of the hardware
The Cortex Communication Interface
Feeding high-throughput multi-modal robot sensory data to deep learning training pipelines.
Applications
Real-time, closed-loop control.
Online training of self-supervised event representations on simulated data.
Discussion

Figures (6)

Figure 1: Event cameras capture the scene without temporal aliasing, with high dynamic range, and consume very little power. We illustrate the temporal non-aliasing property of event cameras in the left image. Unlike standard RGB cameras, event cameras are asynchronous sensors that respond to scene intensity changes and produce a continuous stream of information at high temporal and spatial precision.
Figure 2: Overview of the design of Neurosim. Neurosim is designed to be modular, high-performance, and easy to use for a variety of applications in embodied perception on multirotors. As shown in (A), it consists of four main components: (1) A data stack in the form of 3D scene assets and the specifications for sensors and multirotors, (2) A real-time rendering engine for high-fidelity vision sensor simulation, (3) A fast and accurate multirotor dynamics model for aerodynamics and physics, and (4) A communication interface to interact with the simulator -- receive sensor data and send control commands. In (B), we illustrate how we can use the communication interface to connect Neurosim to a visualizer -- here Rerun -- and draw the simulated data in real-time. The visualizer displays the rendered RGB image, depth map, IMU readings, 6-DoF pose, navigation mesh, and the event stream. For visualization, events are accumulated over short time windows (20 ms) and colored by polarity.
Figure 3: Neurosim's event simulator backend is optimized to prevent bottlenecking of the full simulation pipeline. We benchmark Neurosim's event camera simulator against existing simulators: a CUDA-based simulator esim, a GPU-based PyTorch implementation, and a CPU-based implementation airsim2017fsr. A. We measure the average latency to simulate events from a single intensity image update -- the time taken to process a new intensity frame, trigger events based on the contrast threshold model, update the internal pixel states, and output the event stream. Neurosim achieves over 31 kHz for VGA and 23 kHz for HD frame sizes. It is roughly 8--13$\times$ faster than other GPU-based implementations, and 55--121$\times$ faster than CPU-based implementations. B. With the custom event simulator backend, Neurosim achieves $\sim$2300 FPS for a full simulation step (including rendering, dynamics, and event simulation) with VGA vision sensors. Other event simulation backends bottleneck the full simulation pipeline, achieving only 200--1200 FPS. C. Neurosim's event simulator takes up only $\sim$8% of the total simulation step time, compared to 40--90% for other event simulators. This high performance is attributed to the warp-synchronous CUDA kernel design that minimizes atomic operations during event aggregation and allows our event generation model to run in a single kernel launch.
Figure 4: Cortex provides high-throughput, low-latency, and scalable communication in Python. We benchmark different aspects of Cortex's performance on a modern desktop CPU (AMD Ryzen 9 7950X). A. We use NumPy float (32-bit) arrays of different sizes ranging from 10 elements (40 B) to 1080$\times$1920 RGB images (23.7 MiB) as our message payloads. We measure the maximum successful publish-subscribe rate (in Hz) for each payload size over a 15 s interval. Cortex achieves 100+ kHz for small ($<$ 39 KiB) messages and maintains sufficiently high rates (250 Hz) for large 1080p RGB images. B. We measure the achieved throughput (in GB/s) for the same payload sizes. Cortex achieves up to 7 GB/s throughput for large messages, saturating the underlying hardware limits. For smaller messages, the throughput is limited by per-message overheads -- the messaging rate caps at $\sim$100 kHz. C. We test the scalability of Cortex by varying the number of subscribers from 1 to 32, each receiving 40 B messages from a single publisher. The achieved message rate per subscriber remains constant at around 100 kHz as the number of subscribers increases up to 8, after which it degrades due to CPU contention from many busy-waiting subscribers on the same machine. D. We measure the achieved throughput (in GB/s) for the same setup as in C. As deduced from the message rates, the throughput scales roughly linearly with the number of subscribers until CPU contention effects kick in beyond 8 subscribers.
Figure 5: Examples of a quadrotor tracking randomly generated MinSnap trajectories in a variety of indoor scenes. In each example, the top row shows, from left to right, the color image, the depth image (near to far is colored from dark blue to red), and events aggregated with polarity over 20 ms. The bottom row shows the 3D trajectory with the current pose, the navigation mesh with the current position, and the IMU readings. The navigation mesh marks free space and non-reachable locations at the current flying height with gray and white, respectively. These scenes demonstrate fast flights across multiple floors, corridors, and interconnected rooms, with angular and linear velocities often exceeding 4 rad/s and 2 m/s, respectively.
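The per-frame steps named in the Figure 3 caption (process a new intensity frame, trigger events with the contrast threshold model, update the internal pixel states, output the event stream) can be sketched in plain Python. This is a simplified textbook version of the contrast threshold model, not Neurosim's CUDA kernel: the function names, the threshold value, and the choice to stamp all events of a frame with the same timestamp are assumptions made for illustration.

```python
import math


def init_state(frame, eps=1e-3):
    """Per-pixel reference log-intensity, initialised from the first frame."""
    return [math.log(v + eps) for v in frame]


def generate_events(log_ref, frame, timestamp, threshold=0.2, eps=1e-3):
    """Emit events for one intensity frame and update the pixel state in place.

    log_ref   : per-pixel reference log-intensities (modified in place)
    frame     : per-pixel linear intensities, same length as log_ref
    Returns a list of (pixel_index, timestamp, polarity) tuples.
    """
    events = []
    for i, intensity in enumerate(frame):
        diff = math.log(intensity + eps) - log_ref[i]
        crossings = int(abs(diff) / threshold)  # whole thresholds crossed
        if crossings == 0:
            continue
        polarity = 1 if diff > 0 else -1
        # Simplification: no sub-frame timestamp interpolation.
        events.extend((i, timestamp, polarity) for _ in range(crossings))
        # Advance the reference by the part of the change already reported.
        log_ref[i] += polarity * crossings * threshold
    return events


state = init_state([1.0, 1.0, 1.0])
events = generate_events(state, [2.0, 1.0, 0.5], timestamp=0.001)
repeat = generate_events(state, [2.0, 1.0, 0.5], timestamp=0.002)
```

Here the first pixel brightens and the third darkens by about 0.69 in log-intensity, so each crosses the 0.2 threshold three times, while the unchanged middle pixel stays silent. Feeding the same frame again produces no events because the reference state has already absorbed the reported change.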
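The payload sizes quoted in the Figure 4 caption follow directly from the array shapes, and the reported throughput can be sanity-checked with the same arithmetic. The sketch below assumes 4 bytes per float32 element and a rate of 250 messages per second for the 1080p payload; the helper names are hypothetical.

```python
def payload_bytes(shape, bytes_per_element=4):
    """Size in bytes of a dense float32 array with the given shape."""
    count = 1
    for dim in shape:
        count *= dim
    return count * bytes_per_element


def throughput_gb_per_s(n_bytes, rate_hz):
    """Throughput in GB/s (10^9 bytes per second) at a given message rate."""
    return n_bytes * rate_hz / 1e9


small = payload_bytes((10,))            # 10 float32 elements
image = payload_bytes((1080, 1920, 3))  # 1080p RGB image as float32
image_mib = image / 2**20               # size in units of 2^20 bytes

# Assumed rate: 250 messages per second for the 1080p payload.
image_throughput = throughput_gb_per_s(image, rate_hz=250)
```

Under these assumptions the smallest payload is 40 bytes, the 1080p image is 24,883,200 bytes (about 23.7 in units of 2^20 bytes), and 250 such messages per second correspond to roughly 6.2 GB/s, which is consistent with the caption's statement that throughput for large messages reaches up to 7.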