PLEIADES: Building Temporal Kernels with Orthogonal Polynomials

Yan Ru Pei; Olivier Coenen

TL;DR

PLEIADES targets long-range temporal dependencies in online, event-based perception using memory-efficient kernels. It parameterizes temporal filters as weighted sums of orthogonal Jacobi polynomials, which gives long temporal receptive fields that can be resampled without fine-tuning, and integrates them into a lightweight spatiotemporal backbone with CenterNet-style heads. The approach reports state-of-the-art results on three event-based benchmarks (DVS128, AIS2024 3ET+, Prophesee GEN4) with 192K to 576K parameters and low memory and compute footprints, while remaining causal for online inference. The authors also outline paths toward further efficiency gains and neuromorphic integration.

Abstract

We introduce a class of neural networks named PLEIADES (PoLynomial Expansion In Adaptive Distributed Event-based Systems), which contains temporal convolution kernels generated from orthogonal polynomial basis functions. We focus on interfacing these networks with event-based data to perform online spatiotemporal classification and detection with low latency. By virtue of using structured temporal kernels and event-based data, we have the freedom to vary the sample rate of the data along with the discretization step size of the network without additional fine-tuning. We experimented with three event-based benchmarks and obtained state-of-the-art results on all three by large margins with significantly smaller memory and compute costs. We achieved: 1) 99.59% accuracy with 192K parameters on the DVS128 hand gesture recognition dataset and 100% with a small additional output filter; 2) 99.58% test accuracy with 277K parameters on the AIS 2024 eye tracking challenge; and 3) 0.556 mAP with 576K parameters on the PROPHESEE 1 Megapixel Automotive Detection Dataset.

Paper Structure (26 sections, 8 equations, 3 figures, 9 tables)

Introduction
Related Work
Long Temporal Convolutions and Parameterization of Kernels
Spatiotemporal Networks
Event-based Data and Networks
Temporal Convolutions with Polynomials
Building temporal kernels from orthogonal polynomials
Discretization of the convolution kernels
Optimal order of operations
Network Architecture
Experiments
DVS128 Hand Gesture Recognition
AIS2024 3ET+ Event-based Eye Tracking
Prophesee GEN4 Roadscene Object Detection
Limitations
...and 11 more sections

Figures (3)

Figure 1: Generating discrete temporal kernels for multiple channels, based on trainable coefficients and fixed basis orthogonal polynomials. Here, 3 temporal kernels, one per channel, are generated from 4 basis polynomials discretized over 5 timebins. The shaded areas represent the discretized polynomial values. The kernel coefficients may be organized as a $3 \times 4$ matrix, and the discretized basis polynomials may be organized as a $4 \times 5$ matrix. The matrix multiplication of the two (contraction of coefficients) then yields the final discretized kernels for the 3 channels over 5 timebins as a $3 \times 5$ matrix.
Figure 2: A representative network used for eye tracking. The backbone consists of 5 spatiotemporal blocks. Full convolutions are denoted by darker blue blocks (full conv), depthwise-separable convolutions by lighter blocks (DWS conv). The detection head is inspired by CenterNet, with the modification that the $3\times 3$ convolution is made depthwise-separable and a temporal layer is prepended to it.
Figure 3: (Left) Accuracy vs. latency curves on the DVS128 dataset for different PLEIADES variants with a changing temporal window, set by a fixed kernel size of 10 timebins but different bin sizes. A masking augmentation is optionally used to randomly mask out the starting frames of dataset segments during training in order to stimulate faster responses in the network. (Right) Accuracy vs. latency curves for different PLEIADES variants with a fixed temporal window of 100 ms for each temporal layer, but with different bin sizes. The benchmark network is trained with a kernel size of 10 timebins and a 10 ms step size, and the other variants are resampled without additional fine-tuning. A network variant trained without the PLEIADES structured temporal kernels is also displayed as a baseline reference (free kernels).
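
The contraction described in the Figure 1 caption can be illustrated with a minimal sketch. It is not the authors' implementation: the helper names (`jacobi`, `discretized_basis`, `make_kernels`), the coefficient values, the choice of Jacobi parameters alpha = beta = 0 (Legendre polynomials), and sampling at bin midpoints on [-1, 1] are all assumptions made for illustration. The paper's own discretization and scaling may differ.

```python
from math import comb


def jacobi(n, alpha, beta, x):
    """Jacobi polynomial P_n^(alpha, beta)(x) for non-negative integer alpha, beta."""
    return sum(
        comb(n + alpha, n - s) * comb(n + beta, s)
        * ((x - 1) / 2) ** s * ((x + 1) / 2) ** (n - s)
        for s in range(n + 1)
    )


def discretized_basis(num_basis, num_bins, alpha=0, beta=0):
    """Sample the first num_basis polynomials at the bin midpoints of [-1, 1].

    Midpoint sampling is an assumption of this sketch, not the paper's scheme.
    Returns a num_basis x num_bins nested list.
    """
    centers = [-1 + (2 * k + 1) / num_bins for k in range(num_bins)]
    return [[jacobi(n, alpha, beta, x) for x in centers] for n in range(num_basis)]


def make_kernels(coeffs, basis):
    """Contract coefficients (channels x basis) with basis (basis x bins)."""
    num_bins = len(basis[0])
    return [
        [sum(c * row[k] for c, row in zip(channel, basis)) for k in range(num_bins)]
        for channel in coeffs
    ]


# Hypothetical coefficients: 3 channels x 4 basis polynomials.
# In a trained network these would be learned parameters.
coeffs = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.5, -0.25, 0.1, 0.05],
]

# (3 x 4) @ (4 x 5) -> 3 x 5 kernels, matching the shapes in Figure 1.
kernels_5 = make_kernels(coeffs, discretized_basis(4, 5))

# Same coefficients, finer bins over the same window -> 3 x 10 kernels.
kernels_10 = make_kernels(coeffs, discretized_basis(4, 10))
```

Because the coefficients are independent of the number of timebins, re-discretizing the basis yields kernels at a new bin size without touching the trainable parameters. This is the property that the resampling experiments in Figure 3 (Right) rely on, although this sketch omits any step-size-dependent scaling the paper may apply.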
