evMLP: An Efficient Event-Driven MLP Architecture for Vision

Zhentan Zheng

TL;DR

evMLP introduces an all-MLP vision model augmented with an event-driven local update that treats inter-frame changes as events and recomputes only patches where changes occur. By processing image patches independently and reusing computations for unchanged regions, it achieves competitive ImageNet accuracy (top-1 $=73.5\%$) at a compact $1.03$ GMACs and high inference throughput. In video scenarios, the event-driven mechanism yields average MAC reductions of $7$–$14\%$, with larger gains on stationary-camera data (over $25\%$ in some cases) and a tunable trade-off between efficiency and accuracy via the event threshold. These results suggest a practical, patch-level, MLP-based approach for real-time vision tasks, particularly in surveillance contexts where background stability is common.

Abstract

Deep neural networks have achieved remarkable results in computer vision tasks. Convolutional Neural Networks (CNNs) were initially the mainstream architecture, and Vision Transformers (ViTs) have become increasingly popular in recent years. Work on multi-layer perceptrons (MLPs) has also provided new perspectives for research into vision model architectures. In this paper, we present evMLP, accompanied by a simple event-driven local update mechanism. evMLP processes patches of images or feature maps independently via MLPs. We define changes between consecutive frames as "events". Under the event-driven local update mechanism, evMLP selectively processes only the patches where events occur. For sequential image data (e.g., video processing), this approach improves computational performance by avoiding redundant computations. In ImageNet image classification experiments, evMLP attains accuracy competitive with state-of-the-art models. More significantly, experimental results on multiple video datasets demonstrate that evMLP reduces computational cost via its event-driven local update mechanism while maintaining output consistency with its non-event-driven baseline. The code and pre-trained models are available at https://github.com/i-evi/evMLP.
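
The event-driven local update described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the event criterion (mean absolute per-patch difference against a threshold), the two-layer stand-in `patch_mlp`, and all names and sizes here are assumptions chosen for illustration. The sketch shows only the core idea, which is that patches without events reuse cached outputs while patches with events are recomputed.

```python
import numpy as np

def to_patches(frame, p):
    """Split an (H, W, C) frame into (N, p*p*C) flattened, non-overlapping patches."""
    h, w, c = frame.shape
    grid = frame.reshape(h // p, p, w // p, p, c).swapaxes(1, 2)
    return grid.reshape(-1, p * p * c)

def patch_mlp(x, w1, w2):
    """Hypothetical per-patch MLP (one ReLU hidden layer); a stand-in, not evMLP itself."""
    return np.maximum(x @ w1, 0.0) @ w2

def event_driven_update(ref_patches, cache, frame, p, threshold, w1, w2):
    """Recompute outputs only for patches whose change exceeds the threshold.

    Assumed event criterion: mean absolute difference between the new patch
    and the reference patch that produced the cached output.
    """
    patches = to_patches(frame, p)
    change = np.abs(patches - ref_patches).mean(axis=1)
    events = change > threshold

    out = cache.copy()
    out[events] = patch_mlp(patches[events], w1, w2)  # recompute changed patches only

    # Update the reference only where we recomputed, so that slow drift in
    # the remaining patches accumulates and can trigger an event later.
    new_ref = ref_patches.copy()
    new_ref[events] = patches[events]
    return new_ref, out, events

rng = np.random.default_rng(0)
p = 4
w1 = rng.standard_normal((p * p * 3, 32))
w2 = rng.standard_normal((32, 8))

frame0 = rng.random((16, 16, 3))
frame1 = frame0.copy()
frame1[0:4, 0:4] += 0.5  # only the top-left patch changes

ref = to_patches(frame0, p)
cache = patch_mlp(ref, w1, w2)  # the first frame is processed in full
ref, out, events = event_driven_update(ref, cache, frame1, p, 0.05, w1, w2)
print(int(events.sum()), "of", events.size, "patches recomputed")
```

In this toy setup, per-patch independence is what makes the cached outputs valid: because no patch's output depends on its neighbours, the selectively updated result matches a full recomputation on the second frame. Raising `threshold` skips more patches at the cost of staler outputs, which mirrors the efficiency and accuracy trade-off mentioned in the TL;DR.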
