PEFSL: A deployment Pipeline for Embedded Few-Shot Learning on a FPGA SoC
Lucas Grativol Ribeiro, Lubin Gauthier, Mathieu Leonardon, Jérémy Morlier, Antoine Lavrard-Meyer, Guillaume Muller, Virginie Fresse, Matthieu Arzel
TL;DR
The paper tackles the barrier of deploying few-shot learning on energy- and latency-constrained FPGA SoCs by delivering an end-to-end open-source pipeline built on the Tensil framework, along with a low-power demonstrator for real-time object classification. It selects compact ResNet backbones (notably ResNet-9) and conducts extensive design-space exploration across input resolution, downsampling, and network width to meet embedded constraints on MiniImageNet. The key contributions are the PEFSL pipeline for training, ONNX export, RTL generation, and FPGA deployment; a demonstrator achieving around $30\ \mathrm{ms}$ latency at $6.2\ \mathrm{W}$ on a PYNQ-Z1; and an analysis showing favorable latency-accuracy trade-offs. This work enables rapid, open, on-device adaptation for robotics, drones, and autonomous systems where real-time, energy-efficient few-shot inference is essential.
Abstract
This paper tackles the challenges of implementing few-shot learning on embedded systems, specifically FPGA SoCs. Few-shot learning is a vital approach for adapting to diverse classification tasks, especially when the cost of data acquisition or labeling is prohibitively high. Our contributions include an end-to-end open-source pipeline for a few-shot learning platform for object classification on FPGA SoCs. The pipeline is built on top of the Tensil open-source framework and facilitates the design, training, evaluation, and deployment of DNN backbones tailored for few-shot learning. Additionally, we showcase our work's potential by building and deploying a low-power, low-latency demonstrator with a dataflow architecture, trained on the MiniImageNet dataset. The proposed system has a latency of 30 ms while consuming 6.2 W on the PYNQ-Z1 board.
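
To make the role of the backbone concrete, the following is a minimal sketch of how a few-shot classifier can sit on top of a frozen feature extractor. The abstract does not state which classifier the pipeline uses; the sketch assumes a nearest-class-mean rule over L2-normalized features, a common choice in few-shot learning. The function names and the `toy_episode` generator, which stands in for real backbone outputs with synthetic vectors, are hypothetical and are not part of PEFSL or Tensil.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each feature vector to unit Euclidean length."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def class_prototypes(support_features, support_labels):
    """Average the normalized support features of each class."""
    feats = l2_normalize(support_features)
    classes = np.unique(support_labels)
    protos = np.stack([feats[support_labels == c].mean(axis=0) for c in classes])
    return classes, protos


def classify(query_features, classes, prototypes):
    """Assign each query to the class with the nearest prototype."""
    q = l2_normalize(query_features)
    # Squared Euclidean distance from every query to every prototype.
    dists = ((q[:, None, :] - prototypes[None, :, :]) ** 2).sum(axis=-1)
    return classes[dists.argmin(axis=1)]


def toy_episode(seed=0, n_way=5, k_shot=5, n_query=15, dim=64):
    """Synthetic stand-in for backbone features (assumption: not real data)."""
    rng = np.random.default_rng(seed)
    centers = 5.0 * rng.normal(size=(n_way, dim))

    def sample(n_per_class):
        labels = np.repeat(np.arange(n_way), n_per_class)
        feats = centers[labels] + rng.normal(size=(labels.size, dim))
        return feats, labels

    support_f, support_l = sample(k_shot)
    query_f, query_l = sample(n_query)
    return support_f, support_l, query_f, query_l


if __name__ == "__main__":
    s_f, s_l, q_f, q_l = toy_episode()
    classes, protos = class_prototypes(s_f, s_l)
    pred = classify(q_f, classes, protos)
    print(f"toy 5-way 5-shot accuracy: {(pred == q_l).mean():.2f}")
```

Under this assumption, adding a new class only requires averaging a handful of feature vectors, so the backbone deployed on the FPGA never needs retraining on the device.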
