Efficient FPGA-accelerated Convolutional Neural Networks for Cloud Detection on CubeSats
Angela Cratere, M. Salim Farissi, Andrea Carbone, Marcello Asciolla, Maria Rizzi, Francesco Dell'Olio, Augusto Nascetti, Dario Spiller
TL;DR
The paper tackles onboard cloud detection for CubeSats by evaluating four CNN models (Pixel-Net, Patch-Net, Scene-Net, U-Net) deployed on Xilinx DPU hardware via Vitis AI on a Zynq UltraScale+ MPSoC. It demonstrates that channel pruning (up to 98.6% fewer parameters) and 8-bit quantization greatly reduce compute and memory requirements with minimal accuracy loss (a cumulative drop of at most 0.6%), while enabling real-time inference for the image-wise models (Scene-Net at 57.14 FPS and U-Net at 37.45 FPS) at a power draw of around 2.5 W. Pixel-Net and Patch-Net, though accurate, face latency challenges when processing full images, which reinforces the advantage of image-wise architectures for onboard cloud detection. The study highlights the viability of DPU-based accelerators for small satellites, offering a flexible, power-efficient path to deploying CNNs for onboard Earth observation (EO) tasks and informing deployment strategies for future nanosatellite missions. Overall, the work provides a practical, scalable framework for FPGA-accelerated CNN deployment on resource-constrained space platforms, balancing model complexity, accuracy, and hardware constraints.
Abstract
We present the implementation of four FPGA-accelerated convolutional neural network (CNN) models for onboard cloud detection in resource-constrained CubeSat missions. The models are deployed on a Zynq UltraScale+ MPSoC using Xilinx's Vitis AI (VAI) framework and Deep Learning Processing Unit (DPU), a programmable engine with pre-implemented, parameterizable IP cores optimized for deep neural networks. This study explores both pixel-wise (Pixel-Net and Patch-Net) and image-wise (U-Net and Scene-Net) models to benchmark trade-offs in accuracy, latency, and model complexity. Applying channel pruning, we achieved substantial reductions in model parameters (up to 98.6%) and floating-point operations (up to 90.7%) with minimal accuracy loss. Furthermore, the VAI tool was used to quantize the models to 8-bit precision, optimizing hardware performance with negligible impact on accuracy. All models retained high accuracy post-FPGA integration, with a cumulative maximum accuracy drop of only 0.6% after quantization and pruning. The image-wise Scene-Net and U-Net models demonstrated strong real-time inference capabilities, achieving frame rates of 57.14 and 37.45 frames per second (FPS), respectively, with power consumption of around 2.5 W, surpassing state-of-the-art onboard cloud detection solutions. Our approach underscores the potential of DPU-based hardware accelerators to expand the processing capabilities of small satellites, enabling efficient and flexible onboard CNN-based applications.
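To put the reported throughput and power figures in per-frame terms, the short sketch below converts them into an average time and energy budget per image. It is a back-of-the-envelope illustration, not a result from the paper: the helper `per_frame_budget` is hypothetical, the power is assumed to be exactly 2.5 W for both models (the abstract only says "around 2.5 W"), and the time per frame is taken as the reciprocal of the frame rate, which may differ from the single-image latency if inference is pipelined or multi-threaded.

```python
def per_frame_budget(fps: float, power_w: float) -> tuple[float, float]:
    """Return (average time per frame in ms, energy per frame in mJ).

    Assumes steady-state throughput, so time per frame is simply 1 / fps.
    """
    if fps <= 0:
        raise ValueError("fps must be positive")
    time_ms = 1000.0 / fps
    energy_mj = power_w * time_ms  # watts * milliseconds = millijoules
    return time_ms, energy_mj


# Frame rates as reported in the abstract.
REPORTED_FPS = {"Scene-Net": 57.14, "U-Net": 37.45}

# Assumption: "around 2.5 W" is treated as exactly 2.5 W for both models.
ASSUMED_POWER_W = 2.5

for name, fps in REPORTED_FPS.items():
    time_ms, energy_mj = per_frame_budget(fps, ASSUMED_POWER_W)
    print(f"{name}: {time_ms:.1f} ms/frame, about {energy_mj:.0f} mJ/frame")
```

Under these assumptions, Scene-Net works out to roughly 17.5 ms and U-Net to roughly 26.7 ms per image on average; the energy values are derived estimates only and depend entirely on the assumed power draw.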
