Efficient FPGA-accelerated Convolutional Neural Networks for Cloud Detection on CubeSats

Angela Cratere, M. Salim Farissi, Andrea Carbone, Marcello Asciolla, Maria Rizzi, Francesco Dell'Olio, Augusto Nascetti, Dario Spiller

TL;DR

The paper tackles onboard cloud detection for CubeSats by evaluating four CNN models (Pixel-Net, Patch-Net, Scene-Net, U-Net) deployed on Xilinx DPU hardware via Vitis AI on a Zynq UltraScale+ MPSoC. It shows that channel pruning (up to 98.6% fewer parameters) and 8-bit quantization greatly reduce compute and memory requirements with minimal accuracy loss (a cumulative drop of at most 0.6%), while enabling real-time inference for the image-wise models (Scene-Net at 57.14 FPS and U-Net at 37.45 FPS) at a power consumption of around 2.5 W. Pixel-Net and Patch-Net, though accurate, face latency challenges for full-image processing, which favors image-wise architectures for onboard cloud detection. The study highlights the viability of DPU-based accelerators for small satellites as a flexible, power-efficient path to deploying CNNs for onboard Earth observation (EO) tasks, and it informs deployment strategies for future nanosatellite missions. Overall, the work provides a practical framework for FPGA-accelerated CNN deployment on resource-constrained space platforms, balancing model complexity, accuracy, and hardware constraints.
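To put the reported throughput and power figures in per-frame terms, the short sketch below converts them into latency and energy per frame. It assumes a constant 2.5 W draw (the summary only says "around 2.5 W") and treats latency as the reciprocal of throughput, which ignores any pipelining or multi-threading on the DPU. The function name `per_frame_budget` is illustrative, not from the paper.

```python
# Figures quoted in the summary; the 2.5 W value is approximate ("around 2.5 W").
REPORTED = {
    "Scene-Net": {"fps": 57.14, "power_w": 2.5},
    "U-Net": {"fps": 37.45, "power_w": 2.5},
}


def per_frame_budget(fps, power_w):
    """Return (latency in ms, energy in mJ) per frame.

    Assumes constant power draw and latency = 1 / throughput.
    """
    latency_s = 1.0 / fps
    return latency_s * 1e3, power_w * latency_s * 1e3


for name, figures in REPORTED.items():
    latency_ms, energy_mj = per_frame_budget(figures["fps"], figures["power_w"])
    print(f"{name}: {latency_ms:.1f} ms/frame, {energy_mj:.1f} mJ/frame")
```

Under these assumptions, Scene-Net works out to roughly 17.5 ms per frame and U-Net to roughly 26.7 ms per frame.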

Abstract

We present the implementation of four FPGA-accelerated convolutional neural network (CNN) models for onboard cloud detection in resource-constrained CubeSat missions, leveraging Xilinx's Vitis AI (VAI) framework and Deep Learning Processing Unit (DPU), a programmable engine with pre-implemented, parameterizable IP cores optimized for deep neural networks, on a Zynq UltraScale+ MPSoC. This study explores both pixel-wise (Pixel-Net and Patch-Net) and image-wise (U-Net and Scene-Net) models to benchmark trade-offs in accuracy, latency, and model complexity. Applying channel pruning, we achieved substantial reductions in model parameters (up to 98.6%) and floating-point operations (up to 90.7%) with minimal accuracy loss. Furthermore, the VAI tool was used to quantize the models to 8-bit precision, ensuring optimized hardware performance with negligible impact on accuracy. All models retained high accuracy post-FPGA integration, with a cumulative maximum accuracy drop of only 0.6% after quantization and pruning. The image-wise Scene-Net and U-Net models demonstrated strong real-time inference capabilities, achieving 57.14 and 37.45 frames per second (FPS), respectively, with power consumption of around 2.5 W, surpassing state-of-the-art onboard cloud detection solutions. Our approach underscores the potential of DPU-based hardware accelerators to expand the processing capabilities of small satellites, enabling efficient and flexible onboard CNN-based applications.
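As a rough illustration of what quantizing to 8-bit precision involves, the sketch below applies a generic symmetric per-tensor scheme to a few made-up weights. This is an assumption for illustration only: the abstract does not describe the scheme used by the VAI quantizer, and the helper names and sample values here are hypothetical.

```python
def quantize_int8(values):
    """Map floats to signed 8-bit codes using one shared scale (symmetric scheme).

    Generic illustration; not necessarily the scheme used by the VAI tool.
    """
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest code and clamp to the symmetric int8 range.
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate float values from the 8-bit codes."""
    return [c * scale for c in codes]


# Hypothetical weights, chosen only to show the rounding behaviour.
weights = [0.42, -1.27, 0.003, 0.9]
codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)
max_err = max(abs(w - r) for w, r in zip(weights, restored))
print(codes, scale, max_err)
```

In this scheme the rounding error of each value is bounded by half the scale, which is one intuition for why 8-bit weights can cost little accuracy when the value range is well covered.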

Paper Structure

This paper contains 15 sections, 4 figures, 6 tables.

Figures (4)

  • Figure 1: (a) Global distribution of the Sentinel-2 imagery utilized for training, validation and test datasets. (b) Distribution of the cloudy percentage of the 256$\times$256 tiles obtained from Sentinel-2 granules used for the training and validation dataset. Tiles with cloudy percentage $\ge$ 70% are labeled as cloudy in the construction of the dataset for the Scene-Net model.
  • Figure 2: Architectures of the four CNN models (Pixel-Net, Patch-Net, Scene-Net, and U-Net) for cloud detection. Each model processes inputs of varying spatial resolutions and consists of multiple layers, including convolution, max-pooling, flatten, dense, transposed convolution, and concatenate layers, each represented by different colors. The black numbers indicate the number of convolutional kernels in the baseline models, while the red numbers show the number of filters after applying channel pruning.
  • Figure 3: Overview of the CNN deployment strategy on FPGA using VAI and PYNQ frameworks.
  • Figure 4: FPGA segmentation outputs for five different regions (rows). The five columns show: (1) the false-color RGB image (using the B11, B3, and B2 bands); (2) the ground-truth cloud mask derived from the Sentinel-2 Scene Classification Layer (SCL); (3)-(5) the FPGA predictions from Pixel-Net, Patch-Net, and U-Net, respectively.
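
The Figure 1 caption states that tiles with a cloudy percentage of at least 70% are labeled as cloudy for the Scene-Net dataset. A minimal sketch of that labeling rule is given below, assuming each tile already comes with a binary cloud mask (1 = cloud, 0 = clear). How the mask is derived from the SCL classes is not specified here, and the function names are illustrative.

```python
CLOUDY_THRESHOLD = 0.70  # from the Figure 1(b) caption: cloudy percentage >= 70%


def cloud_fraction(mask):
    """Fraction of cloudy pixels in a binary mask given as a list of rows."""
    total = sum(len(row) for row in mask)
    cloudy = sum(sum(row) for row in mask)
    return cloudy / total


def scene_label(mask, threshold=CLOUDY_THRESHOLD):
    """Return 1 (cloudy) if the cloud fraction meets the threshold, else 0."""
    return 1 if cloud_fraction(mask) >= threshold else 0


# Toy 4x4 masks for readability; the tiles in the caption are 256x256.
mostly_cloudy = [[1, 1, 1, 1], [1, 1, 1, 1], [1, 1, 1, 1], [0, 0, 0, 0]]  # 12/16
just_below = [[1, 1, 1, 1], [1, 1, 1, 1], [1, 1, 1, 0], [0, 0, 0, 0]]     # 11/16
print(scene_label(mostly_cloudy), scene_label(just_below))
```

The threshold comparison is inclusive, matching the "$\ge$ 70%" wording of the caption.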