Real-Time Semantic Segmentation of Aerial Images Using an Embedded U-Net: A Comparison of CPU, GPU, and FPGA Workflows

Julien Posso, Hugo Kieffer, Nicolas Menga, Omar Hlimi, Sébastien Tarris, Hubert Guerard, Guy Bois, Matthieu Couderc, Eric Jenn

TL;DR

This work targets real-time, on-board semantic segmentation of aerial imagery by presenting a compact U-Net that reduces parameters and MACs by about 16x while preserving accuracy on the Inria dataset. It assesses deployment workflows across CPU, GPU, and FPGA targets using five toolchains (TensorFlow GPU, cuDNN, TVM, FINN, Vitis AI) and evaluates them on metrics including IoU, accuracy, latency, power, and memory. The study finds the FPGA with Vitis AI to be the most favorable in terms of performance and energy efficiency, albeit with a steep learning curve and a need for specialized hardware knowledge, while the CPU and GPU options offer maturity and ease of development with varying energy profiles. The results provide practical guidance for embedding semantic segmentation in UAV/satellite systems, highlighting trade-offs between throughput, memory, energy, and engineering effort across platforms and toolchains.

Abstract

This study introduces a lightweight U-Net model optimized for real-time semantic segmentation of aerial images, targeting the efficient utilization of Commercial Off-The-Shelf (COTS) embedded computing platforms. We maintain the accuracy of the U-Net on a real-world dataset while reducing the model's parameters and Multiply-Accumulate (MAC) operations by a factor of 16. Our analysis covers three hardware platforms (CPU, GPU, and FPGA) and five toolchains (TVM, FINN, Vitis AI, TensorFlow GPU, and cuDNN), assessing each on metrics such as latency, power consumption, memory footprint, energy efficiency, and FPGA resource usage. The results highlight the trade-offs between these platforms and toolchains, with a particular focus on the practical deployment challenges in real-world applications. Our findings demonstrate that while the FPGA with Vitis AI emerges as the superior choice due to its performance, energy efficiency, and maturity, it requires specialized hardware knowledge, emphasizing the need for a balanced approach in selecting embedded computing solutions for semantic segmentation tasks.
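To give a feel for where a factor of 16 can come from, the sketch below counts parameters and MACs for a single 3x3 convolution. It is an illustration only: the abstract does not state how the reduction was obtained, and the channel widths (64 and 16), the helper name `conv2d_cost`, and the choice of a 256x256 feature map are assumptions made here. The point is that the cost of a convolution scales with the product of input and output channels, so dividing both by 4 divides the cost by roughly 16.

```python
def conv2d_cost(height, width, c_in, c_out, kernel=3):
    """Return (parameters, MACs) for one kernel x kernel convolution.

    Assumes stride 1 and 'same' padding, so the output keeps the
    input's spatial size. Hypothetical helper, not from the paper.
    """
    # One weight per (kernel position, input channel, output channel), plus biases.
    params = kernel * kernel * c_in * c_out + c_out
    # One multiply-accumulate per weight for every output pixel.
    macs = height * width * kernel * kernel * c_in * c_out
    return params, macs


# Assumed channel widths: a "wide" layer and one with 4x fewer channels.
base_params, base_macs = conv2d_cost(256, 256, c_in=64, c_out=64)
slim_params, slim_macs = conv2d_cost(256, 256, c_in=16, c_out=16)

param_ratio = base_params / slim_params
mac_ratio = base_macs / slim_macs

print(f"parameter reduction: {param_ratio:.2f}x")
print(f"MAC reduction:       {mac_ratio:.2f}x")
```

The MAC ratio is exactly 16 under these assumptions, while the parameter ratio is slightly lower because the bias terms shrink only linearly with the channel count.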

Paper Structure

This paper contains 35 sections, 8 figures, 14 tables.

Figures (8)

  • Figure 1: IoU on the validation set vs. the number of parameters of the U-Net. Circle size represents the number of channels.
  • Figure 2: Detailed architecture of the U-Net model
  • Figure 3: Qualitative evaluation of our Float32 Keras lightweight U-Net on a 256x256 image of the validation set
  • Figure 4: GPU workflow from Keras/TensorFlow training to Nvidia Jetson AGX Xavier inference using TensorFlow
  • Figure 5: GPU workflow from Keras/TensorFlow training to Nvidia Jetson AGX Xavier inference with cuDNN
  • ...and 3 more figures