VTR: An Optimized Vision Transformer for SAR ATR Acceleration on FPGA

Sachini Wickramasinghe, Dhruv Parikh, Bingyi Zhang, Rajgopal Kannan, Viktor Prasanna, Carl Busart

TL;DR

This work addresses SAR ATR under limited training data by introducing VTR, a lightweight Vision Transformer enhanced with Shifted Patch Tokenization and Locality Self-Attention. VTR can be trained directly on small SAR datasets without pretraining and is paired with a novel FPGA accelerator to enable real-time inference. Across MSTAR, SynthWakeSAR, and GBSAR, VTR achieves competitive or superior accuracy with substantially fewer parameters, while the FPGA core delivers major latency reductions and throughput improvements over CPU/GPU baselines. The combination enables deployable SAR ATR on resource-constrained platforms with real-time performance guarantees.

Abstract

Synthetic Aperture Radar (SAR) Automatic Target Recognition (ATR) is a key technique used in military applications such as remote-sensing image recognition. Vision Transformers (ViTs) are the current state of the art in various computer vision applications, outperforming their CNN counterparts. However, using ViTs for SAR ATR is challenging for two reasons: (1) because of their low locality, standard ViTs require extensive training data to generalize well, whereas the standard SAR datasets contain only a limited number of labeled training samples, which reduces the learning capability of ViTs; (2) ViTs have a high parameter count and are computation intensive, which makes them difficult to deploy on resource-constrained SAR platforms. In this work, we develop a lightweight ViT model that can be trained directly on small datasets without any pre-training by utilizing the Shifted Patch Tokenization (SPT) and Locality Self-Attention (LSA) modules. We train this model directly on SAR datasets with limited training samples to evaluate its effectiveness for SAR ATR. We evaluate the proposed model, which we call VTR (ViT for SAR ATR), on three widely used SAR datasets: MSTAR, SynthWakeSAR, and GBSAR. Further, we propose a novel FPGA accelerator for VTR to enable deployment in real-time SAR ATR applications.
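
To make the two modules named in the abstract more concrete, the following is a minimal NumPy sketch of how SPT and LSA are commonly described in the small-dataset ViT literature. It is not taken from the VTR implementation. The function names, the single-channel input, the zero padding, the half-patch shift, and the single-head attention are all assumptions made for illustration, and the layer normalization and linear projection that normally follow SPT are omitted.

```python
import numpy as np


def shifted_patch_tokens(image, patch):
    """Illustrative SPT: stack an image with four diagonally shifted copies,
    then cut the stack into non-overlapping patches.

    image: (H, W) array, with H and W divisible by `patch` (assumption).
    Returns an array of shape (num_patches, 5 * patch * patch).
    """
    h, w = image.shape
    s = patch // 2  # shift by half a patch (assumed shift size)
    padded = np.pad(image, s)  # zero padding on every side
    views = [image]
    # Cropping the padded image at the four corners yields four diagonal shifts.
    for dy, dx in [(0, 0), (0, 2 * s), (2 * s, 0), (2 * s, 2 * s)]:
        views.append(padded[dy:dy + h, dx:dx + w])
    stacked = np.stack(views, axis=-1)  # (H, W, 5)
    ph, pw = h // patch, w // patch
    x = stacked.reshape(ph, patch, pw, patch, 5)
    # Group the two patch-grid axes first, then flatten each patch.
    return x.transpose(0, 2, 1, 3, 4).reshape(ph * pw, patch * patch * 5)


def locality_self_attention(q, k, v, temperature):
    """Illustrative single-head LSA: temperature-scaled attention in which
    each token is masked from attending to itself.

    q, k, v: (num_tokens, head_dim) arrays, num_tokens >= 2.
    temperature: positive scalar; during training it would be learnable.
    """
    scores = (q @ k.T) / temperature
    np.fill_diagonal(scores, -np.inf)  # diagonal masking
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights
```

For an 8x8 input with a patch size of 4, `shifted_patch_tokens` returns 4 tokens of length 80. The attention weights returned by `locality_self_attention` have a zero diagonal, and each row sums to 1.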

Paper Structure (30 sections, 7 equations, 6 figures, 6 tables)

Figures (6)

  • Figure 1: Overview
  • Figure 2: Model Architecture
  • Figure 3: Data Layout
  • Figure 4: Block diagram of the proposed HPPU
  • Figure 5: Element-wise Compute Unit (ECU) Operation
  • ...and 1 more figures