GlimmerNet: A Lightweight Grouped Dilated Depthwise Convolutions for UAV-Based Emergency Monitoring

Đorđe Nedeljković

TL;DR

GlimmerNet tackles the need for accurate yet ultra-lightweight on-board UAV perception for emergencies. It introduces Grouped Dilated Depthwise Convolutions (GDBlocks) to capture multi-scale context within a single, parameter-efficient pass, plus a lightweight Aggregator to fuse cross-group features. The approach achieves state-of-the-art weighted F1 on AIDERv2 with roughly 31k parameters and 29% fewer FLOPs than a recent baseline, demonstrating a favorable accuracy–efficiency balance for real-time UAV inference. Beyond UAV applications, a scaled-up variant shows generalization capability on TinyImageNet, indicating transferability of the grouped-dilated design to broader vision tasks. The work outlines practical directions for hardware-aware deployment and future multi-sensor fusion extensions.

Abstract

Convolutional Neural Networks (CNNs) have proven highly effective for edge and mobile vision tasks due to their computational efficiency. While many recent works seek to enhance CNNs with global contextual understanding via self-attention-based Vision Transformers, these approaches often introduce significant computational overhead. In this work, we demonstrate that it is possible to retain strong global perception without relying on computationally expensive components. We present GlimmerNet, an ultra-lightweight convolutional network built on the principle of separating receptive field diversity from feature recombination. GlimmerNet introduces Grouped Dilated Depthwise Convolutions (GDBlocks), which partition channels into groups with distinct dilation rates, enabling multi-scale feature extraction at no additional parameter cost. To fuse these features efficiently, we design a novel Aggregator module that recombines cross-group representations using grouped pointwise convolution, significantly lowering parameter overhead. With just 31K parameters and 29% fewer FLOPs than the most recent baseline, GlimmerNet achieves a new state-of-the-art weighted F1-score of 0.966 on the UAV-focused AIDERv2 dataset. These results establish a new accuracy-efficiency trade-off frontier for real-time emergency monitoring on resource-constrained UAV platforms. Our implementation is publicly available at https://github.com/djordjened92/gdd-cnn.
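
The two parameter claims in the abstract can be checked with simple arithmetic. The sketch below is not taken from the paper's code: the channel count, kernel size, dilation rates, and number of pointwise groups are hypothetical values chosen for illustration, and the helper names are ours. It shows that splitting a depthwise convolution into groups with different dilation rates leaves the weight count unchanged while widening the spatial extent each group covers, and that a grouped pointwise convolution needs fewer weights than a dense one.

```python
def depthwise_params(channels: int, k: int) -> int:
    """Weights in a depthwise k x k convolution (one filter per channel, no bias)."""
    return channels * k * k


def effective_kernel(k: int, d: int) -> int:
    """Spatial extent covered along one axis by a k-tap kernel with dilation d."""
    return k + (k - 1) * (d - 1)


def pointwise_params(c_in: int, c_out: int, groups: int = 1) -> int:
    """Weights in a 1 x 1 convolution split into independent groups (no bias)."""
    assert c_in % groups == 0 and c_out % groups == 0
    return (c_in // groups) * (c_out // groups) * groups


# Hypothetical sizes, for illustration only (not the paper's configuration).
channels, k = 64, 3
dilations = [1, 2, 3, 4]
per_group = channels // len(dilations)

# Dilation changes where the taps land, not how many weights there are.
grouped_total = sum(depthwise_params(per_group, k) for _ in dilations)
plain_total = depthwise_params(channels, k)
extents = [effective_kernel(k, d) for d in dilations]

dense_pw = pointwise_params(channels, channels)
grouped_pw = pointwise_params(channels, channels, groups=16)

print(grouped_total, plain_total)  # 576 576
print(extents)                     # [3, 5, 7, 9]
print(dense_pw, grouped_pw)        # 4096 256
```

With these assumed sizes, the grouped pointwise layer uses one sixteenth of the weights of the dense one, which is the kind of saving the Aggregator relies on.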

Paper Structure

This paper contains 26 sections, 3 equations, 5 figures, 9 tables.

Figures (5)

  • Figure 1: GlimmerNet: the initial Stem block reduces spatial dimensions, four Stage blocks extract representations efficiently, and the Refiner and Head align features and perform classification.
  • Figure 2: GroupedDilatedDWConv - The grouped dilated depth-wise convolution: $k$ - kernel size, $g$ - group size, $d$ - dilation level. Each group has $c$ filters. The input tensor has shape $(h, w, m \cdot c)$, where $m$ is the number of groups.
  • Figure 3: FeatureMapsRecomb - Takes the feature map at the same index inside each of the $m$ input groups and forms a new group from them. The output therefore has $c$ groups with $m$ feature maps in each.
  • Figure 4: Class activation maps of the last Stage's Aggregator for image samples of the Fire class. Blue marks the lowest values and red the highest.
  • Figure 5: Feature maps per group of the last GroupedDilatedDWConv block in Stage 3, averaged along channels. The respective inputs for the Earthquake, Fire, Flood and Normal classes are shown on the left.
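
The regrouping described in the Figure 3 caption can be illustrated on channel labels alone. The following is a minimal sketch under our own assumptions, not the paper's implementation: the function name `recombine` and the label format are hypothetical, and plain Python lists stand in for feature maps. Given $m$ groups of $c$ maps stored contiguously, it builds $c$ output groups, where output group $j$ collects map $j$ from every input group.

```python
def recombine(channels, m):
    """Regroup a flat list of m * c feature maps (m groups of c) into c groups of m."""
    assert len(channels) % m == 0
    c = len(channels) // m
    # Split the flat channel list into the m input groups.
    groups = [channels[i * c:(i + 1) * c] for i in range(m)]
    # Output group j takes the j-th map of each input group, in group order.
    return [[groups[i][j] for i in range(m)] for j in range(c)]


# Three input groups (m = 3) with two maps each (c = 2), labelled by origin.
maps = [f"g{i}_f{j}" for i in range(3) for j in range(2)]
out = recombine(maps, m=3)
print(out)
# [['g0_f0', 'g1_f0', 'g2_f0'], ['g0_f1', 'g1_f1', 'g2_f1']]
```

On a real tensor the same reordering is a reshape of the channel axis to $(m, c)$ followed by a transpose to $(c, m)$, so it adds no parameters.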