Scale-Invariant Object Detection by Adaptive Convolution with Unified Global-Local Context

Amrita Singh, Snehasis Mukherjee

TL;DR

This paper addresses the challenge of detecting small objects across scales by introducing SAC-Net, a Switchable Atrous Convolutional Network built on EfficientDet. It combines two layer types, depthwise switchable atrous convolution (DSAC) and depthwise atrous with pointwise switchable convolution (DAPSC), with global context blocks to preserve dense features while expanding receptive fields. Ablation studies on COCO show that global context combined with DSAC and DAPSC yields measurable gains over state-of-the-art methods. The proposed framework offers a scalable, efficient path to improved multi-scale object detection and can be extended to video analysis or integrated with other detectors such as YOLO.

Abstract

Dense features are important for detecting minute objects in images. Despite the remarkable efficacy of CNN models in multi-scale object detection, they often fail to detect smaller objects because dense features are lost during pooling. Atrous convolution addresses this issue by applying sparse kernels. However, sparse kernels can reduce the multi-scale detection efficacy of the CNN model. In this paper, we propose an object detection model using a Switchable (adaptive) Atrous Convolutional Network (SAC-Net) based on the EfficientDet model. A fixed atrous rate in the convolutional layers limits the performance of CNN models. To overcome this limitation, we introduce a switchable mechanism that dynamically adjusts the atrous rate during the forward pass. The proposed SAC-Net combines the benefits of both low-level and high-level features to improve performance on multi-scale object detection tasks without losing the dense features. Further, we apply a depth-wise switchable atrous rate to the proposed network to improve the scale-invariant features. Finally, we apply global context to the proposed model. Our extensive experiments on benchmark datasets demonstrate that the proposed SAC-Net outperforms the state-of-the-art models by a significant margin in terms of accuracy.
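
The switchable mechanism described in the abstract can be illustrated with a minimal single-channel NumPy sketch. It is not the authors' implementation: the helper names (`atrous_conv2d`, `switch_map`, `switchable_atrous_conv`), the sigmoid-of-local-mean switch, and the choice to blend a rate-1 branch with a rate-`rate` branch using shared weights are all assumptions made for illustration. The exact formulation is not given in this section.

```python
import numpy as np


def atrous_conv2d(x, kernel, rate):
    """'Same'-padded 2D atrous (dilated) convolution of a single channel.

    Kernel taps are spaced `rate` pixels apart, so a 3x3 kernel spans a
    (2*rate + 1) x (2*rate + 1) window without adding any weights.
    """
    k = kernel.shape[0]
    pad = rate * (k - 1) // 2
    xp = np.pad(x, pad)  # zero padding keeps the output the same size as x
    h, w = x.shape
    out = np.zeros((h, w), dtype=float)
    for i in range(k):
        for j in range(k):
            # Shifted view of the padded input for kernel tap (i, j).
            out += kernel[i, j] * xp[i * rate : i * rate + h, j * rate : j * rate + w]
    return out


def switch_map(x, weight=1.0, bias=0.0):
    """Hypothetical per-pixel switch in (0, 1): sigmoid of a 3x3 local mean."""
    local_mean = atrous_conv2d(x, np.full((3, 3), 1.0 / 9.0), rate=1)
    return 1.0 / (1.0 + np.exp(-(weight * local_mean + bias)))


def switchable_atrous_conv(x, kernel, rate):
    """Blend a dense (rate 1) branch and a sparse (rate `rate`) branch per pixel."""
    s = switch_map(x)
    dense = atrous_conv2d(x, kernel, 1)
    sparse = atrous_conv2d(x, kernel, rate)
    return s * dense + (1.0 - s) * sparse


if __name__ == "__main__":
    image = np.arange(49, dtype=float).reshape(7, 7)
    edge_kernel = np.array([[0.0, -1.0, 0.0], [-1.0, 4.0, -1.0], [0.0, -1.0, 0.0]])
    print(switchable_atrous_conv(image, edge_kernel, rate=2).shape)
```

Because the switch is computed from the input, the effective atrous rate varies from pixel to pixel during the forward pass rather than being fixed per layer. In a depthwise variant, the same blending would be applied independently to each channel.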

Paper Structure (20 sections, 7 equations, 4 figures, 2 tables)


Figures (4)

  • Figure 1: An input image convolved with convolution filters of different atrous rates. R is the atrous rate.
  • Figure 2: Depthwise switchable atrous convolution (DSAC) layer with different atrous rates. We convert each $3\times3$ convolutional layer in the baseline EfficientNet to DSAC, which gradually alternates the atrous rates used for convolutional computation. Two global context modules add image-level information to the features.
  • Figure 3: Depthwise atrous with pointwise switchable convolution (DAPSC) layer. We convert each $3\times3$ convolutional layer in the baseline EfficientNet to DAC, which gradually alternates the atrous rates used for convolutional computation, and add a pointwise switch function (PSC). Two global context modules add image-level information to the features.
  • Figure 4: Global context before and after the depthwise convolution layers
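
As a small worked example of the atrous rate R in Figure 1: for a kernel of size k, a rate R places R - 1 gaps between adjacent taps, so the kernel spans k + (k - 1)(R - 1) pixels per side while keeping the same number of weights. The function name below is illustrative and does not come from the paper.

```python
def effective_kernel_size(kernel_size, rate):
    """Side length in pixels spanned by a kernel with the given atrous rate."""
    return kernel_size + (kernel_size - 1) * (rate - 1)


if __name__ == "__main__":
    for rate in (1, 2, 3):
        size = effective_kernel_size(3, rate)
        # A 3x3 kernel always has 9 weights; only its span changes with the rate.
        print(f"R={rate}: 3x3 kernel spans {size}x{size} pixels")
```

With a $3\times3$ kernel this gives spans of 3, 5, and 7 pixels for R = 1, 2, and 3, which is how atrous convolution widens the receptive field without pooling.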