Patch-based Selection and Refinement for Early Object Detection

Tianyi Zhang; Kishore Kasichainula; Yaoxin Zhuo; Baoxin Li; Jae-Sun Seo; Yu Cao

TL;DR

This work proposes a set of algorithms that divide an image into patches, select the patches that contain objects at various scales, refine the details of small objects, and detect them as early as possible to improve detection accuracy.

Abstract

Early object detection (OD) is a crucial task for the safety of many dynamic systems. Current OD algorithms have limited success with small objects at long distances. To improve the accuracy and efficiency of this task, we propose a novel set of algorithms that divide the image into patches, select patches with objects at various scales, refine the details of small objects, and detect them as early as possible. Our approach is built upon a transformer-based network and integrates a diffusion model to improve detection accuracy. As demonstrated on BDD100K, our algorithms raise the mAP for small objects from 1.03 to 8.93 and reduce the data volume in computation by more than 77%. The source code is available at https://github.com/destiny301/dpr.
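The divide, select, and refine steps described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names (select_patches, refine_positive, kept_fraction) are hypothetical, the selector here is a simple box-overlap rule standing in for the learned transformer-based Patch-Selector, and the refine callback stands in for the diffusion model and is assumed to return a patch of the same size. The image height and width are assumed to be multiples of the patch size.

```python
from typing import Callable, List, Tuple

# Hypothetical types for illustration only.
Box = Tuple[int, int, int, int]   # (x_min, y_min, x_max, y_max), max is exclusive
Image = List[List[float]]         # single-channel image as nested lists


def select_patches(height: int, width: int, patch: int,
                   boxes: List[Box]) -> List[List[bool]]:
    """Mark a patch positive if it overlaps any object box.

    Assumption: this rule replaces the learned Patch-Selector.
    """
    rows, cols = height // patch, width // patch
    mask = [[False] * cols for _ in range(rows)]
    for x0, y0, x1, y1 in boxes:
        first_row, last_row = max(0, y0 // patch), min(rows, (y1 - 1) // patch + 1)
        first_col, last_col = max(0, x0 // patch), min(cols, (x1 - 1) // patch + 1)
        for r in range(first_row, last_row):
            for c in range(first_col, last_col):
                mask[r][c] = True
    return mask


def refine_positive(image: Image, mask: List[List[bool]], patch: int,
                    refine: Callable[[Image], Image]) -> Image:
    """Apply `refine` to positive patches only; negative patches are copied."""
    out = [row[:] for row in image]
    for r, mask_row in enumerate(mask):
        for c, positive in enumerate(mask_row):
            if not positive:
                continue  # negative patches are skipped, which saves computation
            y0, x0 = r * patch, c * patch
            block = [image[y][x0:x0 + patch] for y in range(y0, y0 + patch)]
            for dy, new_row in enumerate(refine(block)):
                out[y0 + dy][x0:x0 + patch] = new_row
    return out


def kept_fraction(mask: List[List[bool]]) -> float:
    """Fraction of patches that are positive, i.e. still processed."""
    total = sum(len(row) for row in mask)
    return sum(sum(row) for row in mask) / total


# Toy example: an 8x8 image, 2x2 patches, one small object box.
mask = select_patches(8, 8, 2, [(1, 1, 3, 3)])
image = [[0.0] * 8 for _ in range(8)]
refined = refine_positive(image, mask, 2,
                          lambda b: [[v + 1.0 for v in row] for row in b])
print(kept_fraction(mask))
```

In this toy case the box touches 4 of the 16 patches, so only a quarter of the image is passed to the refinement step. The reduction reported in the abstract depends on the learned selector and the dataset, not on this overlap rule.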

Paper Structure (11 sections, 10 equations, 4 figures, 1 table, 2 algorithms)

Introduction
Related work
Diffusion Models for Image SR
Object Detection (OD)
Methodology
Patch-Selector
Patch-Refiner
Patch-Organizer
Experiments
Dataset and Training Details
CDM for Patch Refinement

Figures (4)

Figure 1: (Left) Objects occupy only a small proportion of the entire image in this example from the BDD100K dataset. (Right) As the number of object pixels decreases, OD performance drops rapidly.
Figure 2: Overall architecture of DPR (Dichotomized Patch Refinement). Before image reconstruction, all patches of the original image are divided into two groups according to whether each patch contains objects. CDM then processes only the positive patches, which reduces computation and improves performance on the subsequent OD task, since negative patches do not contribute to OD. Two major components are trained: the Patch-Selector module with learnable parameters $\theta$, and CDM with parameters $\phi$.
Figure 3: The design of the Patch-Selector module. (a) A hierarchical encoder embeds input images into features at three different scales. Patches within these features are then classified and aggregated to form the final output. (b) Each Transformer Layer (TL) includes a feature merging block and multiple window-based self-attention blocks.