Multi-Object Tracking in the Dark

Xinzhe Wang; Kang Ma; Qiankun Liu; Yunhao Zou; Ying Fu

TL;DR

This work tackles multi-object tracking in dark scenes by introducing the LMOT dataset, built with a dual-camera RAW system that yields aligned low-light and well-lit video pairs along with high-quality MOT annotations. It then presents LTrack, a tracking method that avoids heavy low-light enhancement and instead learns invariant, noise-robust features through adaptive low-pass downsampling (ALD) and degradation suppression learning (DSL). Key contributions include the LMOT dataset design and statistics, the ALD and DSL components, and experiments showing that LTrack outperforms the compared methods on LMOT and remains competitive in real night scenes. The approach is relevant to night-time autonomous driving and surveillance, where it provides a robust MOT pipeline that leverages RAW data and targeted feature regularization.

Abstract

Low-light scenes are prevalent in real-world applications (e.g., autonomous driving and surveillance at night). Recently, multi-object tracking in various practical use cases has received much attention, but multi-object tracking in dark scenes is rarely considered. In this paper, we focus on multi-object tracking in dark scenes. To address the lack of datasets, we first build a Low-light Multi-Object Tracking (LMOT) dataset. LMOT provides well-aligned low-light video pairs captured by our dual-camera system, and high-quality multi-object tracking annotations for all videos. Then, we propose a low-light multi-object tracking method, termed LTrack. We introduce the adaptive low-pass downsample module to enhance the low-frequency image components that lie outside the sensor noise. The degradation suppression learning strategy enables the model to learn invariant information under noise disturbance and image quality degradation. These components improve the robustness of multi-object tracking in dark scenes. We conduct a comprehensive analysis of our LMOT dataset and the proposed LTrack. Experimental results demonstrate the superiority of the proposed method and its competitiveness in real night low-light scenes. Dataset and Code: https://github.com/ying-fu/LMOT
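To make the idea of low-pass downsampling concrete, the following is a minimal sketch in NumPy. It is not the paper's adaptive module: it assumes a fixed binomial (approximately Gaussian) kernel, whereas the abstract describes an adaptive one, and the function names `binomial_kernel` and `lowpass_downsample` are hypothetical. It only illustrates why blurring before subsampling suppresses high-frequency sensor noise compared with plain strided subsampling.

```python
import numpy as np


def binomial_kernel(size: int = 3) -> np.ndarray:
    """Return a normalized 2-D binomial (approximately Gaussian) low-pass kernel."""
    if size < 1 or size % 2 == 0:
        raise ValueError("size must be a positive odd integer")
    row = np.array([1.0])
    for _ in range(size - 1):
        row = np.convolve(row, [1.0, 1.0])  # builds one row of Pascal's triangle
    kernel = np.outer(row, row)
    return kernel / kernel.sum()


def lowpass_downsample(image: np.ndarray, stride: int = 2, size: int = 3) -> np.ndarray:
    """Blur a 2-D image with a fixed low-pass kernel, then keep every `stride`-th pixel.

    Assumption: the kernel is fixed here; an adaptive variant would predict it
    from the input instead.
    """
    kernel = binomial_kernel(size)
    pad = size // 2
    padded = np.pad(image.astype(np.float64), pad, mode="reflect")
    h, w = image.shape
    blurred = np.zeros((h, w), dtype=np.float64)
    # Accumulate the weighted, shifted copies of the image (a direct 2-D filter).
    for dy in range(size):
        for dx in range(size):
            blurred += kernel[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return blurred[::stride, ::stride]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    noise = rng.normal(size=(64, 64))        # stand-in for zero-mean sensor noise
    naive = noise[::2, ::2]                  # strided subsampling without filtering
    filtered = lowpass_downsample(noise)     # blur first, then subsample
    print("noise variance, naive:   ", naive.var())
    print("noise variance, filtered:", filtered.var())
```

On pure noise, the filtered output has a clearly lower variance than the naively subsampled one, while a constant image passes through unchanged.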

Paper Structure (16 sections, 4 equations, 6 figures, 10 tables)

This paper contains 16 sections, 4 equations, 6 figures, 10 tables.

Introduction
Related Work
Low-light Multi-object Tracking Dataset
Dataset Construction
Dataset Statistic
Low-light Multi-object Tracking
Formulation and Motivation
Adaptive Low-pass Downsampling
Degradation Suppression Learning
Experiments
Experiment Setup
Analysis under low-light conditions
Results on LMOT dataset
Results on Real World
Exploration and Discussion
...and 1 more section

Figures (6)

Figure 1: Our dual-camera system. It consists of two cameras, a beam splitter, and an ND filter. The two cameras are of the same model, and the system is carefully engineered to achieve pixel-by-pixel alignment in the captured video data.
Figure 2: Two example videos from our LMOT dataset. It provides well-aligned low-light video pairs and MOT annotations for all videos. The time interval between adjacent frames is 1 s. The first row is the low-light video, the second row is the scaled low-light video, and the last row is the well-lit video. Our LMOT dataset is collected from city outdoor scenes.
Figure 3: (a) Number of instances per category. LMOT consists of 6 categories; most of the instances are persons and cars. (b) IoU on adjacent frames. Compared to MOT17, KITTI, and DanceTrack, LMOT has a roughly average score, which indicates that objects in LMOT move at a relatively normal speed. (c) Cosine distance of appearance features. The cosine distance is smaller under low-light conditions, indicating that appearance is less distinguishable in the dark.
Figure 4: The overall framework of the proposed low-light multi-object tracking method, termed LTrack. It employs an adaptive low-pass downsample module and a degradation suppression learning strategy, enabling the model to learn invariant features from low-light videos.
Figure 5: Visualization of shallow and deep features for well-lit and low-light images. It can be seen that, under low-light conditions, the shallow feature is full of noise, and the deep feature exhibits lower responses to objects.
...and 1 more figure
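The two statistics named in the Figure 3 caption, IoU on adjacent frames and cosine distance of appearance features, can be sketched as follows. This is an illustrative reading, not the authors' evaluation code: the `(x1, y1, x2, y2)` box format, the per-track averaging, and the helper names `box_iou`, `mean_adjacent_iou`, and `cosine_distance` are all assumptions.

```python
import numpy as np


def box_iou(a, b) -> float:
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def mean_adjacent_iou(track) -> float:
    """Average IoU between one object's boxes in consecutive frames.

    A high value means the object moves little between frames;
    a low value means fast motion relative to its size.
    """
    ious = [box_iou(p, q) for p, q in zip(track[:-1], track[1:])]
    return float(np.mean(ious)) if ious else 0.0


def cosine_distance(u, v) -> float:
    """1 minus the cosine similarity of two (non-zero) appearance feature vectors."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return 1.0 - float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


if __name__ == "__main__":
    # One hypothetical track: the box shifts right by 1 pixel, then stays put.
    track = [(0, 0, 2, 2), (1, 0, 3, 2), (1, 0, 3, 2)]
    print("mean adjacent IoU:", mean_adjacent_iou(track))
    # Two hypothetical appearance embeddings of different objects.
    print("cosine distance:", cosine_distance([1.0, 0.0], [0.0, 1.0]))
```

Under this reading, a smaller cosine distance between the features of different objects means an appearance-based matcher has less margin to tell them apart, which is consistent with the caption's observation for low-light video.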
