Improving Object Detection for Time-Lapse Imagery Using Temporal Features in Wildlife Monitoring

Marcus Jenkins, Kirsty A. Franklin, Malcolm A. C. Nicoll, Nik C. Cole, Kevin Ruhomaun, Vikash Tatayah, Michal Mackiewicz

TL;DR

This work tackles object detection in time-lapse wildlife imagery by injecting temporal context into a single-frame detector. It introduces two temporal feature channels, the Temporal Average Background $T_{A_{12}}$ and the Difference Mask $D_M$, and explores both fixed and input-aware channel weighting to fuse these cues with the RGB input of YOLOv7. A stratified, camera-based sampling strategy supports generalization across unseen scenes and varying object sizes. The approach yields a 24% improvement in mean average precision over a single-frame baseline, and the paper analyses channel weighting and the specific benefit of the difference channel. The method is validated on a large seabird dataset (RI petrel), and the authors give practical guidance on computational cost and data handling, noting its potential applicability to broader wildlife monitoring tasks.

Abstract

Monitoring animal populations is crucial for assessing the health of ecosystems. Traditional methods, which require extensive fieldwork, are increasingly being supplemented by time-lapse camera-trap imagery combined with automatic analysis of the image data. The latter usually involves an object detector that locates relevant targets (commonly animals) in each image, followed by postprocessing to gather activity and population data. In this paper, we show that the performance of an object detector on a single frame of a time-lapse sequence can be improved by including spatio-temporal features from the prior frames. We propose a method that leverages temporal information by integrating two additional spatial feature channels, which capture the stationary and non-stationary elements of the scene and consequently improve scene understanding and reduce the number of stationary false positives. The proposed technique achieves a significant improvement of 24% in mean average precision (mAP@0.05:0.95) over the baseline (temporal feature-free, single frame) object detector on a large dataset of breeding tropical seabirds. We envisage our method will be widely applicable to other wildlife monitoring applications that use time-lapse imaging.
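
To make the idea of the two additional channels concrete, the following is a minimal sketch rather than the authors' implementation. It assumes that the stationary channel is a per-pixel mean over a window of prior frames (12 here, on the assumption that this is what the subscript in $T_{A_{12}}$ denotes), that the non-stationary channel is the absolute difference between the current frame and that mean, and that both are reduced to a single channel by averaging over colour. The function names `temporal_average`, `difference_mask` and `build_input` are hypothetical, and the colour correction and channel weighting described in the paper are omitted.

```python
import numpy as np


def temporal_average(prior_frames):
    """Per-pixel mean of prior frames, shape (N, H, W, 3) -> (H, W, 3).

    Assumption: a plain mean stands in for the temporal average background.
    """
    return np.mean(prior_frames, axis=0)


def difference_mask(frame, background):
    """Absolute difference to the background, averaged over colour -> (H, W).

    Assumption: stationary pixels give values near 0, changed pixels larger ones.
    """
    return np.abs(frame - background).mean(axis=-1)


def build_input(frame, prior_frames):
    """Stack RGB with the two temporal channels -> (H, W, 5)."""
    background = temporal_average(prior_frames)
    background_gray = background.mean(axis=-1)  # single stationary channel
    diff = difference_mask(frame, background)   # single non-stationary channel
    return np.dstack([frame, background_gray, diff])


# Toy example: 12 identical prior frames and a current frame with one bright patch.
prior = np.full((12, 4, 4, 3), 0.2)
current = np.full((4, 4, 3), 0.2)
current[1:3, 1:3] = 0.9  # a new object entering the scene

x = build_input(current, prior)
print(x.shape)  # (4, 4, 5)
```

In this toy example the difference channel is close to zero everywhere except the patch, which is the cue a detector could use to separate a newly arrived object from stationary background that merely resembles one.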

Paper Structure

This paper contains 28 sections, 13 equations, 21 figures, 7 tables.

Figures (21)

  • Figure S1: An example annotated image from the RI petrel dataset.
  • Figure S2: Cont.
  • Figure S3: Comparison of the effect of colour correction on the difference mask, $D_M$. (a) Sample image from camera SWC3. (b) Corresponding $T_{A_{12}RGB}$ (before colour correction). (c) Corresponding $T'_{A_{12}RGB}$ (after colour correction). (d) $D_M$ using uncorrected $T_{A_{12}RGB}$. (e) $D_M$ using colour-corrected $T'_{A_{12}RGB}$.
  • ...and 16 more figures