Distilling Channels for Efficient Deep Tracking

Shiming Ge, Zhao Luo, Chunhui Zhang, Yingying Hua, Dacheng Tao

TL;DR

This paper presents a novel and general framework termed channel distillation to facilitate deep trackers and demonstrates that an integrated formulation can turn feature compression, response map generation, and model update into a unified energy minimization problem to adaptively select informative feature channels that improve the efficacy of tracking moving objects on the fly.

Abstract

Deep trackers have proven successful in visual tracking. Typically, these trackers employ optimally pre-trained deep networks to represent all diverse objects with multi-channel features from some fixed layers. The deep networks employed are usually trained to extract rich knowledge from massive data used in object classification, so they are capable of representing generic objects very well. However, these networks are too complex to represent a specific moving object, leading to poor generalization as well as high computational and memory costs. This paper presents a novel and general framework termed channel distillation to facilitate deep trackers. To validate the effectiveness of channel distillation, we take the discriminative correlation filter (DCF) and ECO as examples. We demonstrate that an integrated formulation can turn feature compression, response map generation, and model update into a unified energy minimization problem to adaptively select informative feature channels that improve the efficacy of tracking moving objects on the fly. Channel distillation can accurately extract good channels, alleviating the influence of noisy channels and generally reducing the number of channels, as well as adaptively generalizing to different channels and networks. The resulting deep tracker is accurate, fast, and has low memory requirements. Extensive experimental evaluations on popular benchmarks clearly demonstrate the effectiveness and generalizability of our framework.
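
To make the idea of selecting channels for a correlation filter concrete, here is a minimal NumPy sketch. It is not the paper's formulation: the abstract does not give the energy function, so the sketch swaps in a standard closed-form multi-channel DCF (in the style of MOSSE/DSST) and a hypothetical heuristic that ranks each channel by its share of the response peak on the training patch. All function names, the `keep` parameter, and the synthetic features are assumptions made for illustration.

```python
import numpy as np

def gaussian_label(h, w, sigma=2.0):
    """Desired response: a Gaussian whose peak sits at index (0, 0)."""
    ys = np.arange(h) - h // 2
    xs = np.arange(w) - w // 2
    g = np.exp(-(ys[:, None] ** 2 + xs[None, :] ** 2) / (2 * sigma ** 2))
    return np.fft.ifftshift(g)

def train_dcf(feats, label, lam=1e-2):
    """Closed-form multi-channel DCF in the Fourier domain.
    feats: (C, H, W). Returns per-channel numerator and shared denominator."""
    F = np.fft.fft2(feats, axes=(-2, -1))
    Y = np.fft.fft2(label)
    num = Y[None] * np.conj(F)
    den = np.sum(np.abs(F) ** 2, axis=0) + lam
    return num, den

def respond(num, den, feats):
    """Response map of the filter on a new (C, H, W) feature patch."""
    Z = np.fft.fft2(feats, axes=(-2, -1))
    return np.fft.ifft2(np.sum(num * Z, axis=0) / den).real

def channel_scores(num, den, feats):
    """ASSUMED heuristic (not from the paper): each channel's contribution
    to the response value at the label peak on the training patch."""
    Z = np.fft.fft2(feats, axes=(-2, -1))
    per_channel = np.fft.ifft2(num * Z / den, axes=(-2, -1)).real
    return per_channel[:, 0, 0]

def distill_channels(feats, label, keep, lam=1e-2):
    """Keep the `keep` highest-scoring channels and retrain on them only."""
    num, den = train_dcf(feats, label, lam)
    scores = channel_scores(num, den, feats)
    kept = np.sort(np.argsort(scores)[-keep:])
    num_k, den_k = train_dcf(feats[kept], label, lam)
    return kept, num_k, den_k

# Synthetic stand-in for deep features: channels 4..7 carry almost no energy.
rng = np.random.default_rng(0)
feats = rng.standard_normal((8, 32, 32))
feats[4:] *= 0.01
label = gaussian_label(32, 32)

kept, num_k, den_k = distill_channels(feats, label, keep=4)

# Simulate target motion by cyclically shifting the patch by (3, 5) pixels.
shifted = np.roll(feats[kept], shift=(3, 5), axis=(1, 2))
resp = respond(num_k, den_k, shifted)
peak = tuple(int(i) for i in np.unravel_index(np.argmax(resp), resp.shape))
print("kept channels:", kept.tolist(), "estimated shift:", peak)
```

The reduced filter stores and correlates only the kept channels, which is where the speed and memory savings described in the abstract would come from. The paper's framework selects channels jointly with response generation and model update, which this two-step sketch does not attempt.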

Paper Structure

This paper contains 19 sections, 12 equations, 10 figures, and 2 tables.

Figures (10)

  • Figure 1: Schematic of our tracking framework with channel distillation. Channel distillation adaptively selects good channels from a deep network pre-trained for object classification to represent diverse tracked objects, then the distilled features are fed to DCF or ECO, forming an integrated deep tracking formulation. The formulation addresses feature compression, response map generation and model update in a unified framework.
  • Figure 2: Examples of "good" channels in two video sequences selected from a convolutional layer. The 1st and 2nd rows visualize the multi-channel features in two consecutive frames. The 3rd row shows the selected good channels in white, which are spatially salient and temporally consistent. This implies that good channels for a specific tracked object exist in a video.
  • Figure 3: An example of visualizing the feature maps when integrating channel distillation into ECO. The pruned feature channels are marked with red rectangles.
  • Figure 4: Baseline comparison on OTB100. The evaluation measure, speed, and average number of feature channels used by each tracker are shown in the legend.
  • Figure 5: Baseline comparison under various circumstances on OTB100.
  • ...and 5 more figures