Shifted Autoencoders for Point Annotation Restoration in Object Counting

Yuda Zou, Xin Xiao, Peilin Zhou, Zhichao Sun, Bo Du, Yongchao Xu

TL;DR

This work proposes Shifted Autoencoders (SAE), which enhances annotation consistency by applying random shifts to the initial point annotations and using a UNet to restore them to their original positions.

Abstract

Object counting typically uses 2D point annotations. The complexity of object shapes and the subjectivity of annotators can make these annotations inconsistent, which may confuse the training of counting models. Several sophisticated noise-resistant counting methods have been proposed to alleviate this issue. In contrast, we aim to refine the initial point annotations directly, before any counting model is trained. To that end, we propose Shifted Autoencoders (SAE), which enhances annotation consistency. Specifically, SAE applies random shifts to the initial point annotations and employs a UNet to restore them to their original positions. Similar to reconstruction in masked autoencoders (MAE), the trained SAE captures general positional knowledge and ignores the offset noise specific to individual manual annotations. This allows the initial point annotations to be restored to more general, and thus more consistent, positions. Extensive experiments show that training advanced object counting models (including noise-resistant ones) on the refined, consistent annotations improves their performance consistently, and in some cases significantly. Remarkably, the proposed SAE helps to set new records on nine datasets. The code and refined point annotations will be made available.
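
The shifting step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `make_shifted_points`, the uniform sampling over a disc, and the fixed `radius` value are assumptions made for illustration only. Each annotated point receives a random shift, and the vector that undoes the shift serves as the restoration target.

```python
import numpy as np

def make_shifted_points(points, radius, rng):
    """Shift each point by a random vector drawn from a disc (assumed scheme).

    points: (N, 2) array of (x, y) annotations.
    radius: scalar or (N,) array giving the maximum shift length per point.
    Returns the shifted points and the vectors that map them back.
    """
    points = np.asarray(points, dtype=float)
    n = len(points)
    radius = np.broadcast_to(np.asarray(radius, dtype=float), (n,))
    angle = rng.uniform(0.0, 2.0 * np.pi, size=n)
    # The square root spreads samples evenly over the disc area.
    length = radius * np.sqrt(rng.uniform(0.0, 1.0, size=n))
    shift = np.stack([length * np.cos(angle), length * np.sin(angle)], axis=1)
    shifted = points + shift
    # Restoration target: the vector from each shifted point back to its origin.
    restoration = points - shifted
    return shifted, restoration

rng = np.random.default_rng(0)
points = np.array([[10.0, 20.0], [35.0, 18.0], [50.0, 42.0]])
shifted, restoration = make_shifted_points(points, radius=4.0, rng=rng)
```

Adding `restoration` to `shifted` recovers `points` exactly, which is the supervision signal a restoration network would be trained to predict from the image and the shifted points.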

Paper Structure (20 sections, 6 equations, 10 figures, 5 tables)

Figures (10)

  • Figure 1: Drawing inspiration from MAE, our SAE captures general positional knowledge by being trained to restore the shifted point annotations to their original positions. In the restoration phase, the trained SAE restores the initial point annotations to more common positions using the learned general positional knowledge.
  • Figure 2: Illustrative example of the relative spatial distribution of point annotations w.r.t. the corresponding heads, which approximates a Gaussian distribution \cite{wan2023modeling_NoiseCC_Tpami}.
  • Figure 3: The pipeline of the proposed Shifted Autoencoders (SAE), consisting of three steps: 1) Shifted point generation, which adds random shift vectors to the initial annotated points; 2) Training the SAE on the generated shift vectors, so that it restores the shifted points to their original positions based on the predicted restoration vector field; 3) Self-restoration, which shifts the originally annotated points by the corresponding restoration vectors in the predicted vector field.
  • Figure 4: Illustration of different strategies for defining the radius of the sampling region for each annotated point in crowd images. See Eq. \ref{eq: radius_with_only_distance} and Eq. \ref{eq: radius_with_distance_and_limitation} and the corresponding text for more details. The hyper-parameter $\alpha$ is set to 0.4. Best viewed zoomed in on the electronic version.
  • Figure 5: Visualization of the restored point annotations produced by the proposed SAE. The bottom row provides a zoomed-in view of the white box in the top row. Green points: initial point annotations; Red points: revised point annotations; Yellow points: annotations whose initial and revised positions coincide.
  • ...and 5 more figures
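
The self-restoration step in Figure 3 (step 3) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the paper's code: the function name `restore_points`, the `(H, W, 2)` field layout with `(dx, dy)` per pixel, and the nearest-pixel lookup are all hypothetical, and a hand-made toy field stands in for the UNet prediction.

```python
import numpy as np

def restore_points(points, vector_field):
    """Move each point by the restoration vector at its nearest pixel.

    points: (N, 2) array of (x, y) annotations.
    vector_field: (H, W, 2) array, vector_field[y, x] = (dx, dy) (assumed layout).
    """
    points = np.asarray(points, dtype=float)
    h, w = vector_field.shape[:2]
    # Round to the nearest pixel and keep indices inside the image.
    cols = np.clip(np.rint(points[:, 0]).astype(int), 0, w - 1)
    rows = np.clip(np.rint(points[:, 1]).astype(int), 0, h - 1)
    return points + vector_field[rows, cols]

# Toy field standing in for a network output: every pixel points
# halfway towards the location (8, 8).
h, w = 16, 16
ys, xs = np.mgrid[0:h, 0:w]
field = 0.5 * np.stack([8 - xs, 8 - ys], axis=-1).astype(float)

initial = np.array([[4.0, 4.0], [8.0, 8.0], [12.0, 6.0]])
restored = restore_points(initial, field)
```

In this toy case the point at (8, 8) receives a zero vector and stays where it is, which mirrors the yellow points in Figure 5 where the initial and revised annotations coincide.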