
Fractal Calibration for long-tailed object detection

Konstantinos Panagiotis Alexandridis, Ismail Elezi, Jiankang Deng, Anh Nguyen, Shan Luo

TL;DR

FRACAL addresses long-tailed object detection by introducing a space-aware logit adjustment that leverages the per-class fractal dimension to model $p_s(y,u)$ and its test-time counterpart. It combines a standard frequency-based calibration with a fractal-informed spatial prior, enabling inference-time reweighting without retraining; it is compatible with both one-stage and two-stage detectors. The method achieves state-of-the-art results on LVIS, boosting rare-class performance by up to 8.6%, and generalizes well to COCO, V3Det, and OpenImages. The work highlights the importance of class-location dependence and spatial information in calibration for robust long-tailed detection.

Abstract

Real-world datasets follow an imbalanced distribution, which poses significant challenges in rare-category object detection. Recent studies tackle this problem by developing re-weighting and re-sampling methods that utilise the class frequencies of the dataset. However, these techniques focus solely on the frequency statistics and ignore the distribution of the classes in image space, missing important information. In contrast, we propose FRActal CALibration (FRACAL), a novel post-calibration method for long-tailed object detection. FRACAL devises a logit adjustment method that utilises the fractal dimension to estimate how uniformly classes are distributed in image space. During inference, it uses the fractal dimension to inversely downweight the probabilities of uniformly spaced class predictions, achieving balance along two axes: between frequent and rare categories, and between uniformly spaced and sparsely spaced classes. FRACAL is a post-processing method that requires no training, and it can be combined with many off-the-shelf models such as one-stage sigmoid detectors and two-stage instance segmentation models. FRACAL boosts rare-class performance by up to 8.6% and surpasses all previous methods on the LVIS dataset, while showing good generalisation to other datasets such as COCO, V3Det and OpenImages. We provide the code at https://github.com/kostas1515/FRACAL.

Paper Structure

This paper contains 14 sections, 11 equations, 5 figures, 8 tables.

Figures (5)

  • Figure 1: Previous works used class information $p_s(y)$ to align the learned source distribution $p_s(y,u|x)$ with the balanced target distribution $p_t(y,u|x)$, without considering the relationship between space $u$ and class $y$, i.e., $p_s(y,u)$. FRACAL captures $p_s(y,u)$ using the fractal dimension and embeds fractal margins during inference, aligning the learned distribution $p_s(y,u|x)$ with the target $p_t(y,u|x)$ better than previous works.
  • Figure 2: In imbalanced object detection, the model makes more detections of frequent classes such as hat and fewer detections of rare classes such as tiara, both of which have a strong upper-location bias. FRACAL utilises the fractal dimension to debias the logits along both the frequency and space axes, making fewer hat detections and more tiara detections, both evenly spread in image space.
  • Figure 3: Different grid sizes affect the object distribution estimation. When the grid is coarse, e.g., $1\times1$ or $2\times2$, there is no or little location information. When it is finer, e.g., $64\times64$, the probability is sparse, giving noisy estimates for the rare classes.
  • Figure 4: a) An example of the box-counting method for the class cow. As the grid size $G$ grows, it iteratively counts the boxes $\nu$ that contain an object center. b-c) The blue points are all $(G, \nu)$ pairs; of these, only the orange points are used to calculate the slope $\Phi$, based on the quadratic rule $G = \lfloor \sqrt{n_y} \rfloor$. d-e) Fractal dimension and class frequency are weakly correlated, showing that $\Phi$ complements the frequency statistic.
  • Figure 5: Detection results on LVIS. FRACAL detects more uniformly along both the frequency and space axes than the baseline.
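
The box-counting procedure described in the Figure 4 caption can be illustrated with a minimal sketch. This is not the authors' implementation (see the linked repository for that); it assumes that object centers are normalised to $[0,1]^2$, that $n_y$ is the number of object centers of class $y$, that the rule $G = \lfloor \sqrt{n_y} \rfloor$ acts as an upper bound on the grid sizes used for the fit, and that $\Phi$ is the least-squares slope of $\log \nu$ against $\log G$. The function names `box_count` and `fractal_dimension` and the fallback for very small classes are hypothetical.

```python
import numpy as np


def box_count(centers: np.ndarray, grid: int) -> int:
    """Count the cells of a grid x grid partition of [0, 1]^2 that
    contain at least one center. `centers` has shape (n, 2)."""
    # Map each center to an integer cell index; clamp 1.0 into the last cell.
    idx = np.minimum((centers * grid).astype(int), grid - 1)
    # Flatten (row, col) to a single id and count distinct occupied cells.
    return int(np.unique(idx[:, 0] * grid + idx[:, 1]).size)


def fractal_dimension(centers: np.ndarray) -> float:
    """Estimate the slope Phi of log(nu) versus log(G) for one class.

    Assumption: only grid sizes G = 1 .. floor(sqrt(n_y)) are used,
    following the rule quoted in the Figure 4 caption.
    """
    n_y = len(centers)
    g_max = int(np.floor(np.sqrt(n_y)))
    if g_max < 2:
        # Hypothetical fallback: too few instances to fit a slope.
        return 0.0
    grids = np.arange(1, g_max + 1)
    counts = np.array([box_count(centers, g) for g in grids])
    # Degree-1 least-squares fit; the first coefficient is the slope.
    slope, _intercept = np.polyfit(np.log(grids), np.log(counts), 1)
    return float(slope)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Centers spread over the whole image versus centers in a narrow band.
    spread = rng.random((2000, 2))
    band = np.column_stack([rng.random(2000), np.full(2000, 0.1)])
    print("spread:", fractal_dimension(spread))
    print("band:  ", fractal_dimension(band))
```

Under these assumptions, a class whose centers cover the image evenly gives a slope near 2, centers lying along a line give a slope near 1, and centers collapsed onto a single location give a slope near 0, which is the sense in which $\Phi$ measures how uniformly a class is distributed in image space.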