Hierarchical Visual Categories Modeling: A Joint Representation Learning and Density Estimation Framework for Out-of-Distribution Detection

Jinglun Li, Xinyu Zhou, Pinxue Guo, Yixuan Sun, Yiwen Huang, Weifeng Ge, Wenqiang Zhang

TL;DR

Results demonstrate that the proposed method does not weaken the discriminative ability of visual recognition models, remains highly efficient at detecting out-of-distribution samples, and learns a visual representation that performs competitively with features learned by classical methods.

Abstract

Detecting out-of-distribution inputs for visual recognition models has become critical in safe deep learning. This paper proposes a novel hierarchical visual category modeling scheme that separates out-of-distribution data from in-distribution data through joint representation learning and statistical modeling. We learn a Gaussian mixture model for each in-distribution category, so that different visual categories are modeled by different mixtures. With these Gaussian models, we design an in-distribution score function by aggregating multiple Mahalanobis-based metrics. We do not use any auxiliary outlier data as training samples, since doing so may hurt the generalization ability of out-of-distribution detection algorithms. We randomly split the ImageNet-1k dataset into ten folds, using one fold as the in-distribution dataset and the others as out-of-distribution datasets to evaluate the proposed method. We also conduct experiments on seven popular benchmarks: CIFAR, iNaturalist, SUN, Places, Textures, ImageNet-O, and OpenImage-O. Extensive experiments indicate that the proposed method clearly outperforms state-of-the-art algorithms. Meanwhile, we find that our visual representation performs competitively with features learned by classical methods. These results demonstrate that the proposed method does not weaken the discriminative ability of visual recognition models and remains highly efficient at detecting out-of-distribution samples.
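The per-category Gaussian mixtures and the Mahalanobis-based score described in the abstract can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation. The abstract does not say how the Mahalanobis terms are aggregated, so the sketch assumes each class's mixture is scored by an unnormalized mixture log-likelihood, log Σ_k w_k exp(−½ d_k²), and that the overall in-distribution score is the maximum over classes. The helper names (`mahalanobis_sq`, `class_score`, `in_distribution_score`) and the toy parameters are hypothetical. In HVCM the means and covariances would come from the jointly learned model.

```python
import numpy as np


def mahalanobis_sq(x, mean, precision):
    """Squared Mahalanobis distance (x - mu)^T Sigma^{-1} (x - mu)."""
    diff = x - mean
    return float(diff @ precision @ diff)


def class_score(x, means, precisions, weights):
    """Aggregate the K Mahalanobis terms of one class's Gaussian mixture.

    Assumption (not from the source): log sum_k w_k * exp(-0.5 * d_k^2),
    an unnormalised mixture log-likelihood, computed with log-sum-exp.
    means: (K, d), precisions: (K, d, d) inverse covariances, weights: (K,).
    """
    d2 = np.array([mahalanobis_sq(x, m, p) for m, p in zip(means, precisions)])
    logits = np.log(weights) - 0.5 * d2
    peak = logits.max()  # subtract the max before exponentiating for stability
    return float(peak + np.log(np.exp(logits - peak).sum()))


def in_distribution_score(x, class_models):
    """Higher means more in-distribution: the best score over all classes."""
    return max(class_score(x, *model) for model in class_models.values())


# Toy usage with hypothetical parameters: 2 classes, 3 components each.
rng = np.random.default_rng(0)
d, K = 4, 3
class_models = {
    c: (rng.normal(loc=3.0 * c, size=(K, d)),  # component means
        np.stack([np.eye(d)] * K),             # identity precision matrices
        np.full(K, 1.0 / K))                   # uniform mixture weights
    for c in range(2)
}
near = class_models[0][0][0]   # a point sitting on a component mean
far = np.full(d, 50.0)         # a point far from every component
print(in_distribution_score(near, class_models) > in_distribution_score(far, class_models))  # True
```

Taking the maximum over classes means a sample only has to fit one in-distribution category well to score high. A sample far from every component receives a low score and can be flagged as out-of-distribution by thresholding.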

Paper Structure (12 sections, 8 equations, 6 figures, 5 tables)

Figures (6)

  • Figure 1: Illustration of the training pipeline of hierarchical visual category modeling (HVCM). In HVCM, we jointly learn the visual representation and the parameters of probabilistic models. We generate two different views of an input image and feed them into a knowledge distillation framework (DINO caron2021emerging). Image representations are projected into a high-dimensional attribute space. These attributes are then divided into groups, and each group is passed through a softmax function to obtain an attribute distribution. We match the attributes in each group with the stored attribute centers of the target visual category. The whole model is trained end to end.
  • Figure 2: Illustration of attribute group visualization and Mahalanobis distance distribution. Both attribute groups are visualized with t-SNE van2008visualizing. Colors encode different in-distribution (ImageNet) classes, and out-of-distribution (SUN) features are marked as gray points. ResNet-50 models he2016deep are trained with DINO (a) and HVCM (b).
  • Figure 3: Two categories of images are displayed for each dataset. The leftmost samples belong to the InD dataset, while those on the right belong to the nine OOD datasets, arranged in ascending order of distance. The gap between OOD and InD samples gradually widens as the distance increases.
  • Figure 4: HVCM performance as the distance between InD and OOD data increases.
  • Figure 5: HVCM performance on four OOD datasets as the number of InD classes increases.
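The attribute branch described in the Figure 1 caption (project, split into groups, apply a softmax per group, match against stored class centers) can be sketched as a forward pass. This is a minimal NumPy sketch under assumptions the source does not state: the projection is a single linear map, the attribute vector is split into equal-sized groups, and the matching objective is a cross-entropy between each group's softmax distribution and the stored center distribution of the sample's target class. The function names, dimensions, and loss are hypothetical, and the DINO teacher-student branch and the training loop are omitted.

```python
import numpy as np


def softmax(z, axis=-1):
    """Numerically stable softmax along one axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def attribute_distributions(features, projection, num_groups):
    """Project (N, d) features to attributes, split into groups, softmax each.

    Assumption: a single linear projection and equal-sized groups.
    Returns an (N, G, M) array with M = attribute_dim // num_groups.
    """
    attrs = features @ projection                      # (N, G*M) attribute vector
    n, total = attrs.shape
    if total % num_groups:
        raise ValueError("attribute dimension must be divisible by num_groups")
    grouped = attrs.reshape(n, num_groups, total // num_groups)
    return softmax(grouped, axis=-1)                   # one distribution per group


def center_matching_loss(dists, labels, centers, eps=1e-12):
    """Hypothetical matching loss: cross-entropy between each sample's group
    distributions and the stored attribute centers of its target class.

    centers: (C, G, M), each (class, group) row a distribution over M attributes.
    """
    target = centers[labels]                           # (N, G, M)
    ce = -(target * np.log(dists + eps)).sum(axis=-1)  # (N, G)
    return float(ce.mean())                            # average over samples and groups


# Toy usage with hypothetical sizes.
rng = np.random.default_rng(0)
N, d, G, M, C = 8, 16, 4, 32, 3
features = rng.normal(size=(N, d))
projection = rng.normal(size=(d, G * M))
dists = attribute_distributions(features, projection, G)
print(dists.shape)  # (8, 4, 32)
centers = softmax(rng.normal(size=(C, G, M)))
labels = rng.integers(0, C, size=N)
loss = center_matching_loss(dists, labels, centers)
print(loss)
```

Splitting the attribute vector into groups with separate softmaxes lets each group form its own distribution, so a sample can be compared with its class center group by group rather than through a single global vector.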