
Redefining Normal: A Novel Object-Level Approach for Multi-Object Novelty Detection

Mohammadreza Salehi, Nikolaos Apostolikas, Efstratios Gavves, Cees G. M. Snoek, Yuki M. Asano

TL;DR

This work addresses semantic novelty detection in multi-object scenes by redefining the training 'normal' as the predominant object. It introduces DeFeND, a dense feature fine-tuning stage that enforces object-level consistency across patches, and a guided masked knowledge distillation framework that trains a student on partial inputs under the guidance of the teacher's attention. Together, these components achieve state-of-the-art performance on multi-object benchmarks such as Pascal VOC and COCO, remain competitive on single-object datasets, and offer substantial gains in inference efficiency. By emphasizing object-centric representations, the approach bridges the gap between prior object-centric methods and real-world, cluttered images, providing a scalable solution for practical novelty detection.
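The attention-guided masking step can be illustrated with a short sketch. It assumes that the teacher's CLS-to-patch attention (already averaged over heads) serves as the informativeness score, that the highest-scoring patches are the ones hidden, and that hidden patches are replaced by a placeholder token. The names `guided_patch_mask` and `apply_mask` are hypothetical and are not taken from the released code.

```python
import numpy as np


def guided_patch_mask(cls_attention: np.ndarray, mask_ratio: float = 0.5) -> np.ndarray:
    """Boolean mask over patches; True marks a patch hidden from the student.

    cls_attention: (B, N) teacher CLS-to-patch attention scores.
    Assumption (illustrative only): the most-attended patches are masked first.
    """
    num_patches = cls_attention.shape[-1]
    num_masked = int(round(mask_ratio * num_patches))
    # Patch indices sorted from most to least attended, per image.
    order = np.argsort(-cls_attention, axis=-1)
    mask = np.zeros_like(cls_attention, dtype=bool)
    np.put_along_axis(mask, order[..., :num_masked], True, axis=-1)
    return mask


def apply_mask(patch_tokens: np.ndarray, mask: np.ndarray, mask_token: np.ndarray) -> np.ndarray:
    """Replace masked patch tokens with a (hypothetical) shared mask token.

    patch_tokens: (B, N, D); mask: (B, N); mask_token: (D,).
    """
    return np.where(mask[..., None], mask_token, patch_tokens)
```

Under this reading, a 50% ratio hides exactly the patches the teacher attends to most, so the student has to reconstruct the teacher's view of the image from the remaining context.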

Abstract

In the realm of novelty detection, accurately identifying outliers in data without specific class information poses a significant challenge. While current methods excel in single-object scenarios, they struggle with multi-object situations because of their focus on individual objects. This paper proposes a new approach: redefining 'normal' at the object level in training datasets. Rather than taking the usual image-level view, we treat the most dominant object in a dataset as the norm, a perspective that better matches real-world scenarios. To adapt to this object-level definition of 'normal', we modify knowledge distillation frameworks, in which a student network learns from a pre-trained teacher network. Our first contribution, DeFeND (Dense Feature Fine-tuning on Normal Data), integrates dense feature fine-tuning into the distillation process, allowing the teacher network to focus on object-level features through a self-supervised loss. The second is masked knowledge distillation, where the student network works with partially hidden inputs, honing its ability to deduce and generalize from incomplete data. This approach not only performs well in single-object novelty detection but also considerably surpasses existing methods in multi-object contexts. The implementation is available at: https://github.com/SMSD75/Redefining_Normal_ACCV24/tree/main
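The distillation-based scoring can be sketched as follows: at test time, an image is scored by how far the student's patch features drift from the teacher's. Using cosine distance averaged over patches is an assumption made here for illustration; `novelty_score` is a hypothetical name, and the paper's exact discrepancy measure may differ.

```python
import numpy as np


def novelty_score(teacher_feats: np.ndarray, student_feats: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Per-image novelty score as mean cosine distance between patch features.

    teacher_feats, student_feats: (B, N, D) patch features for B images.
    Assumption: cosine distance averaged over the N patches (illustrative choice).
    Returns an array of shape (B,); higher means more likely novel.
    """
    # L2-normalize every patch feature (eps guards against division by zero).
    t = teacher_feats / (np.linalg.norm(teacher_feats, axis=-1, keepdims=True) + eps)
    s = student_feats / (np.linalg.norm(student_feats, axis=-1, keepdims=True) + eps)
    cos_sim = np.sum(t * s, axis=-1)        # (B, N) per-patch similarity
    return np.mean(1.0 - cos_sim, axis=-1)  # (B,) averaged distance
```

Identical features score about 0 and exactly opposite features score about 2. Because the student only ever imitates the teacher on normal data, larger scores are expected on images whose predominant object is novel.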

Paper Structure

This paper contains 21 sections, 4 equations, 3 figures, 12 tables.

Figures (3)

  • Figure 1: A new setting and a new method. On the left, the likelihood of object existence is shown for different datasets; unlike MNIST and CIFAR-10, COCO shows less object-centric bias. In the middle, we introduce a multi-object novelty detection setting, where the 'normal' class is defined as the predominant object in the dataset. In contrast to previous object-centric datasets, images can include objects of other categories (e.g., a human or a sheep for the dog class). On the right, we introduce a novel method that not only achieves state-of-the-art results in this setting but also excels in the classic single-object and object-centric settings.
  • Figure 2: The proposed method overview. In the first stage, the last two layers of the pre-trained feature extractor are fine-tuned on the inputs with a dense self-supervised loss, so that the spatial features (all output features except the CLS token) are consistent across different views of an object. To do so, the spatial features are projected by a shared MLP head, and corresponding features are made similar while trivial solutions are avoided. In the second stage, the knowledge distillation framework is applied, except that the student's input is masked under the guidance of the teacher's attention map: 50% of the student's input is masked, concentrated on the most informative regions. Finally, the discrepancy between the student and the teacher is used for novelty detection at test time.
  • Figure 3: Qualitative comparison of the proposed method and KDAD. As shown, the proposed method focuses more on the abnormal object to produce the anomaly score for each image.
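The first-stage objective in Figure 2 (a shared MLP head over spatial features, corresponding patches pulled together, trivial solutions avoided) can be approximated with a dense InfoNCE loss. This is a stand-in chosen for illustration, not necessarily the loss used in the paper: it assumes patch correspondences between the two views are already aligned row by row and treats every non-corresponding patch as a negative. `mlp_head` and `dense_infonce` are hypothetical names.

```python
import numpy as np


def mlp_head(x, w1, b1, w2, b2):
    """Hypothetical two-layer projection head shared by both views (ReLU in between)."""
    h = np.maximum(x @ w1 + b1, 0.0)
    return h @ w2 + b2


def dense_infonce(z_a: np.ndarray, z_b: np.ndarray, temperature: float = 0.1) -> float:
    """Dense InfoNCE (illustrative stand-in, not the paper's confirmed loss).

    z_a, z_b: (N, K) projected patch features of two views; row i of z_a and
    row i of z_b are assumed to cover the same object region. All other rows
    act as negatives, which keeps a constant (collapsed) mapping from being optimal.
    """
    z_a = z_a / np.linalg.norm(z_a, axis=-1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=-1, keepdims=True)
    logits = z_a @ z_b.T / temperature                     # (N, N) similarities
    logits = logits - logits.max(axis=1, keepdims=True)    # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Positive pairs sit on the diagonal; average their negative log-probability.
    return float(-np.mean(np.diag(log_probs)))
```

A collapsed solution that maps every patch to the same vector yields a loss of log N rather than the minimum, while well-separated, correctly aligned patches drive the loss toward zero; this is one common way to make corresponding features similar without falling into the trivial solution.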