BUSSARD: Normalizing Flows for Bijective Universal Scene-Specific Anomalous Relationship Detection

Melissa Schween; Mathis Kruse; Bodo Rosenhahn

Abstract

We propose Bijective Universal Scene-Specific Anomalous Relationship Detection (BUSSARD), a normalizing flow-based model for detecting anomalous relations in scene graphs generated from images. Our work follows a multimodal approach: object and relationship tokens from scene graphs are embedded with a language model to leverage real-world semantic knowledge. A normalizing flow then learns bijective transformations that map object-relation-object triplets to a simple base distribution (typically Gaussian), which enables anomaly detection through likelihood estimation. We evaluate our approach on the SARD dataset, which contains office and dining room scenes. Our method achieves around 10% better AUROC than the current state-of-the-art model while being five times faster. Through ablation studies, we demonstrate superior robustness and universality, particularly regarding the use of synonyms: our model maintains stable performance, whereas the baseline shows a deviation of 17.5%. This work demonstrates the strong potential of learning-based methods for relationship anomaly detection in scene graphs. Our code is available at https://github.com/mschween/BUSSARD .
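To make the likelihood-estimation step concrete, the following is a minimal sketch of how a bijective map and a Gaussian base distribution yield an anomaly score through the change-of-variables formula, log p(x) = log N(f(x); 0, I) + log |det J_f(x)|. It is not the BUSSARD architecture: the single affine coupling layer, the class name AffineCoupling, the dimension of 8, and the random untrained weights are all assumptions chosen for illustration, and the input is random data standing in for reduced triplet embeddings.

```python
import numpy as np


class AffineCoupling:
    """One affine coupling layer (illustrative, untrained).

    The first half of the input passes through unchanged and conditions
    a scale and shift applied to the second half, so the map is bijective.
    """

    def __init__(self, dim, rng):
        assert dim % 2 == 0, "dim must be even to split the input in half"
        self.half = dim // 2
        # Hypothetical conditioner weights; a real model would learn these.
        self.W_s = 0.1 * rng.standard_normal((self.half, self.half))
        self.W_t = 0.1 * rng.standard_normal((self.half, self.half))

    def forward(self, x):
        x1, x2 = x[:, :self.half], x[:, self.half:]
        s = np.tanh(x1 @ self.W_s)  # bounded log-scale
        t = x1 @ self.W_t           # shift
        z = np.concatenate([x1, x2 * np.exp(s) + t], axis=1)
        log_det = s.sum(axis=1)     # log |det Jacobian| of the map
        return z, log_det

    def inverse(self, z):
        z1, z2 = z[:, :self.half], z[:, self.half:]
        s = np.tanh(z1 @ self.W_s)
        t = z1 @ self.W_t
        return np.concatenate([z1, (z2 - t) * np.exp(-s)], axis=1)


def log_likelihood(layer, x):
    """Change of variables with a standard Gaussian base distribution."""
    z, log_det = layer.forward(x)
    d = x.shape[1]
    log_base = -0.5 * (z ** 2).sum(axis=1) - 0.5 * d * np.log(2 * np.pi)
    return log_base + log_det


rng = np.random.default_rng(0)
layer = AffineCoupling(dim=8, rng=rng)

# Stand-in for four reduced triplet embeddings (random, for illustration only).
x = rng.standard_normal((4, 8))

# Negative log-likelihood as anomaly score: higher means less likely.
scores = -log_likelihood(layer, x)

# Bijectivity check: the inverse recovers the input.
x_rec = layer.inverse(layer.forward(x)[0])
```

In a trained flow, triplets that are common in a scene type would receive high likelihood and rare ones low likelihood, so ranking by the score above would surface candidate anomalies. How BUSSARD thresholds or ranks these scores is not specified here.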

Paper Structure (30 sections, 11 equations, 12 figures, 6 tables)

Introduction
Related Work
Scene Graph Generation (SGG)
Anomaly Detection (AD)
Foundations
Baseline Method
Metrics
Method
Scene Graph Generator
Word Embedding
Autoencoder
Normalizing Flow
Experiments
SARD Dataset
Implementation Details
...and 15 more sections

Figures (12)

Figure 1: Example image from the SARD dataset [lai2025scene]. An incomplete scene graph for this image consists of: 'plate-on-chair', 'plate-near-clock', 'cup-on-table', 'chair-near-table'. The anomaly to detect is 'plate-on-chair'.
Figure 2: The components of BUSSARD. The images are parsed using a pretrained scene graph generator. Each triplet is then encoded using a pretrained word embedding model. The token embeddings of each triplet are concatenated, and their dimension is reduced using an autoencoder. Finally, a normalizing flow estimates the likelihood of each triplet, which is used to decide whether the triplet is anomalous.
Figure 3: The 40 most frequent triplets of the dining room scene. The labels belong to the highlighted bars, showing example triplets.
Figure 4: Ablation results with AUROC ($\uparrow$) and AUC-Recall@k ($\uparrow$) of BUSSARD and SARD-c for different synonym rates for the dining room scene. The synonym rate is the probability of substituting a word using synonym mappings. For BUSSARD, the dots represent the average results over ten different seeds, and the shaded area visualizes the corresponding standard deviation. SARD-c was run only once for each rate because its calculation is deterministic.
Figure 5: Ablation results with the AUROC ($\uparrow$) for different latent space dimensions of the autoencoder.
...and 7 more figures
