Predictive Reasoning with Augmented Anomaly Contrastive Learning for Compositional Visual Relations

Chengtai Li; Yuting He; Jianfeng Ren; Ruibin Bai; Yitian Zhao; Heng Yu; Xudong Jiang

TL;DR

A predict-and-verify paradigm is introduced for rule-based reasoning, in which a series of Predictive Anomaly Reasoning Blocks (PARBs) iteratively leverage features from three out of the four images to predict those of the remaining one.

Abstract

While visual reasoning for simple analogies has received significant attention, compositional visual relations (CVR) remain relatively unexplored due to their greater complexity. To solve CVR tasks, i.e., to identify the outlier among four images when the other three follow the same compositional rules, we propose Predictive Reasoning with Augmented Anomaly Contrastive Learning (PR-A$^2$CL). To address the challenge of modelling abundant compositional rules, Augmented Anomaly Contrastive Learning (A$^2$CL) is designed to distil discriminative and generalizable features by maximizing similarity among normal instances while minimizing similarity between normal instances and anomalous outliers. More importantly, a predict-and-verify paradigm is introduced for rule-based reasoning, in which a series of Predictive Anomaly Reasoning Blocks (PARBs) iteratively leverage features from three of the four images to predict those of the remaining one. Throughout the subsequent verification stage, the PARBs progressively pinpoint the specific discrepancies attributable to the underlying rules. Experimental results on the SVRT, CVR and MC$^2$R datasets show that PR-A$^2$CL significantly outperforms state-of-the-art reasoning models.
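
The predict-and-verify idea described in the abstract can be illustrated with a small sketch. This is not the authors' PARB: the learned predictor is replaced here by a plain mean over the three context feature vectors, and the names `mean_predictor` and `predict_and_verify` as well as the toy 2-D features are assumptions made purely for illustration. Each image is held out in turn, predicted from the other three, and the image whose features are predicted worst is taken as the outlier.

```python
import math


def mean_predictor(context):
    """Hypothetical stand-in for a learned predictor: average the context vectors."""
    dim = len(context[0])
    return [sum(vec[k] for vec in context) / len(context) for k in range(dim)]


def predict_and_verify(features, predictor=mean_predictor):
    """Hold out each feature vector, predict it from the rest, and compare.

    Returns the index with the largest prediction error and the list of errors.
    """
    errors = []
    for i, target in enumerate(features):
        context = features[:i] + features[i + 1:]      # the three remaining images
        predicted = predictor(context)                  # predict step
        errors.append(math.dist(predicted, target))     # verify step (Euclidean error)
    outlier = max(range(len(errors)), key=errors.__getitem__)
    return outlier, errors


# Toy features (assumed): three vectors that agree and one that does not.
features = [[1.0, 0.0], [1.1, 0.0], [0.9, 0.1], [-1.0, 2.0]]
outlier, errors = predict_and_verify(features)
print(outlier, [round(e, 3) for e in errors])
```

In this toy case the fourth vector has the largest held-out prediction error, so it is selected. In the paper the prediction is made by stacked, learned PARBs rather than by a fixed average.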

Paper Structure (30 sections, 13 equations, 6 figures, 7 tables, 1 algorithm)

Introduction
Related Work
Abstract Visual Reasoning
Contrastive Learning
Proposed PR-A$^2$CL
Overview of Proposed PR-A$^2$CL
Visual Perception with A$^2$CL
Weak/Strong Data Augmentation
Augmented Anomaly Contrastive Learning
Predictive Anomaly Reasoning Module
Predict-and-Verify Paradigm
Predictive Anomaly Reasoning Block
Hierarchical PARB
Experimental Results
Experimental Settings
...and 15 more sections

Figures (6)

Figure 1: Left: A sample compositional rule in the CVR dataset. Right: Selecting an outlier from four images is converted to four predict-and-verify problems.
Figure 2: Proposed PR-A$^2$CL comprises two modules: 1) A Perception Module with Augmented Anomaly Contrastive Learning (A$^2$CL), generating weak and strong augmented views, encoding them via ResNet-50, and hence improving feature discriminability and generalization. 2) A Predictive Anomaly Reasoning Module (PARM) built with stacked PARBs. Each block employs a predict-and-verify mechanism, which predicts the target from context features and verifies it against the target to infer latent compositional rules.
Figure 3: Comparison with DBCR [li2025dbcr] on SVRT [fleuret2011comparing] using 1k samples per task. PR-A$^2$CL excels across most tasks, whereas DBCR struggles on several.
Figure 4: Comparison of reasoning accuracy on specific compositional rules. PR-A$^2$CL outperforms DBCR [li2025dbcr] for almost all rules.
Figure 5: t-SNE visualization of feature representations across successive PARBs, showing increased cluster compactness with deeper PARBs.
...and 1 more figure
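
The A$^2$CL objective mentioned in the Figure 2 caption (pull normal instances together, push the anomalous outlier away) can be sketched as follows. The paper's actual loss is not given in this excerpt, so the code below assumes a generic InfoNCE-style form with cosine similarity and a temperature; the function names, the temperature value, and the toy embeddings are all hypothetical, and the weak/strong augmented views are omitted for brevity.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def anomaly_contrastive_loss(normals, outlier, temperature=0.1):
    """Assumed InfoNCE-style loss: each pair of normals is a positive pair,
    and the outlier is the single negative for every anchor."""
    losses = []
    for i, anchor in enumerate(normals):
        neg = math.exp(cosine(anchor, outlier) / temperature)
        for j, positive in enumerate(normals):
            if i == j:
                continue
            pos = math.exp(cosine(anchor, positive) / temperature)
            losses.append(-math.log(pos / (pos + neg)))
    return sum(losses) / len(losses)


# Toy embeddings (assumed): the loss is small when the outlier is far from
# the normals and larger when it is nearly indistinguishable from them.
normals = [[1.0, 0.0], [0.9, 0.1], [1.0, 0.1]]
loss_separated = anomaly_contrastive_loss(normals, [-1.0, 0.2])
loss_confused = anomaly_contrastive_loss(normals, [1.0, 0.05])
print(loss_separated < loss_confused)
```

Minimizing a loss of this shape raises similarity among the normal instances relative to their similarity with the outlier, which is the behaviour the caption attributes to A$^2$CL.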
