SAGE: Saliency-Guided Contrastive Embeddings

Colton R. Crum; Christopher Sweet; Adam Czajka

TL;DR

SAGE (Saliency-Guided Contrastive Embeddings) is a training framework that moves human saliency guidance from image space into the model's embedding space. It combines saliency-based augmentations, logit alignment using Jensen-Shannon divergence with temperature scaling, and a contrastive triplet loss on embeddings to steer the network toward salient features and away from non-salient ones. The method reports state-of-the-art open-set performance on Iris PAD (presentation attack detection), chest X-ray anomaly detection, and synthetic-face detection, and it generalizes across backbones including CNNs and vision transformers. Sanity checks with inverted saliency maps indicate that the gains come from genuine saliency guidance rather than mere regularization, and ablations identify the best-performing temperature and augmentation settings. Overall, SAGE offers a scalable, architecture-agnostic way to integrate human perceptual priors into deep learning for high-risk tasks.
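The logit-alignment term can be illustrated with a minimal sketch, assuming it compares two temperature-softened class distributions with the Jensen-Shannon divergence. The function names, the default temperature, and the choice of which two outputs are compared (for example, the original input and an augmented view) are assumptions made for illustration and are not taken from the paper.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; a higher temperature flattens the distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    """KL(p || q) in nats, skipping terms where p is zero."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)

def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetric and bounded above by ln(2)."""
    mix = [0.5 * (pi + qi) for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, mix) + 0.5 * kl_divergence(q, mix)

def logit_alignment(logits_a, logits_b, temperature=2.0):
    """Hypothetical alignment term: JS divergence between softened outputs."""
    return js_divergence(softmax(logits_a, temperature),
                         softmax(logits_b, temperature))
```

Because the Jensen-Shannon divergence is symmetric and bounded, neither of the two outputs is treated as the reference, and the term cannot grow without limit when the two predictions disagree sharply.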

Abstract

Integrating human perceptual priors into the training of neural networks has been shown to improve model generalization, serve as an effective regularizer, and align models with human expertise for applications in high-risk domains. Existing approaches to integrate saliency into model training often rely on internal model mechanisms, which recent research suggests may be unreliable. Our insight is that many challenges associated with saliency-guided training stem from placing the guidance solely within the image space. Instead, we move away from the image space and use the model's latent space embeddings to steer human guidance during training. We propose SAGE (Saliency-Guided Contrastive Embeddings): a loss function that integrates human saliency into network training using contrastive embeddings. We apply saliency-preserving and saliency-degrading signal augmentations to the input and capture the resulting changes in embeddings and model logits. We guide the model towards salient features and away from non-salient features using a contrastive triplet loss. Additionally, we perform a sanity check on the logit distributions to ensure that the model outputs match the saliency-based augmentations. We demonstrate a boost in classification performance across both open- and closed-set scenarios against SOTA saliency-based methods, showing SAGE's effectiveness across various backbones, and include experiments that suggest it generalizes widely across tasks.
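The embedding-space guidance described in the abstract can be sketched as follows, under stated assumptions. The abstract does not specify the augmentations, so simple masking with a fill value stands in for them here; `blend_with_saliency`, `toy_encoder`, the Euclidean distance, and the margin are all hypothetical choices for illustration, not the authors' implementation.

```python
import math

def blend_with_saliency(image, saliency, keep_salient=True, fill=0.0):
    """Keep salient pixels (or the non-salient ones) and replace the rest with `fill`.

    `image` and `saliency` are equally sized 2D lists; saliency values lie in [0, 1].
    Masking is an assumed stand-in for the paper's signal augmentations.
    """
    out = []
    for img_row, sal_row in zip(image, saliency):
        row = []
        for pixel, s in zip(img_row, sal_row):
            weight = s if keep_salient else 1.0 - s
            row.append(weight * pixel + (1.0 - weight) * fill)
        out.append(row)
    return out

def toy_encoder(image):
    """Hypothetical stand-in for a network: flattens the image into a vector."""
    return [p for row in image for p in row]

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet loss: pull the positive closer than the negative by `margin`."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)

# Toy example: the diagonal pixels are marked as salient.
image = [[1.0, 2.0], [3.0, 4.0]]
saliency = [[1.0, 0.0], [0.0, 1.0]]

anchor = toy_encoder(image)
positive = toy_encoder(blend_with_saliency(image, saliency, keep_salient=True))
negative = toy_encoder(blend_with_saliency(image, saliency, keep_salient=False))
loss = triplet_loss(anchor, positive, negative)
```

In this sketch the saliency-preserving view acts as the positive and the saliency-degrading view as the negative, so minimizing the loss pulls the embedding of the original input toward the view that retains the human-annotated regions.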
