Bridging the Vision-Brain Gap with an Uncertainty-Aware Blur Prior

Haitao Wu, Qing Li, Changqing Zhang, Zhen He, Xiaomin Ying

TL;DR

This work tackles the gap between visual stimuli and brain signals by identifying two GAPs: the System GAP (loss of high-frequency details) and the Random GAP (dynamic perceptual and cognitive processes plus acquisition noise). It introduces the Uncertainty-aware Blur Prior (UBP), which applies Gaussian blur with an uncertainty-driven radius to reduce the mismatch and improve cross-modal alignment in a vision-brain contrastive learning setup. Using RSVP-based THINGS-EEG/THINGS-MEG data and CLIP-based visual encoders, UBP achieves state-of-the-art zero-shot brain-to-image retrieval, reporting 50.9% Top-1 and 79.7% Top-5 accuracy, and is robust to subject variability. The approach offers a practical avenue for more reliable brain-computer interfaces and motivates uncertainty-aware priors for broader multimodal learning tasks.
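To make the idea of a blur whose radius depends on uncertainty more concrete, here is a minimal NumPy sketch. It is not the authors' implementation: the function names (`gaussian_kernel_1d`, `gaussian_blur`, `adjust_radius`), the choice `sigma = radius / 2`, and the "grow the radius by one step when a pair is flagged as uncertain" rule are all assumptions made purely for illustration.

```python
import numpy as np


def gaussian_kernel_1d(radius: int) -> np.ndarray:
    """Normalized 1D Gaussian kernel; radius 0 yields the identity kernel."""
    if radius <= 0:
        return np.array([1.0])
    sigma = radius / 2.0  # assumption: sigma is tied to the radius
    x = np.arange(-radius, radius + 1, dtype=float)
    kernel = np.exp(-0.5 * (x / sigma) ** 2)
    return kernel / kernel.sum()


def gaussian_blur(image: np.ndarray, radius: int) -> np.ndarray:
    """Separable Gaussian blur of a 2D grayscale image with edge padding."""
    radius = max(int(radius), 0)
    kernel = gaussian_kernel_1d(radius)
    padded = np.pad(image.astype(float), radius, mode="edge")
    # Blur along rows, then along columns; "valid" removes the padding again.
    rows = np.apply_along_axis(
        lambda r: np.convolve(r, kernel, mode="valid"), 1, padded
    )
    return np.apply_along_axis(
        lambda c: np.convolve(c, kernel, mode="valid"), 0, rows
    )


def adjust_radius(base_radius: int, is_uncertain: bool,
                  step: int = 1, max_radius: int = 8) -> int:
    """Hypothetical rule: blur more strongly when the pair is flagged uncertain."""
    if is_uncertain:
        return min(base_radius + step, max_radius)
    return base_radius


if __name__ == "__main__":
    # A checkerboard is almost pure high-frequency content.
    image = (np.indices((16, 16)).sum(axis=0) % 2).astype(float)
    radius = adjust_radius(base_radius=2, is_uncertain=True)
    blurred = gaussian_blur(image, radius)
    print("variance before:", image.var(), "after:", blurred.var())
```

The checkerboard demo only shows that the blur suppresses high-frequency detail, which is the part of the stimulus the System GAP is said to discard; how uncertainty is actually mapped to a radius in UBP is defined in the paper, not here.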

Abstract

Can our brain signals faithfully reflect the original visual stimuli, even including high-frequency details? Although human perceptual and cognitive capacities enable us to process and remember visual information, these abilities are constrained by several factors, such as limited attentional resources and the finite capacity of visual memory. When visual stimuli are processed by the human visual system into brain signals, some information is inevitably lost, leading to a discrepancy known as the System GAP. Additionally, perceptual and cognitive dynamics, along with technical noise in signal acquisition, degrade the fidelity of brain signals relative to the visual stimuli; this discrepancy is known as the Random GAP. When encoded brain representations are directly aligned with the corresponding pretrained image features, the System GAP and Random GAP between paired data challenge the model, requiring it to bridge these gaps. However, in the context of limited paired data, these gaps are difficult for the model to learn, leading to overfitting and poor generalization to new data. To address these GAPs, we propose a simple yet effective approach called the Uncertainty-aware Blur Prior (UBP). It estimates the uncertainty within the paired data, reflecting the mismatch between brain signals and visual stimuli. Based on this uncertainty, UBP dynamically blurs the high-frequency details of the original images, reducing the impact of the mismatch and improving alignment. Our method achieves a top-1 accuracy of 50.9% and a top-5 accuracy of 79.7% on the zero-shot brain-to-image retrieval task, surpassing previous state-of-the-art methods by margins of 13.7% and 9.8%, respectively. Code is available at https://github.com/HaitaoWuTJU/Uncertainty-aware-Blur-Prior.
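For readers unfamiliar with the metric, top-k accuracy in a retrieval setting can be computed roughly as follows. This is a generic sketch, not code from the paper's repository: the function name `topk_retrieval_accuracy` and the convention that row i of the brain features is paired with row i of the image features are assumptions for illustration.

```python
import numpy as np


def topk_retrieval_accuracy(brain_feats: np.ndarray,
                            image_feats: np.ndarray,
                            k: int) -> float:
    """Fraction of brain embeddings whose paired image ranks in the top k.

    Assumes row i of brain_feats is paired with row i of image_feats.
    """
    # L2-normalize so that the dot product equals cosine similarity.
    b = brain_feats / np.linalg.norm(brain_feats, axis=1, keepdims=True)
    v = image_feats / np.linalg.norm(image_feats, axis=1, keepdims=True)
    sim = b @ v.T  # shape (N, N): each brain sample vs. every candidate image
    # Indices of the k most similar candidate images for each brain sample.
    topk = np.argsort(-sim, axis=1)[:, :k]
    targets = np.arange(sim.shape[0])[:, None]
    return float((topk == targets).any(axis=1).mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    images = rng.normal(size=(200, 64))
    # Toy "brain" features: the paired image feature plus heavy noise.
    brain = images + 3.0 * rng.normal(size=images.shape)
    print("top-1:", topk_retrieval_accuracy(brain, images, k=1))
    print("top-5:", topk_retrieval_accuracy(brain, images, k=5))
```

Top-5 accuracy can never be lower than top-1 accuracy on the same candidate set, since the top-5 list always contains the top-1 prediction.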

Paper Structure

This paper contains 29 sections, 10 equations, 13 figures, 24 tables, 1 algorithm.

Figures (13)

  • Figure 1: Overview of the information flow during Rapid Serial Visual Presentation (RSVP) and the GAPs in human visual perception and cognition. The top panel illustrates the RSVP paradigm, where a sequence of images is rapidly presented for 100 ms each, with a fixation point in the center. The bottom panel highlights the GAPs in the visual processing pipeline: System GAP, which represents the loss of high-frequency details during the transition from raw visual stimuli to visual perception, and Random GAP, which arises due to (a) dynamic perceptual processes (e.g., shifts in visual attention), (b) dynamic cognitive processes (e.g., associating with similar objects or concepts), and (c) low-level technical noise in signal collection.
  • Figure 2: Illustration of brain signals. (a) EEG signals recorded over 80 trials of the same stimulus for Subject 1. The red line indicates the mean across all trials. (b) EEG signals from 80 trials of two stimuli for Subject 1. Cool colors represent Stimulus 1, warm colors represent Stimulus 2. The blue and red lines show the means for Stimulus 1 and Stimulus 2, respectively. (c) Density distribution of EEG signal variability across 10 subjects. Variability is negatively correlated with task performance; see the correlation table (tab:correlation) for further details. (d) UMAP projection of EEG signals from 10 subjects, showing distinct clustering patterns.
  • Figure 3: Semantic similarity visualization. (a) Semantic similarity matrix between image features and EEG features. The diagonal represents the similarity between corresponding pairs of features from the two modalities. (b) Density distribution of similarity scores from the diagonal of the matrix. The green dashed lines denote the confidence interval at a confidence level of $1 - \alpha$. The red areas represent the Uncertainty Area, indicating scores outside the confidence interval.
  • Figure 4: Comparison of Top-1 and Top-5 accuracy (%) for Intra-subject task on THINGS-EEG.
  • Figure 5: Illustration of various stimuli augmentations and corruptions applied to the visual stimuli. The augmentations (Flip, Crop, Grayscale, Color jitter) modify geometric properties or color distributions, while the corruptions (Gaussian noise, Low resolution, Uniform blur, Fovea blur) degrade image quality by introducing noise, lowering resolution, or simulating optical distortions.
  • ...and 8 more figures
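
The Figure 3 caption describes an interval over the diagonal similarity scores, with scores outside it forming the Uncertainty Area. One plausible way to compute such a mask is sketched below. The caption does not say how the interval is constructed, so the use of empirical quantiles at α/2 and 1 − α/2 is an assumption, as are the names `paired_similarity` and `uncertainty_mask`.

```python
import numpy as np


def paired_similarity(brain_feats: np.ndarray,
                      image_feats: np.ndarray) -> np.ndarray:
    """Cosine similarity of each brain/image pair (diagonal of the matrix)."""
    b = brain_feats / np.linalg.norm(brain_feats, axis=1, keepdims=True)
    v = image_feats / np.linalg.norm(image_feats, axis=1, keepdims=True)
    sim = b @ v.T  # full (N, N) similarity matrix, as in panel (a)
    return np.diag(sim)


def uncertainty_mask(diag_scores: np.ndarray, alpha: float = 0.05):
    """Flag paired scores outside an empirical (1 - alpha) interval.

    Assumption: the interval bounds are the alpha/2 and 1 - alpha/2
    quantiles of the diagonal scores; the paper may define them differently.
    """
    lower, upper = np.quantile(diag_scores, [alpha / 2, 1 - alpha / 2])
    mask = (diag_scores < lower) | (diag_scores > upper)
    return mask, (float(lower), float(upper))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    images = rng.normal(size=(200, 32))
    brain = images + rng.normal(size=images.shape)  # toy paired features
    scores = paired_similarity(brain, images)
    mask, (lower, upper) = uncertainty_mask(scores, alpha=0.1)
    print("interval:", (lower, upper), "flagged pairs:", int(mask.sum()))
```

Pairs flagged by the mask would be the candidates for a changed blur radius; the exact update that UBP applies to them is specified in the paper rather than in this sketch.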