Uncertainty-Aware Perceiver
EuiYul Song
TL;DR
The paper argues that the Perceiver's lack of predictive uncertainty and limited generalization evidence weaken its claimed advantages. It introduces five Uncertainty-Aware Perceiver variants—Deep-, SWA-, Snap-, Fast-, and MC-Perceiver—to produce calibrated uncertainty estimates while preserving the model's scalable attention bottleneck. Empirical results on CIFAR-10 and CIFAR-100 show that several variants, especially Deep-Perceiver, achieve higher accuracy and better calibration than the baseline Perceiver, ViT, and ResNet-50, though MC-Perceiver may underperform on some datasets. The work demonstrates that uncertainty-aware extensions can enhance multimodal architectures without sacrificing scalability and outlines directions for pretraining and Bayesian enhancements to further improve uncertainty quantification.
Abstract
The Perceiver makes few architectural assumptions about the relationships among its inputs and, by attending through a small latent array, avoids the quadratic growth in memory and computation time that standard Transformer self-attention incurs as inputs grow. The Perceiver matches or outperforms ResNet-50 and ViT in accuracy on some benchmarks. However, it does not account for predictive uncertainty or calibration. Its claims of generalization also rest on only three datasets, three models, one evaluation metric, and one hyperparameter setting. Moreover, its relative improvement over other models is marginal, and its reduction of architectural priors is not substantial; fewer priors do not by themselves imply a better model. I therefore propose five variants of the Perceiver, the Uncertainty-Aware Perceivers, which produce uncertainty estimates, and evaluate them on three metrics. In experiments on CIFAR-10 and CIFAR-100, the Uncertainty-Aware Perceivers improve considerably on the Perceiver.
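The five variants differ mainly in how they obtain several predictions per input. As a hedged sketch, assuming each variant follows the standard recipe of its namesake technique: Deep-, Snap-, and Fast-Perceiver average the softmax outputs of several trained members (independently initialized models, cyclic-learning-rate snapshots, and fast geometric ensembling checkpoints, respectively), MC-Perceiver averages several forward passes with dropout kept active, and SWA-Perceiver instead averages weights to obtain a single model. The NumPy code shows the prediction-averaging step, predictive entropy as an uncertainty score, and expected calibration error (ECE) as one common calibration measure. The function names, the 15-bin default, and the choice of ECE are illustrative assumptions, not the metrics or code of this work; random logits stand in for real model outputs.

```python
import numpy as np


def softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over the last (class) axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def ensemble_predict(member_logits: np.ndarray) -> np.ndarray:
    """Average class probabilities over ensemble members or MC-dropout passes.

    member_logits: shape (num_members, num_examples, num_classes).
    Returns probabilities of shape (num_examples, num_classes).
    (Hypothetical helper; assumes prediction averaging, which does not apply to SWA.)
    """
    return softmax(member_logits).mean(axis=0)


def predictive_entropy(probs: np.ndarray) -> np.ndarray:
    """Entropy of the averaged predictive distribution, one value per example."""
    # Clip to avoid log(0); zero-probability classes contribute 0 to the sum.
    return -(probs * np.log(np.clip(probs, 1e-12, 1.0))).sum(axis=-1)


def expected_calibration_error(probs: np.ndarray, labels: np.ndarray,
                               num_bins: int = 15) -> float:
    """ECE: bin examples by confidence, then weight |accuracy - confidence| per bin.

    The bin count is an illustrative assumption.
    """
    confidences = probs.max(axis=-1)
    predictions = probs.argmax(axis=-1)
    correct = (predictions == labels).astype(float)
    bin_edges = np.linspace(0.0, 1.0, num_bins + 1)
    ece = 0.0
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            gap = abs(correct[in_bin].mean() - confidences[in_bin].mean())
            ece += in_bin.mean() * gap  # weight by fraction of examples in bin
    return float(ece)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # 5 members (or MC passes), 8 examples, 10 classes; random stand-in logits.
    member_logits = rng.normal(size=(5, 8, 10))
    labels = rng.integers(0, 10, size=8)
    probs = ensemble_predict(member_logits)
    print(probs.shape)  # (8, 10)
    print(predictive_entropy(probs))
    print(expected_calibration_error(probs, labels))
```

Higher entropy flags less certain predictions, and an ECE near 0 means predicted confidences track empirical accuracy. For the SWA variant, the same entropy and ECE functions would apply to the single averaged model's probabilities.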
