From Cognition to Computation: A Comparative Review of Human Attention and Transformer Architectures
Minglu Zhao, Dehong Xu, Tao Gao
TL;DR
The paper investigates where human attention and Transformer attention align and diverge, focusing on capacity constraints, attentional pathways, and intentional control. It describes the Transformer's self-attention mechanism, $\mathrm{Attention}(Q,K,V)=\mathrm{softmax}\left(\frac{QK^\top}{\sqrt{d_k}}\right)V$, and its multi-head extension, $\mathrm{MultiHead}(Q,K,V)=\mathrm{Concat}(\mathrm{head}_1,\ldots,\mathrm{head}_h)W^O$, where each head is $\mathrm{head}_i=\mathrm{Attention}(QW^Q_i,KW^K_i,VW^V_i)$. The authors present a structured comparative analysis across vision, language, and agency, identifying similarities in selective attention and contextual integration while highlighting important differences in resource limits and agency. They argue for interdisciplinary exploration to derive resource-aware, interpretable representations and potentially explicit agency mechanisms in AI.
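To make the two formulas concrete, here is a minimal pure-Python sketch of scaled dot-product attention and multi-head attention. It is an illustration under stated assumptions, not the authors' implementation: the helper names (`matmul`, `softmax`, `attention_weights`, `attention`, `multi_head`), the toy input `x`, and the hand-picked projection matrices are all hypothetical, and the sketch omits masking, batching, and learned parameters.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(s - m) for s in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(q, k):
    """Row-wise softmax(Q K^T / sqrt(d_k)); each row sums to 1."""
    d_k = len(k[0])
    k_t = [list(col) for col in zip(*k)]  # transpose K
    scores = matmul(q, k_t)
    return [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]

def attention(q, k, v):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    return matmul(attention_weights(q, k), v)

def multi_head(q, k, v, heads, w_o):
    """MultiHead(Q, K, V) = Concat(head_1, ..., head_h) W^O.

    heads is a list of (W_Q, W_K, W_V) projection triples, one per head.
    """
    outs = [attention(matmul(q, wq), matmul(k, wk), matmul(v, wv))
            for wq, wk, wv in heads]
    # Concatenate the per-head outputs along the feature dimension.
    concat = [sum((o[i] for o in outs), []) for i in range(len(q))]
    return matmul(concat, w_o)

# Toy self-attention input (assumed values): 3 tokens, model dimension 2.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# Two hypothetical heads, each projecting onto one input coordinate.
heads = [
    ([[1.0], [0.0]], [[1.0], [0.0]], [[1.0], [0.0]]),
    ([[0.0], [1.0]], [[0.0], [1.0]], [[0.0], [1.0]]),
]
w_o = [[1.0, 0.0], [0.0, 1.0]]  # identity output projection, for readability

out = multi_head(x, x, x, heads, w_o)  # self-attention: Q = K = V = x
```

Each row of `attention_weights(x, x)` is a probability distribution over tokens, so every output row is a weighted average of value vectors. This is the "selective attention" the summary refers to: the mechanism redistributes weight across all inputs rather than discarding any of them, which is one reason the capacity comparison with human attention is not straightforward.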
Abstract
Attention is a cornerstone of human cognition that facilitates the efficient extraction of information in everyday life. Recent developments in artificial intelligence, such as the Transformer architecture, also incorporate the idea of attention in their model designs. However, despite the shared fundamental principle of selectively attending to information, human attention and the Transformer model display notable differences, particularly in their capacity constraints, attention pathways, and intentional mechanisms. Our review aims to provide a comparative analysis of these mechanisms from a cognitive-functional perspective, thereby shedding light on several open research questions. The exploration encourages interdisciplinary efforts to derive insights from human attention mechanisms in the pursuit of developing more generalized artificial intelligence.
