Attention to Entropic Communication
Torsten Enßlin, Carolin Weidinger, Philipp Frank
TL;DR
This work integrates attention with information theory by introducing relative attention entropy (RAE), a proper, bandwidth-aware generalization of relative entropy that uses attention weights to emphasize informative regions of the signal. It first establishes that relative entropy is the unique measure satisfying analyticity, locality, properness, and calibration, and then shows that naive weighting breaks properness. It then constructs attention as a renormalized, positive density $\mathcal{A}^{(w)}(s|I)$ and derives the RAE $\mathcal{D}_{s}^{(w)}(I_{A},I_{B})$, which ensures proper communication when sender and receiver have only partial knowledge of the receiver’s utility. Through analytical derivations and illustrative examples (including misaligned versus aligned interests), the paper demonstrates how attention-weighted messaging can optimally shape a receiver’s actions and how RAEs compare to other scoring rules, particularly in emphasizing high-attention regions. The results bear not only on technical applications in Bayesian updating and compression but also on socio-psychological aspects of real-world communication and cooperation under misaligned incentives. Overall, the framework provides a principled route to designing attentional, utility-aware communication protocols with provable properties.
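The construction of the attention density and the RAE can be illustrated on a small discrete signal space. This is a minimal sketch under stated assumptions, not the authors' code. It assumes the simplest attention form $\mathcal{A}^{(w)}(s|I) = w(s)P(s|I)/\sum_{s'} w(s')P(s'|I)$ with strictly positive weights $w$, and it takes $\mathcal{D}_{s}^{(w)}(I_{A},I_{B})$ to be the relative entropy between $\mathcal{A}^{(w)}(s|I_A)$ and $\mathcal{A}^{(w)}(s|I_B)$. The function names and example numbers are hypothetical.

```python
import numpy as np


def attention_density(p, w):
    """Assumed attention form: A(s) = w(s) p(s) / sum_s' w(s') p(s')."""
    p = np.asarray(p, dtype=float)
    w = np.asarray(w, dtype=float)
    if np.any(w <= 0):
        raise ValueError("attention weights must be strictly positive")
    a = w * p
    return a / a.sum()


def relative_attention_entropy(p_a, p_b, w):
    """Sketch of D^(w)(I_A, I_B): relative entropy between two attention densities (nats)."""
    a = attention_density(p_a, w)
    b = attention_density(p_b, w)
    mask = a > 0  # cells with zero attention under I_A contribute nothing
    return float(np.sum(a[mask] * np.log(a[mask] / b[mask])))


# Hypothetical three-cell example: the last cell carries the most attention.
p_sender = np.array([0.5, 0.3, 0.2])
w = np.array([1.0, 2.0, 4.0])

print(relative_attention_entropy(p_sender, p_sender, w))         # zero for an exact match
print(relative_attention_entropy(p_sender, [0.4, 0.3, 0.3], w))  # positive for a mismatch
```

Because every weight is positive, two attention densities coincide only if the underlying probabilities coincide, so in this sketch the divergence vanishes only for $P(\cdot|I_B) = P(\cdot|I_A)$. That is what keeps the weighted measure proper.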
Abstract
The concept of attention, numerical weights that emphasize the importance of particular data, has proven highly relevant in artificial intelligence. Relative entropy (RE, also known as Kullback-Leibler divergence) plays a central role in communication theory. Here we combine these two concepts, attention and RE. RE guides the optimal encoding of messages in bandwidth-limited communication as well as optimal message decoding via the maximum entropy principle (MEP). In the coding scenario, RE can be derived from four requirements: being analytical, local, proper, and calibrated. Weighted RE, used for steering attention in communication, turns out to be improper. To see how proper attention communication can emerge, we analyze a scenario in which a message sender wants to ensure that the receiver of the message can perform well-informed actions. If the receiver decodes the message using the MEP, the sender only needs to know the receiver's utility function to inform optimally, but not the receiver's initial knowledge state. If only the curvatures of the utility function's maxima are known, it becomes desirable to accurately communicate an attention function, in this case a probability function weighted by this curvature and re-normalized. We propose entropic attention communication as the desired generalization of entropic communication that permits weighting while remaining proper, thereby aiding the design of optimal communication protocols in technical applications and helping to understand human communication. For example, our analysis shows how to derive the level of cooperation expected when otherwise honest communication partners have misaligned interests.
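The improperness of naively weighted RE can be checked on a small discrete example. This sketch assumes the naive weighting takes the form $\sum_s w(s)\,p(s)\ln[p(s)/q(s)]$; the paper's exact definition may differ, and the function names are hypothetical. Minimizing this score over normalized reports $q$ with a Lagrange multiplier gives $q^*(s) = w(s)p(s)/\sum_{s'} w(s')p(s')$. The score is therefore minimized by the weighted, renormalized distribution rather than by the true $p$, unless $w$ is constant.

```python
import numpy as np


def naive_weighted_re(p, q, w):
    """Assumed naive weighting: sum_s w(s) p(s) ln[p(s) / q(s)] (nats)."""
    p, q, w = (np.asarray(x, dtype=float) for x in (p, q, w))
    mask = p > 0  # cells with p(s) = 0 contribute nothing
    return float(np.sum(w[mask] * p[mask] * np.log(p[mask] / q[mask])))


def naive_minimizer(p, w):
    """Closed-form minimizer over normalized q: q*(s) = w(s) p(s) / sum_s' w(s') p(s')."""
    wp = np.asarray(w, dtype=float) * np.asarray(p, dtype=float)
    return wp / wp.sum()


# Hypothetical example: true belief p and attention weights w.
p = np.array([0.5, 0.3, 0.2])
w = np.array([1.0, 2.0, 4.0])
q_star = naive_minimizer(p, w)

honest = naive_weighted_re(p, p, w)          # zero: the truthful report
distorted = naive_weighted_re(p, q_star, w)  # negative: the distorted report scores better
print(q_star, honest, distorted)
```

At the minimizer, the score equals $-Z\,D(q^*\,\|\,p)$ with $Z=\sum_s w(s)p(s)$, which is negative whenever $q^* \neq p$. This is why a reporter optimizing the naive score has an incentive to distort.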
