Integrating Cognitive Processing Signals into Language Models: A Review of Advances, Applications and Future Directions
Angela Lopez-Cardona, Sebastian Idesis, Ioannis Arapakis
TL;DR
This review addresses data bottlenecks, alignment, and hallucination challenges in language and multimodal language models by examining how eye-tracking (ET) and related cognitive signals can be integrated into them. It surveys data acquisition (ET vs. neuroimaging) and a spectrum of augmentation strategies, including input-level, representation-level, encoder-based, multi-task, and architectural approaches. Across language understanding, language modeling, question answering (QA), and other NLP tasks, ET signals show potential to improve data efficiency, convergence speed, and human-aligned behavior, with synthetic gaze data enabling inference when real gaze recordings are unavailable. The work highlights practical implications for data-efficient, environmentally conscious training and calls for rigorous, privacy-preserving, cross-disciplinary research to realize ET-enabled improvements in real-world systems.
Abstract
Recently, the integration of cognitive neuroscience into Natural Language Processing (NLP) has gained significant attention. This article provides a critical and timely overview of recent advancements in leveraging cognitive signals, particularly Eye-tracking (ET) signals, to enhance Language Models (LMs) and Multimodal Large Language Models (MLLMs). By incorporating user-centric cognitive signals, these approaches address key challenges, including data scarcity and the environmental costs of training large-scale models. Cognitive signals enable efficient data augmentation, faster convergence, and improved human alignment. The review emphasises the potential of ET data for tasks such as Visual Question Answering (VQA) and for mitigating hallucinations in MLLMs, and concludes by discussing emerging challenges and research trends.
