The Human Visual System Can Inspire New Interaction Paradigms for LLMs

Diana Robinson, Neil Lawrence

TL;DR

This position paper argues that interpreting LLMs through a human visual system lens can address trust and hallucination issues by leveraging shared representations, external visual memory, grounding, exploration, and affordances. It articulates concrete research directions, such as saccade-map visualizations of concept space and human-in-the-loop abstraction editing, anchored by case studies like the Equational Theories project. By drawing on sensorimotor vision and active inference, the authors propose HAMs and information-grounding frameworks to improve auditability and collaboration between humans and LLMs. The work aims to catalyze new interaction paradigms and evaluation methods that enhance interpretability, safety, and user agency in AI-assisted reasoning.

Abstract

The dominant metaphor of LLMs-as-minds leads to misleading conceptions of machine agency and is limited in its ability to help both users and developers build the right degree of trust in, and understanding of, outputs from LLMs. This metaphor also makes it harder to disentangle hallucinations from useful model interactions. This position paper argues that there are fundamental similarities between visual perception and the way LLMs process and present language. These similarities inspire a metaphor for LLMs that could open new avenues for research into interaction paradigms and shared representations. Our visual system metaphor introduces possibilities for addressing these challenges by understanding the information landscape assimilated by LLMs. In this paper we motivate our proposal, introduce the interrelated theories from the fields that inspired this view, and discuss research directions that stem from this abstraction.

Paper Structure

This paper contains 21 sections, 1 figure, 1 table.

Figures (1)

  • Figure 1: The Penrose Triangle. While the image is locally consistent, it is globally inconsistent. In our metaphor a similar inconsistency could occur in an LLM where a conversation is consistent in individual exchanges, but inconsistent in its broader structure. Such a conversation would be poorly grounded.
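The local/global distinction in the Figure 1 caption can be made concrete with a toy check. This is our illustration, not a method from the paper. It assumes, hypothetically, that each exchange in a conversation asserts a single pairwise ordering such as "X is nearer than Y" (much like the depth relations at each corner of the Penrose Triangle). Each claim is unobjectionable on its own, but the set of claims is globally inconsistent if the directed graph they form contains a cycle. The function name `find_cycle` and the pair representation are assumptions made for illustration.

```python
from collections import defaultdict


def find_cycle(claims):
    """Return a cycle in a list of (nearer, farther) claims, or None.

    Hypothetical toy model: each claim is one 'locally consistent' exchange;
    a directed cycle means the claims cannot all hold at once (global inconsistency).
    """
    graph = defaultdict(list)
    for nearer, farther in claims:
        graph[nearer].append(farther)

    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on current DFS path, finished
    color = defaultdict(int)      # missing nodes default to WHITE (0)
    path = []

    def visit(node):
        color[node] = GREY
        path.append(node)
        for nxt in graph[node]:
            if color[nxt] == GREY:
                # Back edge: the slice of the current path from nxt closes a loop.
                return path[path.index(nxt):] + [nxt]
            if color[nxt] == WHITE:
                found = visit(nxt)
                if found:
                    return found
        path.pop()
        color[node] = BLACK
        return None

    for node in list(graph):  # snapshot: visit() may add sink nodes to graph
        if color[node] == WHITE:
            found = visit(node)
            if found:
                return found
    return None


# Three exchanges, each plausible in isolation, that jointly form an impossible loop.
claims = [("A", "B"), ("B", "C"), ("C", "A")]
print(find_cycle(claims))
```

Running the example prints `['A', 'B', 'C', 'A']`, while any single claim (or any two of the three) yields `None`. In the paper's terms, such a conversation would be poorly grounded even though no individual exchange looks wrong.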