SEMANTIC SEE-THROUGH GOGGLES: Wearing Linguistic Virtual Reality in (Artificial) Intelligence

Goki Muramoto, Yuri Yasui, Hirosuke Asahi

TL;DR

The paper addresses the problem of language-mediated perception by introducing Semantic See-through Goggles, a prototype that verbalizes real-time visual input and then re-depicts it as an image, with both steps performed by AI. It combines image captioning and text-to-image generation to produce a first-person linguistic VR experience, enabling subjective study of AI biases and semantic mediation. Through a preliminary quantitative study and a qualitative workshop, the work shows that meaning is only partially preserved and that notable biases appear in both linguistic and visual semantics, highlighting how language shapes perception and memory. The framework offers a tangible method for examining AI cognition, bias, and the broader philosophical claim that intelligence can only see the world under meaning, with implications for AI ethics, VR design, and human–AI interaction.

Abstract

When language is used as a medium for storing and communicating sensory information, a kind of radical virtual reality arises, namely "the realities that are reduced into the same sentence are virtual/equivalent." In the current era, in which artificial intelligence engages in the linguistic mediation of sensory information, it is imperative to re-examine the issues pertaining to this potential VR, particularly bias and (dis)communication. Semantic See-through Goggles are an experimental framework: glasses through which the wearer's view is fully verbalized and then re-depicted. Participants wear goggles equipped with a camera and a head-mounted display (HMD). In real time, the AI converts the image captured by the camera into a single line of text, which is then transformed back into an image and presented to the wearer's eyes. This process enables users to perceive and interact with the real physical world through this redrawn view. We constructed a prototype of the goggles, examined its fundamental characteristics, and then conducted a qualitative analysis of the wearer's experience. This project investigates a methodology for subjectively capturing the situation in which AI serves as a proxy for our perception of the world. It also attempts to channel some of the energy of today's debate over artificial intelligence into a classical inquiry into the idea that "intelligence can only see the world under meaning."
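The camera-to-HMD loop described above can be sketched as a two-stage function. This is a minimal Python sketch, not the authors' implementation: `caption` and `generate` are hypothetical stand-ins for whatever image-captioning and text-to-image models are plugged in, and frames and images are treated as opaque values. The property it makes explicit is that the original frame never reaches the wearer; only the single line of text crosses the bottleneck.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable, Iterator

# Hypothetical aliases: frames and images can be any representation
# (e.g. arrays or file paths); the sketch treats them as opaque values.
Frame = Any
Captioner = Callable[[Frame], str]       # image -> text (image captioning)
ImageGenerator = Callable[[str], Frame]  # text -> image (text-to-image)


@dataclass
class MediatedView:
    caption: str  # the single line of text that mediates the view
    image: Frame  # the re-depicted image shown in the HMD


def to_single_line(text: str) -> str:
    """Collapse newlines and repeated whitespace into one line of text."""
    return " ".join(text.split())


def semantic_see_through(
    frames: Iterable[Frame],
    caption: Captioner,
    generate: ImageGenerator,
) -> Iterator[MediatedView]:
    """Yield one re-depicted view per camera frame.

    The generator receives only the caption, never the frame itself,
    so everything the wearer sees must pass through one line of text.
    """
    for frame in frames:
        text = to_single_line(caption(frame))
        yield MediatedView(caption=text, image=generate(text))


if __name__ == "__main__":
    # Toy stand-ins for the two models (hypothetical, for illustration only).
    def toy_caption(frame: list) -> str:
        return f"a room with\n{len(frame)} chairs"

    def toy_generate(text: str) -> str:
        return f"<image of: {text}>"

    for view in semantic_see_through([[1, 2], [1, 2, 3]], toy_caption, toy_generate):
        print(view.caption, "->", view.image)
```

Swapping the toy functions for real captioning and text-to-image models, and the frame list for a camera stream, leaves the structure unchanged. Injecting both models as callables is an assumption made here to keep the sketch model-agnostic.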

Paper Structure

This paper contains 41 sections, 6 figures, and 3 tables.

Figures (6)

  • Figure 1: A see-through HMD is a pair of goggles for seeing what is in front of you. Top-left: an optical see-through HMD made by Hughes Electronics. Top-right: a video see-through HMD, courtesy of Jannick Rolland, Frank Biocca, and the UNC Chapel Hill Dept. of Computer Science (photo by Alex Tremi). Bottom-left: inverted-glasses experiments. Bottom-right: Lived Montage by Goki Muramoto (photo: Kai Fukubayashi).
  • Figure 2: Sketch of the implementation of Semantic See-through Goggles. An HMD is fitted with a camera; as the landscape image travels from the camera to the HMD, the two AIs convert it into a single line of text and then back into an image.
  • Figure 3: Comparison of linguistic similarities using different similarity indices. The top-left panel shows the distribution for each condition in TF-IDF, the top-right shows WMD, the bottom-left shows USE, and the bottom-right shows SBERT. The vertical axes are not shared across panels so that each distribution remains visible. Outliers are defined as values above the 99th percentile.
  • Figure 4: Comparison of visual similarities using different similarity indices. The top-left panel shows the distribution for each condition in HI similarity, the top-right shows SIFT similarity, the bottom-left shows LPIPSA, and the bottom-right shows LPIPST. The vertical axes are not shared across panels so that each distribution remains visible. Outliers are defined as values above the 99th percentile.
  • Figure 5: The workshop view. Participants wear Semantic See-through Goggles and observe, walk around, and interact with an environment containing various objects and people. Participants who are not wearing the goggles can see, on a display embedded in the front of the goggles, what the wearer is looking at and which text is mediating the view.
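Figures 3 and 4 name several similarity indices without defining them here. As a rough illustration of two of the simpler ones, the sketch below computes a TF-IDF cosine similarity between captions and a histogram-intersection similarity between images. It is a minimal sketch under stated assumptions, not the paper's evaluation code: the tokenizer, the smoothed IDF formula (the form scikit-learn uses by default), the 16-bin grayscale histogram, and reading "HI" as histogram intersection are all assumptions, and images are represented as flat lists of grayscale values from 0 to 255.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase and split into word tokens (an assumed, simple tokenizer)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def tfidf_vectors(docs: list[str]) -> list[dict[str, float]]:
    """Raw term counts weighted by smoothed IDF: ln((1 + n) / (1 + df)) + 1."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(term for tokens in tokenized for term in set(tokens))
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
    return [{t: c * idf[t] for t, c in Counter(tokens).items()} for tokens in tokenized]


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def histogram(pixels: list[int], bins: int = 16) -> list[float]:
    """Normalized histogram of grayscale pixel values in 0..255."""
    counts = [0] * bins
    for p in pixels:
        if not 0 <= p <= 255:
            raise ValueError(f"pixel value out of range: {p}")
        counts[p * bins // 256] += 1
    return [c / len(pixels) for c in counts]


def histogram_intersection(a: list[int], b: list[int], bins: int = 16) -> float:
    """Sum of bin-wise minima of two normalized histograms (1.0 = identical)."""
    return sum(min(x, y) for x, y in zip(histogram(a, bins), histogram(b, bins)))


if __name__ == "__main__":
    source, caption, unrelated = tfidf_vectors(
        ["a man sitting on a chair", "a man sitting on a bench", "a blue sky"]
    )
    print(cosine(source, caption), cosine(source, unrelated))
    print(histogram_intersection([0, 10, 200, 255], [5, 12, 190, 250]))
```

Because these two indices compare only surface features (shared words, pixel-intensity distributions), they serve as simple baselines next to the embedding-based indices the figures also list (WMD, USE, SBERT, LPIPS), which this sketch does not attempt to reproduce.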