Table of Contents
Fetching ...

Semantic Glitch: Agency and Artistry in an Autonomous Pixel Cloud

Qing Zhang, Jing Huang, Mingyang Xu, Jun Rekimoto

TL;DR

This work investigates a lo-fi autonomous agent—the Pixel Cloud—that abandons traditional metric sensing in favor of a stateful semantic navigator powered by a multimodal language system. A two-stage prompting pipeline creates a persistent mental map and a local, personality-driven control loop, enabling goal-directed navigation and emergent social behaviors despite a physically uncertain body. The study demonstrates distinct, characterful personas across multiple environments and validates that authored prompts yield statistically different behavioral fingerprints. By reframing robotic agency as a narrative, empathetic experience rather than a strictly efficient system, the paper highlights a novel design space for relatable autonomous agents in HRI and speculative art contexts.

Abstract

While mainstream robotics pursues metric precision and flawless performance, this paper explores the creative potential of a deliberately "lo-fi" approach. We present the "Semantic Glitch," a soft flying robotic art installation whose physical form, a 3D pixel style cloud, is a "physical glitch" derived from digital archaeology. We detail a novel autonomous pipeline that rejects conventional sensors like LiDAR and SLAM, relying solely on the qualitative, semantic understanding of a Multimodal Large Language Model to navigate. By authoring a bio-inspired personality for the robot through a natural language prompt, we create a "narrative mind" that complements the "weak," historically, loaded body. Our analysis begins with a 13-minute autonomous flight log, and a follow-up study statistically validates the framework's robustness for authoring quantifiably distinct personas. The combined analysis reveals emergent behaviors, from landmark-based navigation to a compelling "plan to execution" gap, and a character whose unpredictable, plausible behavior stems from a lack of precise proprioception. This demonstrates a lo-fi framework for creating imperfect companions whose success is measured in character over efficiency.

Semantic Glitch: Agency and Artistry in an Autonomous Pixel Cloud

TL;DR

This work investigates a lo-fi autonomous agent—the Pixel Cloud—that abandons traditional metric sensing in favor of a stateful semantic navigator powered by a multimodal language system. A two-stage prompting pipeline creates a persistent mental map and a local, personality-driven control loop, enabling goal-directed navigation and emergent social behaviors despite a physically uncertain body. The study demonstrates distinct, characterful personas across multiple environments and validates that authored prompts yield statistically different behavioral fingerprints. By reframing robotic agency as a narrative, empathetic experience rather than a strictly efficient system, the paper highlights a novel design space for relatable autonomous agents in HRI and speculative art contexts.

Abstract

While mainstream robotics pursues metric precision and flawless performance, this paper explores the creative potential of a deliberately "lo-fi" approach. We present the "Semantic Glitch," a soft flying robotic art installation whose physical form, a 3D pixel style cloud, is a "physical glitch" derived from digital archaeology. We detail a novel autonomous pipeline that rejects conventional sensors like LiDAR and SLAM, relying solely on the qualitative, semantic understanding of a Multimodal Large Language Model to navigate. By authoring a bio-inspired personality for the robot through a natural language prompt, we create a "narrative mind" that complements the "weak," historically, loaded body. Our analysis begins with a 13-minute autonomous flight log, and a follow-up study statistically validates the framework's robustness for authoring quantifiably distinct personas. The combined analysis reveals emergent behaviors, from landmark-based navigation to a compelling "plan to execution" gap, and a character whose unpredictable, plausible behavior stems from a lack of precise proprioception. This demonstrates a lo-fi framework for creating imperfect companions whose success is measured in character over efficiency.

Paper Structure

This paper contains 7 sections, 4 figures.

Figures (4)

  • Figure 1: The "Semantic Glitch" hardware, flight behavior, and first-person perspective with MLLM-generated reasoning. (A, B) The robot's body, showing the placement of the ESP32 with a fish-eye camera, electronic speed controllers, Li-Po battery, and propeller modules. (C, D, E) The robot in flight, demonstrating its interaction with the environment and its "perspective-dependent morphological illusion."
  • Figure 2: The two-phase semantic reasoning pipeline. Phase 1 (Initialization): A single 360$^{\circ}$ panorama and a PREAMBLE_PROMPT are sent to the Gemini API to establish a stateful "mental map". Phase 2 (Control Loop): In a continuous loop, the system uses the live camera view and a DIRECTIONAL_PROMPT to generate context-aware actions, which are then logged and executed by the robot.
  • Figure 3: Key moments from the agent's first-person perspective, linking its visual input to its logged decisions. (F) Lateral avoidance of a person. (G) Vertical avoidance of a person. (H) A corrective turn near the staircase, illustrating the plan-to-execution gap. (I) Goal-oriented navigation towards distant lights. (J) Seeking open space. (K) A moment of contemplative inaction.
  • Figure 4: Social Stance Analysis: This chart quantifies each persona's reaction to human presence. The stark contrast—with the Companion overwhelmingly choosing Approach actions and the others choosing Avoidance—provides direct quantitative evidence of their authored social dispositions.