Design Space and Implementation of RAG-Based Avatars for Virtual Archaeology

Wilhelm Kerle-Malcharek, Giulio Biondi, Karsten Klein, Ulf Hailer, Steffen Diefenbach, Fabrizio Grosso, Marco Legittimo, Paola Venuti, Carla Binucci, Giuseppe Liotta, Falk Schreiber

Abstract

Immersive technologies, such as virtual and augmented reality, are transforming digital heritage by enabling users to explore and interact with culturally significant sites. It is now possible to view and augment digital twins of such sites, or digitally reconstructed versions of them, and to give a broader audience access to previously unreachable locations. Here, we investigate retrieval-augmented generation (RAG)-based avatars as an interface for accessing further information about digital cultural heritage objects while immersed in dedicated virtual environments. We present a requirement design space that spans the application realm, avatar personality, and I/O modalities. We instantiate it with a RAG system coupled to a conversational avatar in a virtual reality (VR) environment, using the Maxentius mausoleum from the 4th century AD as a case study, through which users gain access to curated on-demand information about the digitised heritage object. Our workflow utilises scholarly texts and enriches them with metadata. We evaluate various RAG configurations in terms of answer quality on a small expert-crafted question-answer set, as well as the perceived workload of users of a VR setup using such a RAG avatar. We present evidence that users perceive the overall workload of interacting with such an avatar as below average and that such avatars help to foster topical engagement. Overall, our work demonstrates how to utilise RAG-driven VR avatars for archaeological purposes and provides evidence that they can offer a pathway for immersive, AI-enhanced digital heritage applications.

Paper Structure (22 sections, 6 figures, 5 tables)

Figures (6)

  • Figure 1: This illustration shows the three overarching steps taken to design and implement a digital AI-driven avatar for immersive spaces in an archaeological context. The conceptual design determines the requirements for the system, the logical design defines the overall architecture, and the physical design covers the actual implementation.
  • Figure 2: This illustration shows our proposal for a requirement space for avatars in virtual archaeology applications. It contains three main building blocks: (1) The application realm indicates who the final users of such a system are. (2) The avatar personality helps to determine what type of answers a user can expect from a given avatar. (3) The avatar I/O addresses how an avatar accepts input and how it provides information, given the focus on immersive environments. The design spaces (system type, user category, ...) refer to the following publications, in order from top to bottom: Doerr [doerr2009ontologies], Deshpande [deshpande2022responsible], our suggestion with the "User" based on Walsh [walsh2016user], Bartsch [bartsch2025epistemic], van Peer [van2001new], Rashik [rashik2024beyond], and Martin [martin2022multimodality] for I/O.
  • Figure 3: An illustration of the selection for our setup for the Maxentius mausoleum use case; for the elements, refer to Figure 2. Our application realm is chosen to serve (semi-)expert individuals for research purposes. The avatar personality is chosen to be personal-expert with an authorial narration style and an abstract-robotic embodiment. The output of the avatar is vision, through simplified gestures, and audio. The input is audio. The left-to-right arrow indicates that the choices made are now displayed horizontally, instead of in the vertical selection order proposed in the original picture.
  • Figure 4: Simplified general workflow of a RAG system. Highlighted in yellow are a couple of identified points at which our criteria can be included.
  • Figure 5: This image shows an impression of what users saw during the user study. The grey fields show the potential answers users could choose from, in this case, for the question "What is the dating of the mausoleum?"
  • ...and 1 more figure