Building Knowledge from Interactions: An LLM-Based Architecture for Adaptive Tutoring and Social Reasoning

Luca Garello, Giulia Belgiovine, Gabriele Russo, Francesco Rea, Alessandra Sciutti

TL;DR

This work addresses the need for socially adept yet task-focused robotic tutors by combining an LLM-based interaction manager with a Knowledge Graph memory system in a multimodal HRI architecture. The authors implement an autonomous robot trainer that balances conversation and goal-directed guidance, and store interaction experiences as structured memories that feed a graph-based reasoning framework. They validate the approach through a real HRI user study and offline simulations with synthetic users, showing strong interaction planning and competitive or superior performance for multi-hop reasoning over traditional RAG baselines. The study demonstrates improved explainability, personalization potential, and scalability prospects for socially intelligent robotics in tutoring and education contexts, with insights into robustness and future enhancements such as adaptive interaction styles and dynamic memory management.

Abstract

Integrating robotics into everyday scenarios like tutoring or physical training requires robots capable of adaptive, socially engaging, and goal-oriented interactions. While Large Language Models show promise in human-like communication, their standalone use is hindered by memory constraints and contextual incoherence. This work presents a multimodal, cognitively inspired framework that enhances LLM-based autonomous decision-making in social and task-oriented Human-Robot Interaction. Specifically, we develop an LLM-based agent for a robot trainer, balancing social conversation with task guidance and goal-driven motivation. To further enhance autonomy and personalization, we introduce a memory system for selecting, storing and retrieving experiences, facilitating generalized reasoning based on knowledge built across different interactions. A preliminary HRI user study and offline experiments with a synthetic dataset validate our approach, demonstrating the system's ability to manage complex interactions, autonomously drive training tasks, and build and retrieve contextual memories, advancing socially intelligent robotics.

Paper Structure

This paper contains 29 sections, 5 figures, 1 table.

Figures (5)

  • Figure 1: HRI user study setup. The iCub robot shows a pose to a participant and provides real-time feedback.
  • Figure 2: LLM-Agent Prompt Template: during the interaction, the robot collects information about the user's behavior (e.g., performance, prompts) and updates the prompt template accordingly. The system determines whether to select a specific tool from the available list and updates the interaction stage and the corresponding system prompt. Based on that, the robot executes specific verbal or motor behaviors.
  • Figure 3: Example of a section of the final graph, visualized through the Neo4j Aura app.
  • Figure 4: HRI Architecture: perception modules process visual and audio data, which is sent to the LLM-based Interaction Manager for reasoning and conversation flow. The Trainer module handles performance assessment and feedback. Action modules control motor and verbal outputs, while Memory Handlers manage data retrieval and interface with the Knowledge Graph.
  • Figure 5: Comparison of different retrieval methods for answering user questions. For specific questions about individual users, the performance of all methods is comparable. However, for complex, multi-hop questions requiring extended reasoning across the entire dataset (such as 'Who is the best practitioner in the database?'), the Cypher-based approach outperforms the other methods, demonstrating a superior ability to aggregate and infer information across multiple data points.
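
The distinction drawn in the Figure 5 caption, between a question about one user and a question that must aggregate over the whole dataset, can be illustrated with a small sketch. It is not the authors' implementation: the graph schema (`User`, `PERFORMED`, `Session`, `score`), the sample data, and the function names are all hypothetical, and the Cypher string is included only to show what a graph query of this kind might look like under that assumed schema. The Python functions mimic the same aggregation in memory.

```python
from collections import defaultdict

# Hypothetical schema, assumed for illustration only:
#   (User)-[:PERFORMED]->(Session {score})
# A Cypher query answering "Who is the best practitioner?" under this
# assumed schema could look like the string below (not executed here).
HYPOTHETICAL_CYPHER = """
MATCH (u:User)-[:PERFORMED]->(s:Session)
RETURN u.name AS user, avg(s.score) AS mean_score
ORDER BY mean_score DESC
LIMIT 1
"""

# Toy edge list standing in for the knowledge graph (invented data).
EDGES = [
    ("alice", "PERFORMED", {"session": "s1", "score": 90}),
    ("alice", "PERFORMED", {"session": "s2", "score": 70}),
    ("bob", "PERFORMED", {"session": "s3", "score": 60}),
    ("bob", "PERFORMED", {"session": "s4", "score": 95}),
    ("carol", "PERFORMED", {"session": "s5", "score": 85}),
]


def mean_scores(edges):
    """Average the session scores of every user in the graph."""
    per_user = defaultdict(list)
    for user, relation, session in edges:
        if relation == "PERFORMED":
            per_user[user].append(session["score"])
    return {user: sum(s) / len(s) for user, s in per_user.items()}


def best_practitioner(edges):
    """Answer the dataset-wide question by aggregating over all users."""
    scores = mean_scores(edges)
    return max(scores, key=scores.get)


def top_single_session(edges):
    """Return the user behind the single highest-scoring session."""
    user, _, _ = max(edges, key=lambda edge: edge[2]["score"])
    return user
```

In this toy data the highest single session belongs to `bob`, while the highest mean belongs to `carol`. A retriever that surfaces only the most salient individual records can therefore give a different answer from one that aggregates over every user, which is the kind of question the caption describes as favoring the Cypher-based approach.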