CASPER: Cognitive Architecture for Social Perception and Engagement in Robots

Samuele Vinanzi, Angelo Cangelosi

TL;DR

CASPER tackles the problem of autonomous robots reading human intentions and engaging in collaborative tasks in daily settings. It proposes a symbolic cognitive architecture that fuses perception of Qualitative Spatial Relations (QSRs) with bottom-up low-level action recognition and top-down high-level goal reasoning, all verified in real time by a knowledge base. The work introduces a novel use of QSR descriptors for intention reading, supported by a Plan Library, a perception module, a probabilistic goal reader, and ontology-based verification. It is demonstrated in a simulated kitchen environment, where the robot recognizes ongoing goals and produces executable collaboration plans. The findings suggest that CASPER’s approach yields interpretable, robust intention reading and cooperative behavior, supporting future development of trust-aware, heterogeneous multi-agent teams in real-world environments.

Abstract

Our world is being increasingly pervaded by intelligent robots with varying degrees of autonomy. To integrate seamlessly into our society, these machines should possess the ability to navigate the complexities of our daily routines even in the absence of a human's direct input. In other words, we want these robots to understand the intentions of their partners with the purpose of predicting the best way to help them. In this paper, we present CASPER (Cognitive Architecture for Social Perception and Engagement in Robots): a symbolic cognitive architecture that uses qualitative spatial reasoning to anticipate the pursued goal of another agent and to calculate the best collaborative behavior. This is performed through an ensemble of parallel processes that model low-level action recognition and high-level goal understanding, both of which are formally verified. We have tested this architecture in a simulated kitchen environment, and the results we have collected show that the robot is able both to recognize an ongoing goal and to collaborate properly towards its achievement. This demonstrates a new use of Qualitative Spatial Relations applied to the problem of intention reading in the domain of human-robot interaction.

Paper Structure

This paper contains 30 sections, 6 equations, 13 figures, 5 tables, and 1 algorithm.

Figures (13)

  • Figure 1: The simulated robot equipped with CASPER observes the actions of another agent (in this case, a simulated human) in order to predict their goal and the best collaborative plan.
  • Figure 2: Overview of the proposed system. CASPER is composed of several parallel processes that interact with each other in a joint effort to decode an agent's intention and to formulate an appropriate response. The Perception module transforms visual observations into QSRs. The Low-Level process uses these QSRs to predict the actions that are being performed in the environment and passes this information to the High-Level component, which tries to match the predicted actions against the plan library to infer the pursued goal. A knowledge base, enveloped in the Verification module, ensures the step-by-step soundness of these predictions. Finally, the Supervisor coordinates all the other processes, collects the results and composes a collaborative plan that will be executed by the robot.
  • Figure 3: A plan in CASPER. Each goal is formed by a temporally ordered set of sub-elements with varying levels of abstraction. This structure is used for both intention reading and collaborative behavior generation.
  • Figure 4: The Decision Tree maps a set of QSRs into a movement.
  • Figure 5: A visual demonstration of the Plan Library's scoring system. These trees represent two plans for two distinct goals with a single observation $\hat{\sigma_1} = A$. The non-root nodes are drawn differently based on their status: filled if observed, dashed if unobserved and textured if missed. In this example, $P({G1} | \hat{\sigma}) = 0.59$ and $P({G2} | \hat{\sigma}) = 0.39$, so $G1$ is considered the best explanation.
  • ...and 8 more figures
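
The captions of Figures 2 and 4 describe a perception step that turns observations into QSRs and a decision tree that maps a set of QSRs to a movement. The following is a minimal sketch of that idea, not the authors' implementation. Everything concrete in it is an assumption made for illustration: the two relations (a qualitative distance and an approaching/receding relation), the thresholds, the movement labels, and the hand-written rules that stand in for the learned decision tree.

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class Point:
    """2D position of an agent or object (hypothetical representation)."""
    x: float
    y: float


def qualitative_distance(a: Point, b: Point,
                         near: float = 0.5, medium: float = 1.5) -> str:
    """Discretise a metric distance into a symbol.

    The thresholds are illustrative assumptions, not values from the paper.
    """
    d = hypot(a.x - b.x, a.y - b.y)
    if d <= near:
        return "near"
    if d <= medium:
        return "medium"
    return "far"


def qualitative_motion(prev: Point, curr: Point, target: Point,
                       eps: float = 0.05) -> str:
    """Describe how the agent's distance to the target changed between frames."""
    before = hypot(prev.x - target.x, prev.y - target.y)
    after = hypot(curr.x - target.x, curr.y - target.y)
    if after < before - eps:
        return "approaching"
    if after > before + eps:
        return "receding"
    return "stable"


def classify_movement(distance: str, motion: str) -> str:
    """Map a pair of QSR symbols to a movement label.

    Hand-written rules standing in for a learned decision tree;
    the label names are hypothetical.
    """
    if motion == "approaching":
        return "WALK_TOWARDS"
    if motion == "receding":
        return "WALK_AWAY"
    return "STAND_NEAR" if distance == "near" else "STAND_APART"


if __name__ == "__main__":
    # A human moves one metre closer to a fridge located at the origin.
    human_prev, human_now, fridge = Point(3.0, 0.0), Point(2.0, 0.0), Point(0.0, 0.0)
    qsrs = (qualitative_distance(human_now, fridge),
            qualitative_motion(human_prev, human_now, fridge))
    print(qsrs, classify_movement(*qsrs))
```

Working on symbols such as "near" or "approaching" rather than raw coordinates keeps each prediction easy to inspect, which fits the interpretability that the summary attributes to the architecture. A learned decision tree would replace `classify_movement` while keeping the same symbolic inputs.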