What if Pinocchio Were a Reinforcement Learning Agent: A Normative End-to-End Pipeline

Benoît Alcaraz

Abstract

Over the past decade, artificial intelligence (AI) has developed rapidly. With this progress has come the need for systems that comply with the rules and norms of our society, so that they can be integrated into our daily lives safely and successfully. Inspired by the story of Pinocchio in ``Le avventure di Pinocchio - Storia di un burattino'', this thesis proposes a pipeline for developing norm-compliant and context-aware agents. Building on the AJAR, Jiminy, and NGRL architectures, the work introduces \pino, a hybrid model in which reinforcement learning agents are supervised by argumentation-based normative advisors. To make this pipeline operational, the thesis also presents a novel algorithm for automatically extracting the arguments and relationships that underlie the advisors' decisions. Finally, it investigates the phenomenon of \textit{norm avoidance}, providing a definition and a mitigation strategy in the context of reinforcement learning agents. Each component of the pipeline is empirically evaluated. The thesis concludes with a discussion of related work, current limitations, and directions for future research.

Paper Structure (106 sections, 28 equations, 43 figures, 10 tables, 6 algorithms)

Introduction
Context and Motivations
Research Questions
Methodology
Evaluation
Contributions
Layout of this Thesis
Preliminaries
Technical Background
Reinforcement Learning
Norms
Formal Argumentation
Background
The AJAR Framework
The Jiminy Architecture
...and 91 more sections

Figures (43)

Figure 1: Example of an MDP.
Figure 2: Reinforcement Learning training loop.
Figure 3: Example of a Labelled MDP.
Figure 4: Representation as a directed graph of an argumentation framework.
Figure 6: Jiminy's smart home example. Reused from liao2019building.
...and 38 more figures

Theorems & Definitions (33)

Definition 1
Remark 1
Definition 2
Definition 3: Defeated Norm
Definition 4: Activated Norm
Definition 5: Compliance
Definition 6: Violation
Remark 2
Definition 7: Argumentation Framework
Remark 3
...and 23 more
