muPRL: A Mutation Testing Pipeline for Deep Reinforcement Learning based on Real Faults

Deepak-George Thomas; Matteo Biagiola; Nargiz Humbatova; Mohammad Wardat; Gunel Jahangirova; Hridesh Rajan; Paolo Tonella

TL;DR

This paper describes a taxonomy of real RL faults obtained by repository mining and presents mutation operators derived from those faults, implemented in the tool muPRL. Experiments show that muPRL is effective at discriminating strong from weak test generators, giving developers useful feedback about the adequacy of the generated test scenarios.

Abstract

Reinforcement Learning (RL) is increasingly adopted to train agents that can deal with complex sequential tasks, such as driving an autonomous vehicle or controlling a humanoid robot. Correspondingly, novel approaches are needed to ensure that RL agents have been tested adequately before going to production. Among them, mutation testing is quite promising, especially under the assumption that the injected faults (mutations) mimic the real ones. In this paper, we first describe a taxonomy of real RL faults obtained by repository mining. Then, we present the mutation operators derived from such real faults and implemented in the tool muPRL. Finally, we discuss the experimental results, showing that muPRL is effective at discriminating strong from weak test generators, hence providing useful feedback to developers about the adequacy of the generated test scenarios.

Paper Structure (26 sections, 3 equations, 1 figure, 3 tables)

Introduction
Related Work
Fault Classification
Mutation Testing for AI-based systems
Taxonomy of real RL faults
Mining of Software Artefacts
Mining GitHub
Mining Stack Exchange
Manual Labelling
Taxonomy Construction
The Final Taxonomy
Comparing Prior Work with our Taxonomy
Mutation Analysis
Mutation Operators
Mutation Analysis Procedure
...and 11 more sections

Figures (1)

Figure 1: Taxonomy of real RL faults. Green indicates new fault types; orange and blue indicate fault types in common with the previous taxonomy of Nikanjam et al. (2022); blue indicates the ones that we renamed. SE/GH are preceded by the number of instances found in Stack Exchange/GitHub.
