
muPRL: A Mutation Testing Pipeline for Deep Reinforcement Learning based on Real Faults

Deepak-George Thomas, Matteo Biagiola, Nargiz Humbatova, Mohammad Wardat, Gunel Jahangirova, Hridesh Rajan, Paolo Tonella

TL;DR

This paper describes a taxonomy of real RL faults obtained by repository mining and presents mutation operators derived from those faults, implemented in the tool muPRL. Experiments show that muPRL is effective at discriminating strong from weak test generators, giving developers useful feedback about the adequacy of the generated test scenarios.

Abstract

Reinforcement Learning (RL) is increasingly adopted to train agents that can deal with complex sequential tasks, such as driving an autonomous vehicle or controlling a humanoid robot. Correspondingly, novel approaches are needed to ensure that RL agents have been tested adequately before going to production. Among them, mutation testing is quite promising, especially under the assumption that the injected faults (mutations) mimic the real ones. In this paper, we first describe a taxonomy of real RL faults obtained by repository mining. Then, we present the mutation operators derived from such real faults and implemented in the tool muPRL. Finally, we discuss the experimental results, showing that muPRL is effective at discriminating strong from weak test generators, hence providing useful feedback to developers about the adequacy of the generated test scenarios.

Paper Structure (26 sections, 3 equations, 1 figure, 3 tables)

Figures (1)

  • Figure 1: Taxonomy of real RL faults. Green indicates new fault types; orange and blue indicate fault types shared with the previous taxonomy [nikanjam2022faults], with blue marking the ones that we renamed. The labels SE and GH are preceded by the number of instances found on StackExchange and GitHub, respectively.