A cooperative strategy for diagnosing the root causes of quality requirement violations in multiagent systems

João Faccin, Ingrid Nunes, Abdelwahab Hamou-Lhadj

TL;DR

The paper addresses the challenge of maintaining quality in multiagent systems under abnormal conditions by proposing a decentralised, cooperative strategy for root-cause diagnosis and remediation. It combines an interaction protocol with role-specific algorithms that allow agents to determine whether quality violations originate internally, from service providers, or from the communication layer, and to apply self-healing or mitigation accordingly. An empirical evaluation in a service-oriented MAS demonstrates that the cooperative approach can correctly identify failure sources and normalise operation, achieving competitive execution time and substantially lower cost than remedial baselines. Overall, this work advances autonomous resilience in MAS by eliminating centralised diagnosis and enabling runtime adaptation through agent collaboration.

Abstract

Many modern software systems are built as a set of autonomous software components (also called agents) that collaborate with each other and are situated in an environment. To keep these multiagent systems operational under abnormal circumstances, it is crucial to make them resilient. Existing solutions are often centralised and rely on information manually provided by experts at design time, making such solutions rigid and limiting the autonomy and adaptability of the system. In this work, we propose a cooperative strategy focused on the identification of the root causes of quality requirement violations in multiagent systems. This strategy allows agents to cooperate with each other in order to identify whether these violations come from service providers, associated components, or the communication infrastructure. From this identification process, agents are able to adapt their behaviour in order to mitigate and solve existing abnormalities with the aim of normalising system operation. This strategy consists of an interaction protocol that, together with the proposed algorithms, allows agents playing the protocol roles to diagnose the problems to be repaired. We evaluate our proposal with the implementation of a service-oriented system. The results demonstrate that our solution enables the correct identification of different sources of failures, favouring the selection of the most suitable actions to be taken to overcome abnormal situations.
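
The diagnosis step can be illustrated with a small decision function. This is a hypothetical sketch, not the authors' algorithm: it assumes an agent already knows whether its own processing violates the requirement and has collected yes/no reports from cooperating components that use the same provider, and it applies a simple majority vote. The names `RootCause` and `diagnose` and the 0.5 threshold are illustrative assumptions.

```python
from enum import Enum
from typing import Sequence


class RootCause(Enum):
    INTERNAL = "internal"            # the agent's own processing
    PROVIDER = "provider"            # the service provider itself
    COMMUNICATION = "communication"  # the link between agent and provider
    UNDETERMINED = "undetermined"    # no corroborating evidence available


def diagnose(internal_violation: bool,
             peer_reports: Sequence[bool],
             threshold: float = 0.5) -> RootCause:
    """Hypothetical majority-vote heuristic for locating a violation.

    internal_violation: True if the agent's own processing already breaks
        the quality requirement (e.g. its local execution time is too high).
    peer_reports: one entry per cooperating component that also uses the
        provider; True means that component also perceives it as abnormal.
    """
    if internal_violation:
        return RootCause.INTERNAL
    if not peer_reports:
        # Nobody to corroborate or refute the observation.
        return RootCause.UNDETERMINED
    share = sum(peer_reports) / len(peer_reports)
    # Widespread agreement points at the provider; an isolated perception
    # points at this agent's communication link to it.
    if share > threshold:
        return RootCause.PROVIDER
    return RootCause.COMMUNICATION
```

The intuition behind this choice: if most peers see the same provider misbehaving, the provider is the likely culprit; if they do not, the problem is more plausibly confined to the path between this agent and the provider.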

Paper Structure (16 sections, 9 equations, 6 figures, 2 tables, 3 algorithms)

Figures (6)

  • Figure 1: A MAS in which components interact with each other by consuming and providing services. (a) Agent $p_a$ relies on services $b$ and $c$ from agents $p_b$ and $p_c$, which, in turn, consume services from other agents. (b) Agent $p_a$ replaces service provider $p_b$ with ${p_b}'$.
  • Figure 2: An overview of system behaviour implementing our proposed solution. (a) Agent $p_b$ presents an abnormal behaviour that affects agents that depend on it. (b) Agent $c$ perceives a violation on a quality requirement and notifies agent $p_a$. (c) Agent $p_a$ replaces provider $p_b$ with ${p_b}'$ and informs $c$ that its operation is normalised. (d) $p_a$ broadcasts a request for information, which $n$ and $n'$ answer. (e) $p_a$ notifies $p_b$ of its perceived abnormal behaviour. (f) $p_b$ informs $p_a$ that its operation is normalised. (g) $p_a$ replaces ${p_b}'$ with $p_b$ and the system returns to its former state.
  • Figure 3: An interaction protocol between components of a system. Clients request services with request-service messages, to which providers reply with corresponding inform-service messages. Clients notify abnormal components of quality requirement violations with inform-abnormality messages, which are answered with inform-normality messages once the abnormalities have been handled. Providers are able to issue request-probability messages to cooperating components, which may reply with inform-probability messages.
  • Figure 4: The estimated probability distribution obtained after executing getFunction($L_{\mathcal{M}_q(response\_time)}, L_{time}$). Values lying outside the lower and upper boundaries are considered anomalous.
  • Figure 5: A service-oriented system comprising 37 autonomous components and an external client $c$. F1, F2 and F3 are failures introduced to the system to affect different components and communication links.
  • ...and 1 more figure
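
Figure 4 indicates that a value is flagged as anomalous when it falls outside boundaries derived from an estimated distribution. A minimal sketch, assuming a normal distribution fitted to past response times and a central 95% interval; the paper's getFunction may estimate the distribution differently, and `estimate_bounds` and `is_anomalous` are hypothetical names.

```python
from statistics import NormalDist


def estimate_bounds(samples, alpha=0.05):
    """Fit a normal distribution to past observations (an assumption;
    the paper's getFunction may use another estimator) and return the
    central (1 - alpha) interval as (lower, upper)."""
    if len(samples) < 2:
        raise ValueError("need at least two samples to estimate a distribution")
    # from_samples uses the sample mean and sample standard deviation;
    # inv_cdf fails if all samples are identical (zero spread).
    dist = NormalDist.from_samples(samples)
    return dist.inv_cdf(alpha / 2), dist.inv_cdf(1 - alpha / 2)


def is_anomalous(value, bounds):
    """A value lying outside the lower and upper boundaries is anomalous."""
    lower, upper = bounds
    return value < lower or value > upper
```

With the history [100, 102, 98, 101, 99, 103, 97, 100] (mean 100 ms, sample standard deviation 2 ms), the boundaries are roughly 96.1 ms and 103.9 ms, so a 150 ms response is flagged while a 101 ms response is not.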

Theorems & Definitions (6)

  • Definition 1: Service
  • Definition 2: Quality Feature
  • Definition 3: Quality Requirement
  • Definition 4: Message
  • Definition 5: Interaction Trace
  • Definition 6: Agent
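
Definitions 4 and 5, together with the messages listed in the Figure 3 caption, can be mirrored in simple data types. The sketch below is illustrative only: the six performatives and their request/reply pairing come from the Figure 3 caption, while the fields of `Message` (sender, receiver, content, timestamp) and the `pending` helper are assumptions, since the definitions' contents are not reproduced here.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Performative(Enum):
    REQUEST_SERVICE = "request-service"
    INFORM_SERVICE = "inform-service"
    INFORM_ABNORMALITY = "inform-abnormality"
    INFORM_NORMALITY = "inform-normality"
    REQUEST_PROBABILITY = "request-probability"
    INFORM_PROBABILITY = "inform-probability"


# Reply expected for each initiating message, per the Figure 3 caption.
REPLY_TO = {
    Performative.REQUEST_SERVICE: Performative.INFORM_SERVICE,
    Performative.INFORM_ABNORMALITY: Performative.INFORM_NORMALITY,
    Performative.REQUEST_PROBABILITY: Performative.INFORM_PROBABILITY,
}


@dataclass(frozen=True)
class Message:
    # Hypothetical fields; Definition 4 may specify others.
    sender: str
    receiver: str
    performative: Performative
    content: object = None
    timestamp: float = 0.0


@dataclass
class InteractionTrace:
    messages: List[Message] = field(default_factory=list)

    def record(self, msg: Message) -> None:
        self.messages.append(msg)

    def pending(self) -> List[Message]:
        """Initiating messages still awaiting their matching reply.
        Note that request-probability may legitimately go unanswered."""
        open_msgs: List[Message] = []
        for msg in self.messages:
            if msg.performative in REPLY_TO:
                open_msgs.append(msg)
                continue
            # Close the oldest open message that this reply answers.
            for i, req in enumerate(open_msgs):
                if (REPLY_TO[req.performative] is msg.performative
                        and req.sender == msg.receiver
                        and req.receiver == msg.sender):
                    del open_msgs[i]
                    break
        return open_msgs
```

A trace helper like this lets a client notice, for example, that an inform-abnormality message it sent is still awaiting the corresponding inform-normality reply.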