Transparency as Delayed Observability in Multi-Agent Systems

Kshama Dwarakanath; Svitlana Vyetrenko; Toks Oyebode; Tucker Balch

TL;DR

This work formalizes transparency in multi-agent systems as delayed observability of environment states, parameterized by a delay $\delta$, and studies its impact on agent strategies and social welfare via a learning-based framework. It introduces a POSG/MARL approach with two agent archetypes (constrained and unconstrained) trained in a simulated financial market using PPO within the ABIDES environment, and defines social welfare as a product of equality and profitability using $SWF(Y)=\exp\left(-GE_{\kappa}(Y)\right)\times\bar{Y}$ or $SWF(Y)=\exp\left(-Theil_{L}(Y)\right)\times\bar{Y}$. The empirical results show opposing effects of delay on the two agent types: constrained agents benefit from higher delay (lower observability) while unconstrained agents benefit from lower delay, and overall social welfare peaks at an intermediate level of transparency ($\delta\approx300$). These findings suggest that partial transparency regimes can maximize welfare in complex MAS settings and have practical implications for policy design in markets and other dynamic systems, where information release must balance individual incentives with collective outcomes.
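
The social welfare formula above can be illustrated with a minimal Python sketch. It assumes the textbook definitions of the generalized entropy index $GE_{\kappa}$ and the Theil L index (mean log deviation), and it assumes strictly positive outcomes $Y$; the function names `theil_l`, `generalized_entropy`, and `swf` are hypothetical and not taken from the paper, which may normalize outcomes differently.

```python
import math

def theil_l(y):
    """Theil L index (mean log deviation). Assumes every outcome is > 0."""
    n = len(y)
    mean = sum(y) / n
    return sum(math.log(mean / v) for v in y) / n

def generalized_entropy(y, kappa):
    """Generalized entropy index GE_kappa (standard definition, assumed here)."""
    n = len(y)
    mean = sum(y) / n
    if kappa == 0:
        return theil_l(y)  # GE_0 coincides with Theil L
    if kappa == 1:
        # GE_1 is the Theil T index
        return sum((v / mean) * math.log(v / mean) for v in y) / n
    return sum((v / mean) ** kappa - 1 for v in y) / (n * kappa * (kappa - 1))

def swf(y, kappa=0):
    """SWF(Y) = exp(-GE_kappa(Y)) * mean(Y): equality times profitability."""
    mean = sum(y) / len(y)
    return math.exp(-generalized_entropy(y, kappa)) * mean

equal = swf([2.0, 2.0, 2.0])   # perfectly equal outcomes: welfare equals the mean
unequal = swf([1.0, 3.0])      # same mean, unequal outcomes: welfare drops below it
```

With the Theil L variant, $\exp(-Theil_{L}(Y))\times\bar{Y}$ reduces algebraically to the geometric mean of the outcomes, which makes explicit how inequality discounts the average outcome.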

Abstract

Is transparency always beneficial in complex systems such as traffic networks and stock markets? How is transparency defined in multi-agent systems, and what degree of transparency maximizes social welfare? We take an agent-based view to define transparency (or the lack of it) as delay in agent observability of environment states, and use simulations to analyze the impact of delay on social welfare. To capture how agent strategies adapt to varying delays, we model agents as learners maximizing the same objectives under different delays in a simulated environment. Focusing on two agent types, constrained and unconstrained, we use multi-agent reinforcement learning to evaluate the impact of delay on agent outcomes and social welfare. Empirical demonstration of our framework in simulated financial markets shows opposing trends in the outcomes of constrained and unconstrained agents as delay varies, with an optimal partial transparency regime at which social welfare is maximal.
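
One way to picture delayed observability is a buffer that hands an agent the environment state from $\delta$ steps ago instead of the current one. The sketch below is an illustrative assumption, not the paper's implementation: the class name `DelayedObserver` is hypothetical, the delay is measured in discrete steps, and the agent is assumed to see the initial state until $\delta$ steps have elapsed.

```python
from collections import deque

class DelayedObserver:
    """Returns the environment state from `delay` steps ago (hypothetical helper)."""

    def __init__(self, delay, initial_state):
        if delay < 0:
            raise ValueError("delay must be non-negative")
        # Pre-fill so that the first `delay` observations return the initial state.
        self._buffer = deque([initial_state] * (delay + 1), maxlen=delay + 1)

    def observe(self, current_state):
        # Appending evicts the oldest entry; the head is now `delay` steps old.
        self._buffer.append(current_state)
        return self._buffer[0]

# Full transparency (delay 0) versus partial transparency (delay 2).
transparent = DelayedObserver(delay=0, initial_state=0)
delayed = DelayedObserver(delay=2, initial_state=0)
states = [1, 2, 3, 4]
seen_transparent = [transparent.observe(s) for s in states]
seen_delayed = [delayed.observe(s) for s in states]
```

Here `seen_transparent` tracks the true states, while `seen_delayed` lags them by two steps.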

Paper Structure (17 sections, 4 equations, 7 figures, 1 table)

INTRODUCTION
BACKGROUND AND RELATED WORK
Transparency
Partially Observable Stochastic Games and Multi-Agent Reinforcement Learning
PROBLEM FORMULATION
Delayed Observability and Constrained Agents
Social Welfare Function
APPLICATION TO MARKETS
Multi-Agent Market Simulator
POSG Formulation for Markets
EXPERIMENTAL RESULTS
Training
Impact of Delay on Player Outcomes
Impact of Delay on Learnt Strategies
Impact of Delay on Social Welfare
...and 2 more sections

Figures (7)

Figure 1: Snapshot of buy and sell orders at an exchange. The mid-price moves when traders submit orders that cross the spread, i.e., a buy order priced at or above the best sell price, or vice versa.
Figure 2: Discounted cumulative rewards during training, demonstrating convergence in learning.
Figure 3: Cumulative rewards measuring player outcomes. As observability increases (delay decreases), constrained player outcomes worsen while unconstrained player outcomes improve.
Figure 4: Learnt strategies of MM and PT: half-spread of orders. Note the (near) monotonic trend in strategies with delay.
Figure 5: Learnt strategy of PT: % of hold decisions. PT trades more frequently (holds less) at low delays.
...and 2 more figures
