Trustworthiness in Stochastic Systems: Towards Opening the Black Box

Jennifer Chien, David Danks

TL;DR

This paper tackles the challenge of trust in stochastic AI systems, arguing that trust frameworks built for deterministic systems fail when outputs vary probabilistically. It introduces a refined notion of stochasticity that centers on value-relevance, and proposes latent value modeling within a sociotechnical framework to assess and calibrate trustworthiness. The authors critique three common responses (eliminating stochasticity, exposing variability to users, and direct value elicitation or RLHF) and argue for a latent-value, open-box approach that decomposes values across the human and system components. The work aims to enable precise trust calibration in complex AI deployments by aligning user and system values through causal, sociotechnical analysis, while acknowledging practical challenges in value inference and governance.

Abstract

AI systems are increasingly tasked with carrying out responsibilities under decreasing oversight. This delegation requires users to accept certain risks, typically mitigated by perceived or actual alignment of values between humans and AI, leading to confidence that the system will act as intended. However, stochastic behavior by an AI system threatens to undermine alignment and potential trust. In this work, we take a philosophical perspective on the tension and potential conflict between stochasticity and trustworthiness. We demonstrate how stochasticity complicates traditional methods of establishing trust and evaluate two extant approaches to managing it: (1) eliminating user-facing stochasticity to create deterministic experiences, and (2) allowing users to independently control tolerances for stochasticity. We argue that both approaches are insufficient, as not all forms of stochasticity affect trustworthiness in the same way or to the same degree. Instead, we introduce a novel definition of stochasticity and propose latent value modeling for both AI systems and users to better assess alignment. This work lays a foundational step toward understanding how and when stochasticity impacts trustworthiness, enabling more precise trust calibration in complex AI systems, and underscoring the importance of sociotechnical analyses to effectively address these challenges.

Paper Structure

This paper contains 24 sections and 2 figures.

Figures (2)

  • Figure 1: Causal diagrams from User and LLM Perspectives. Yellow nodes are observed elements, red nodes are user-determined latent states, and blue nodes are LLM-determined latent states. We separately depict the user perspective (left) and the LLM perspective (right).
  • Figure 2: Causal diagram of User and LLM Perspectives Combined. Yellow nodes are observed elements, red nodes are user-determined latent states, and blue nodes are LLM-determined latent states. The user and LLM perspectives overlap in the central causal chain of the generative AI, from prompt to output.