Embracing Ambiguity: Bayesian Nonparametrics and Stakeholder Participation for Ambiguity-Aware Safety Evaluation

Yanan Long

TL;DR

The paper addresses the limitations of single-point risk metrics for evaluating generative models by introducing a multiplicity-aware safety framework. It formalizes decoding Rashomon sets and tail-focused risk functionals, then proposes a dependent Dirichlet process (DDP) mixture with stakeholder-conditioned weights to model multi-modal harm surfaces across prompts and decoding knobs. An active sampling pipeline with Bayesian deep surrogates guides efficient exploration of knob space, while a stakeholder-informed simulation and conformal wrappers provide uncertainty quantification. The framework yields stakeholder-specific risk maps, Rashomon-set volumes, and disagreement measures, so that variability across usage scenarios and demographic slices is accounted for explicitly. Results from synthetic and LLM experiments show improved tail-risk detection, multimodality capture, and data efficiency relative to baselines, with practical implications for safety auditing and policy-aligned evaluation.

Abstract

Evaluations of generative AI models often collapse nuanced behaviour into a single number computed for a single decoding configuration. Such point estimates obscure tail risks, demographic disparities, and the existence of multiple near-optimal operating points. We propose a unified framework that embraces multiplicity by modelling the distribution of harmful behaviour across the entire space of decoding knobs and prompts, quantifying risk through tail-focused metrics, and integrating stakeholder preferences. Our technical contributions are threefold: (i) we formalise decoding Rashomon sets, that is, regions of knob space whose risk is near-optimal under given criteria, and measure their size and disagreement; (ii) we develop a dependent Dirichlet process (DDP) mixture with stakeholder-conditioned stick-breaking weights to learn multi-modal harm surfaces; and (iii) we introduce an active sampling pipeline that uses Bayesian deep learning surrogates to explore knob space efficiently. Our approach bridges multiplicity theory, Bayesian nonparametrics, and stakeholder-aligned sensitivity analysis, paving the way for trustworthy deployment of generative models.
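
To make the idea of a decoding Rashomon set concrete, the following minimal Python sketch scores a small grid of decoding knobs with a tail-focused metric and then selects the near-optimal region. It is an illustration under stated assumptions, not the paper's implementation: the knob grid (temperature and top-p), the synthetic Beta-distributed harm scores, the choice of empirical CVaR as the tail metric, the tolerance epsilon, and the names cvar and rashomon_set are all hypothetical.

```python
import numpy as np


def cvar(losses, alpha=0.9):
    """Empirical CVaR: mean of the worst (1 - alpha) fraction of losses."""
    losses = np.sort(np.asarray(losses, dtype=float))
    # Small offset guards against floating-point error in (1 - alpha) * n.
    k = max(1, int(np.ceil((1.0 - alpha) * losses.size - 1e-9)))
    return losses[-k:].mean()


def rashomon_set(risk, epsilon):
    """Boolean mask of configurations whose risk is within epsilon of the minimum."""
    risk = np.asarray(risk, dtype=float)
    return risk <= risk.min() + epsilon


# Hypothetical knob grid (assumption): temperature x top-p.
temperatures = np.linspace(0.2, 1.4, 7)
top_ps = np.linspace(0.5, 1.0, 6)
rng = np.random.default_rng(0)

risk = np.empty((temperatures.size, top_ps.size))
for i, t in enumerate(temperatures):
    for j, p in enumerate(top_ps):
        # Synthetic harm scores in [0, 1]; a stand-in for judged model outputs.
        scores = rng.beta(1.0 + 2.0 * t * p, 8.0, size=500)
        risk[i, j] = cvar(scores, alpha=0.9)

mask = rashomon_set(risk, epsilon=0.02)
volume = mask.mean()  # fraction of the grid that is near-optimal

print(f"Rashomon-set volume: {volume:.2f}")
```

The grid fraction is only a crude proxy for the size of a Rashomon set, and the sketch omits the disagreement measure, stakeholder conditioning, and the DDP surface model described in the abstract.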

Paper Structure

This paper contains 31 sections, 18 equations, 1 figure.

Figures (1)

  • Figure 1: Vertical evaluation pipeline. Stage 1 generates outputs using active sampling. Stage 2 defines stakeholder priors and rater profiles. Stage 3 simulates judging and calibrates scores. Stage 4 infers risk surfaces and Rashomon sets.