
Behavioural feasible set: Value alignment constraints on AI decision support

Taejin Park

Abstract

When organisations adopt commercial AI systems for decision support, they inherit value judgements embedded by vendors that are neither transparent nor renegotiable. The governance puzzle is not whether AI can support decisions but which recommendations the system can actually produce given how its vendor has configured it. I formalise this as a behavioural feasible set, the range of recommendations reachable under vendor-imposed alignment constraints, and characterise diagnostic thresholds for when organisational requirements exceed the system's flexibility. In scenario-based experiments using binary decision scenarios and multi-stakeholder ranking tasks, I show that alignment materially compresses this set. Comparing pre- and post-alignment variants of an open-weight model isolates the mechanism: alignment makes the system substantially less able to shift its recommendation even under legitimate contextual pressure. Leading commercial models exhibit comparable or greater rigidity. In multi-stakeholder tasks, alignment shifts implied stakeholder priorities rather than neutralising them, meaning organisations adopt embedded value orientations set upstream by the vendor. Organisations thus face a governance problem that better prompting cannot resolve: selecting a vendor partially determines which trade-offs remain negotiable and which stakeholder priorities are structurally embedded.

Paper Structure (53 sections, 23 equations, 4 figures, 8 tables)


Figures (4)

  • Figure 1: Reversal threshold in the binary action space. (a) Reversal threshold $\kappa_{\mathrm{rev}}(x)$ against baseline probability $p_0(x)$. (b) Required budget for stricter targets $p^\dagger < 1/2$ at fixed $p_0 = 0.90$.
  • Figure 2: Stakeholder $\varepsilon$-balancing threshold (Pinsker outer bound). (a) Diagnostic lower bound $\kappa_{\mathrm{bal}}(I_0; \varepsilon)$ against baseline imbalance $I_0(x)$; for each $\varepsilon$, the region below its curve marks where $\varepsilon$-balance is not guaranteed to be reachable. (b)--(c) Simplex schematics: the blue region is the Pinsker $\ell_1$ outer bound ($\|p - p_0\|_1 \leq \sqrt{2\kappa}$); the dashed green region is the $\varepsilon$-balance set around uniform $u$ (circles are Euclidean proxies for $\ell_1$ balls, shown for qualitative illustration).
  • Figure 3: Effect of alignment on stakeholder priority priors. (a) displays mean Borda weights for each stakeholder, where rank 1 receives 5 points and rank 5 receives 1 point, normalised to sum to 1. The dashed pentagon indicates uniform weighting (0.20 per stakeholder). (b) reports the difference in Borda weights (Llama Instruct minus Llama Base). Error bars represent 95% confidence intervals computed via paired bootstrap resampling: for each of 1,000 iterations, (scenario, sample) pairs are resampled with replacement, maintaining the pairing between models, and the difference in mean Borda weights is computed. $n = 400$ observations per model ($8$ scenarios $\times$ $50$ samples).
  • Figure 4: Alignment-induced stakeholder priority shift by scenario. Bars show the difference in mean Borda weights (Llama Instruct minus Llama Base) for each stakeholder within each scenario. Positive values indicate higher priority after alignment; negative values indicate lower priority. The pattern is consistent across scenarios: Shareholders lose priority while Customers and Employees gain, indicating that the aggregate shift in Figure 3 reflects a systematic transformation rather than scenario-specific effects. $n = 50$ observations per model per scenario.
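
The Borda weighting and paired bootstrap described in the captions of Figures 3 and 4 can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's code: the function names, the data layout (one ranking per (scenario, sample) pair, listed best first and aligned by index across the two models), the percentile method for the interval, and the two placeholder stakeholder labels are all hypothetical. Only Shareholders, Customers and Employees are named in the captions.

```python
import random

# Assumption: five stakeholders per ranking. The last two labels are
# placeholders; the captions name only the first three.
STAKEHOLDERS = ["Shareholders", "Customers", "Employees",
                "Stakeholder D", "Stakeholder E"]


def borda_weights(ranking):
    """Map a ranking (best first) to Borda weights that sum to 1.

    With five stakeholders, rank 1 earns 5 points and rank 5 earns 1 point;
    dividing by the total (15) normalises the weights.
    """
    k = len(ranking)
    total = k * (k + 1) / 2
    return {name: (k - pos) / total for pos, name in enumerate(ranking)}


def paired_bootstrap_ci(base, instruct, stakeholder,
                        n_boot=1000, alpha=0.05, seed=0):
    """Percentile CI for the difference in mean Borda weight (instruct - base).

    base[i] and instruct[i] must refer to the same (scenario, sample) pair.
    Resampling pair indices keeps that pairing intact; because the mean is
    linear, the difference in means equals the mean of paired differences.
    """
    if len(base) != len(instruct):
        raise ValueError("rankings must be paired one-to-one")
    rng = random.Random(seed)
    n = len(base)
    diffs = [borda_weights(i)[stakeholder] - borda_weights(b)[stakeholder]
             for b, i in zip(base, instruct)]
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample with replacement
        stats.append(sum(diffs[j] for j in idx) / n)
    stats.sort()
    lo_idx = int(n_boot * alpha / 2)
    hi_idx = n_boot - 1 - lo_idx
    return sum(diffs) / n, stats[lo_idx], stats[hi_idx]
```

Calling `paired_bootstrap_ci(base_rankings, instruct_rankings, "Shareholders")` returns the point estimate and the lower and upper interval bounds for one stakeholder; repeating the call per stakeholder gives the quantities plotted in panel (b) of Figure 3, and restricting the inputs to one scenario gives the per-scenario bars of Figure 4.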