Table of Contents
Fetching ...

GAIN: A Benchmark for Goal-Aligned Decision-Making of Large Language Models under Imperfect Norms

Masayuki Kawarada, Kodai Watanabe, Soichiro Murakami

Abstract

We introduce GAIN (Goal-Aligned Decision-Making under Imperfect Norms), a benchmark designed to evaluate how large language models (LLMs) balance adherence to norms against business goals. Existing benchmarks typically focus on abstract scenarios rather than real-world business applications. Furthermore, they provide limited insights into the factors influencing LLM decision-making. This restricts their ability to measure models' adaptability to complex, real-world norm-goal conflicts. In GAIN, models receive a goal, a specific situation, a norm, and additional contextual pressures. These pressures, explicitly designed to encourage potential norm deviations, are a unique feature that differentiates GAIN from other benchmarks, enabling a systematic evaluation of the factors influencing decision-making. We define five types of pressures: Goal Alignment, Risk Aversion, Emotional/Ethical Appeal, Social/Authoritative Influence, and Personal Incentive. The benchmark comprises 1,200 scenarios across four domains: hiring, customer support, advertising and finance. Our experiments show that advanced LLMs frequently mirror human decision-making patterns. However, when Personal Incentive pressure is present, they diverge significantly, showing a strong tendency to adhere to norms rather than deviate from them.

GAIN: A Benchmark for Goal-Aligned Decision-Making of Large Language Models under Imperfect Norms

Abstract

We introduce GAIN (Goal-Aligned Decision-Making under Imperfect Norms), a benchmark designed to evaluate how large language models (LLMs) balance adherence to norms against business goals. Existing benchmarks typically focus on abstract scenarios rather than real-world business applications. Furthermore, they provide limited insights into the factors influencing LLM decision-making. This restricts their ability to measure models' adaptability to complex, real-world norm-goal conflicts. In GAIN, models receive a goal, a specific situation, a norm, and additional contextual pressures. These pressures, explicitly designed to encourage potential norm deviations, are a unique feature that differentiates GAIN from other benchmarks, enabling a systematic evaluation of the factors influencing decision-making. We define five types of pressures: Goal Alignment, Risk Aversion, Emotional/Ethical Appeal, Social/Authoritative Influence, and Personal Incentive. The benchmark comprises 1,200 scenarios across four domains: hiring, customer support, advertising and finance. Our experiments show that advanced LLMs frequently mirror human decision-making patterns. However, when Personal Incentive pressure is present, they diverge significantly, showing a strong tendency to adhere to norms rather than deviate from them.
Paper Structure (27 sections, 2 equations, 4 figures, 3 tables)

This paper contains 27 sections, 2 equations, 4 figures, 3 tables.

Figures (4)

  • Figure 1: Example of goal-aligned decision-making under imperfect norms. An employee approves a product exchange that is not permitted by norms. This decision is made as an exception, prioritizing long-term customer loyalty after carefully considering various pressures present in the business situation.
  • Figure 2: Overview of our data generation pipeline.
  • Figure 3: Prompt for the decision-making task using the GAIN benchmark.
  • Figure 4: Comparison of action choice distributions across scenario types.