GAIN: A Benchmark for Goal-Aligned Decision-Making of Large Language Models under Imperfect Norms

Masayuki Kawarada; Kodai Watanabe; Soichiro Murakami

Abstract

We introduce GAIN (Goal-Aligned Decision-Making under Imperfect Norms), a benchmark designed to evaluate how large language models (LLMs) balance adherence to norms against business goals. Existing benchmarks typically focus on abstract scenarios rather than real-world business applications, and they provide limited insight into the factors that influence LLM decision-making. This limits their ability to measure how well models adapt to complex, real-world norm-goal conflicts. In GAIN, models receive a goal, a specific situation, a norm, and additional contextual pressures. These pressures, explicitly designed to encourage norm deviation, distinguish GAIN from other benchmarks and enable a systematic evaluation of the factors influencing decision-making. We define five types of pressure: Goal Alignment, Risk Aversion, Emotional/Ethical Appeal, Social/Authoritative Influence, and Personal Incentive. The benchmark comprises 1,200 scenarios across four domains: hiring, customer support, advertising, and finance. Our experiments show that advanced LLMs frequently mirror human decision-making patterns. However, when Personal Incentive pressure is present, the models diverge significantly from humans, showing a strong tendency to adhere to norms rather than deviate from them.
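The input structure described in the abstract (a goal, a situation, a norm, and contextual pressures drawn from five types) could be represented roughly as follows. This is a minimal sketch, not the benchmark's actual schema: the class names `Pressure` and `Scenario`, the field names, the `render` helper, and the example text (loosely modeled on the product-exchange case in Figure 1) are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict


class Pressure(Enum):
    """The five pressure types named in the abstract."""
    GOAL_ALIGNMENT = "Goal Alignment"
    RISK_AVERSION = "Risk Aversion"
    EMOTIONAL_ETHICAL_APPEAL = "Emotional/Ethical Appeal"
    SOCIAL_AUTHORITATIVE_INFLUENCE = "Social/Authoritative Influence"
    PERSONAL_INCENTIVE = "Personal Incentive"


# The four domains named in the abstract.
DOMAINS = ("hiring", "customer support", "advertising", "finance")


@dataclass
class Scenario:
    """Hypothetical container for one benchmark item (field names are assumed)."""
    domain: str
    goal: str
    situation: str
    norm: str
    pressures: Dict[Pressure, str] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Reject domains the abstract does not list.
        if self.domain not in DOMAINS:
            raise ValueError(f"unknown domain: {self.domain!r}")

    def render(self) -> str:
        """Flatten the scenario into plain text (an assumed layout, not the paper's prompt)."""
        lines = [
            f"Goal: {self.goal}",
            f"Situation: {self.situation}",
            f"Norm: {self.norm}",
        ]
        for kind, text in self.pressures.items():
            lines.append(f"Pressure ({kind.value}): {text}")
        return "\n".join(lines)


# Illustrative instance; the wording is invented, not taken from the dataset.
example = Scenario(
    domain="customer support",
    goal="Maintain long-term customer loyalty.",
    situation="A customer requests a product exchange.",
    norm="Exchanges of this kind are not permitted.",
    pressures={
        Pressure.GOAL_ALIGNMENT: "Approving the exchange may retain a loyal customer.",
    },
)
```

Calling `example.render()` yields one line each for the goal, situation, and norm, followed by one line per attached pressure. Keeping pressures in a mapping keyed by type makes it straightforward to compare decisions with and without a given pressure.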

Paper Structure (27 sections, 2 equations, 4 figures, 3 tables)

Introduction
Related Work
Benchmarks for Evaluating Decision-Making in Large Language Models.
Human Decision-Making under Imperfect Norms.
Task Definition
Benchmark Creation
Pressure Design and Categorization
Data Generation
Base Scenario Generation.
Pressure Generation.
Human Baseline and Task Analysis
Benchmark Quality Validation
Human Baseline Data Collection
Task Ambiguity and Inter-Annotator Agreement
Experiments
...and 12 more sections

Figures (4)

Figure 1: Example of goal-aligned decision-making under imperfect norms. An employee approves a product exchange that is not permitted by norms. This decision is made as an exception, prioritizing long-term customer loyalty after carefully considering various pressures present in the business situation.
Figure 2: Overview of our data generation pipeline.
Figure 3: Prompt for the decision-making task using the GAIN benchmark.
Figure 4: Comparison of action choice distributions across scenario types.
