Seeing the Goal, Missing the Truth: Human Accountability for AI Bias

Sean Cao, Wei Jiang, Hui Xu

TL;DR

This paper investigates whether revealing the downstream use of LLM outputs induces goal-conditioned distortions in intermediate signals. Using earnings-call transcripts, the authors compare goal-blind and goal-aware prompts for generating the sentiment and competition scores that feed standard econometric forecasts of stock returns and earnings per share (EPS). They find that goal awareness improves in-sample predictive content before the model's knowledge cutoff but does not improve out-of-sample generalization after the cutoff, suggesting that the effect arises from objective-conditioned optimization rather than genuine signal enhancement. The work highlights a human-centered channel of AI bias and argues for measurement workflows that remain agnostic to downstream tasks and are rigorously tested out-of-sample, so that AI-assisted research stays credible and robust.

Abstract

This research explores how human-defined goals influence the behavior of Large Language Models (LLMs) through purpose-conditioned cognition. Using financial prediction tasks, we show that revealing the downstream use (e.g., predicting stock returns or earnings) of LLM outputs leads the LLM to generate biased sentiment and competition measures, even though these measures are intended to be independent of the downstream task. Goal-aware prompting shifts intermediate measures toward the disclosed downstream objective. This purpose leakage improves performance before the LLM's knowledge cutoff but offers no advantage after it. AI bias due to "seeing the goal" is not an algorithmic flaw; it is a matter of human accountability in research design, which must ensure the statistical validity and reliability of AI-generated measurements.
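
The comparison described in the abstract can be sketched in a few lines. This is an illustrative sketch only, not the authors' code: the prompt wording, the function names (`build_prompt`, `split_by_cutoff`, `pearson`), and the record layout are all assumptions, and a plain Pearson correlation stands in for the econometric forecasting regressions the paper actually uses.

```python
from datetime import date
from math import sqrt
from typing import Optional


def build_prompt(transcript: str, goal: Optional[str] = None) -> str:
    """Build a sentiment-scoring prompt (hypothetical wording).

    goal=None gives the goal-blind variant; passing a goal discloses
    the downstream use, which is the goal-aware variant.
    """
    instruction = "Rate the sentiment of this earnings-call transcript from -1 to 1."
    if goal is not None:
        instruction += f" The score will be used to {goal}."
    return f"{instruction}\n\n{transcript}"


def split_by_cutoff(rows, cutoff: date):
    """Split records into those dated on/before and after the model's knowledge cutoff."""
    pre = [r for r in rows if r["date"] <= cutoff]
    post = [r for r in rows if r["date"] > cutoff]
    return pre, post


def pearson(xs, ys) -> float:
    """Pearson correlation, used here as a simple stand-in for predictive content."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)
```

Under these assumptions, each record would hold a call date, a goal-blind score, a goal-aware score, and the realized outcome. Computing `pearson` of each score against the outcome separately on the `pre` and `post` subsets mirrors the paper's test: a goal-aware advantage that appears only in `pre` points to purpose leakage rather than a better signal.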

Paper Structure

This paper contains 14 sections, 4 equations, 3 figures, and 5 tables.

Figures (3)

  • Figure 1: Cumulative Long–Short Portfolio Returns from Goal-Aware and Goal-Blind Sentiment Scores
  • Figure 2: Monthly Out-of-Sample Forecast Accuracy Using Goal-Aware and Goal-Blind Sentiment Scores
  • Figure 3: Quarterly Out-of-Sample Forecast Accuracy Using GPT-Derived Competition Scores