Poor Alignment and Steerability of Large Language Models: Evidence from College Admission Essays

Jinsook Lee, AJ Alvero, Thorsten Joachims, René Kizilcec

TL;DR

The paper investigates how well current large language models align with, and can be steered toward, human writing in high-stakes college admissions essays. The authors compare 29,232 human-authored essays with two LLM-generated variants (one prompted with the essay question alone, the other with additional demographic information) across eight models, using embedding-based analyses, PCA visualization, cosine similarity, and TF-IDF–based classification. They find persistent misalignment: AI essays remain linguistically distinct from human texts, identity prompting yields limited or even adverse effects, and identity-prompted and unprompted AI outputs remain more similar to each other than to human writing. The findings raise concerns about the use of LLMs in high-stakes contexts and emphasize the need for stronger alignment and careful consideration of steerability via prompting, model tuning, and detection methods.

Abstract

People are increasingly using technologies equipped with large language models (LLMs) to write texts for formal communication, which raises two important questions at the intersection of technology and society: Who do LLMs write like (model alignment); and can LLMs be prompted to change who they write like (model steerability). We investigate these questions in the high-stakes context of undergraduate admissions at a selective university by comparing lexical and sentence variation between essays written by 30,000 applicants to two types of LLM-generated essays: one prompted with only the essay question used by the human applicants; and another with additional demographic information about each applicant. We consistently find that both types of LLM-generated essays are linguistically distinct from human-authored essays, regardless of the specific model and analytical approach. Further, prompting a specific sociodemographic identity is remarkably ineffective in aligning the model with the linguistic patterns observed in human writing from this identity group. This holds along the key dimensions of sex, race, first-generation status, and geographic location. The demographically prompted and unprompted synthetic texts were also more similar to each other than to the human text, meaning that prompting did not alleviate homogenization. These issues of model alignment and steerability in current LLMs raise concerns about the use of LLMs in high-stakes contexts.

Paper Structure

This paper contains 52 sections, 4 equations, 10 figures, 6 tables.

Figures (10)

  • Figure 1: Two-dimensional PCA projection of T5 sentence embeddings comparing human-authored essays (blue) with LLM-generated essays under default (green) and identity-prompted (orange) conditions across multiple models. The clustering patterns indicate a clear distinction between human and machine-generated texts, with identity prompting failing to align LLM outputs more closely with human writing.
  • Figure 2: Pairwise cosine similarity between different pairs of essays according to their authorship, broken down by models. Each dot represents the average pairwise cosine similarity for a given model with 95% confidence intervals (not visible due to small size).
  • Figure 3: Coefficients from logistic regression classifiers comparing word usage patterns in human-authored and LLM-generated college application essays. The left and middle panels contrast LLM-generated essays (default and identity-prompted conditions) with human-written essays, showing that LLMs favor high-level conceptual terms (e.g., challenge, growth, understanding) while human-authored texts include more concrete, personal, and temporal references (e.g., year, time, friend). The right panel compares identity-prompted LLM outputs with default LLM outputs, indicating that identity prompting leads to increased usage of demographic and background-related terms (e.g., parent, immigrant, first-generation) but does not shift the LLMs toward more human-like storytelling.
  • Figure 4: Pairwise cosine similarity distributions between human-authored and LLM-generated essays, broken down by demographic groups. Each dot represents the average similarity for a given subgroup across different comparison types with 95% CI: Human vs. Human (red), LLM vs. Human (blue), LLM ID-Prompted vs. Human (orange), and LLM vs. LLM ID-Prompted (green). The results indicate that while LLMs produce internally consistent outputs, their alignment with human-authored texts remains lower, even when identity prompts are applied.
  • Figure 5: PCA projection of bag-of-words (TF-IDF) encoding comparing human-authored essays (blue) with LLM-generated essays (green) and identity-prompted (orange) conditions across models.
  • ...and 5 more figures
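
The average pairwise cosine similarity reported in the captions of Figures 2 and 4 can be illustrated with a minimal sketch. This is not the authors' code: the function names, the toy two-dimensional vectors, and the normal-approximation confidence interval are all assumptions made for illustration, and real essay embeddings would have far more dimensions. The sketch covers only cross-group comparisons (e.g., LLM vs. Human) and assumes non-zero vectors.

```python
import math
from itertools import product


def cosine(u, v):
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_pairwise_cosine(group_a, group_b):
    """Mean cosine similarity over all cross-group pairs.

    Returns (mean, half_width), where half_width is a rough 95% interval
    half-width from a normal approximation. Pairs that share a vector are
    not independent, so this interval is only a crude illustration
    (an assumption of this sketch, not the paper's procedure).
    """
    sims = [cosine(u, v) for u, v in product(group_a, group_b)]
    mean = sum(sims) / len(sims)
    variance = sum((s - mean) ** 2 for s in sims) / (len(sims) - 1)
    half_width = 1.96 * math.sqrt(variance / len(sims))
    return mean, half_width


# Hypothetical toy "embeddings"; real inputs would come from a sentence encoder.
human_vectors = [[1.0, 0.0], [0.0, 1.0]]
llm_vectors = [[1.0, 1.0], [2.0, 2.0]]

mean_sim, ci_half_width = mean_pairwise_cosine(human_vectors, llm_vectors)
print(f"LLM vs. Human: mean={mean_sim:.4f}, 95% CI half-width={ci_half_width:.4f}")
```

Computing the same statistic for each comparison type (for example LLM vs. Human and LLM vs. LLM ID-Prompted) gives the kind of per-group averages that the captions describe as dots.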