
Concept Incongruence: An Exploration of Time and Death in Role Playing

Xiaoyan Bai, Ike Peng, Aditya Singh, Chenhao Tan

TL;DR

This paper introduces concept incongruence as a framework for understanding clashes between concept boundaries, whether in user prompts or in model representations, with a focus on time-bound role-playing. It defines three metrics (abstention rate, conditional accuracy, and answer rate) and builds a death-time Role-Play benchmark of 100 historical figures to study model behavior. Results show that models rarely abstain after the role's death and lose accuracy in Role-Play, driven by unreliable encoding of the death state and shifts in temporal representations. Probing shows that the death state is not reliably encoded in Role-Play, that the death year is imprecisely encoded in both Role-Play and Non-Role-Play, and that Role-Play perturbs temporal representations; explicitly specifying the death year can restore abstention at the cost of overall accuracy. The work highlights fundamental challenges in reconciling role immersion with world knowledge and suggests future directions in specification, clarification, and representation management to mitigate concept incongruence.
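The three metrics can be made concrete with a short sketch. The definitions below are assumptions for illustration, not the paper's exact formulas: abstention rate is taken over questions about years after the role's death, answer rate over questions up to and including the death year, and conditional accuracy over responses where the model did not abstain. The `Response` record and the three function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Response:
    """One model response in a hypothetical Role-Play evaluation log."""
    question_year: int  # year the question refers to
    death_year: int     # death year of the role being played
    abstained: bool     # True if the model declined to answer
    correct: bool       # correctness of the answer (ignored if abstained)


def abstention_rate(rs: Sequence[Response]) -> float:
    """Share of after-death questions on which the model abstains (ideal: 1.0).

    Assumption: a question about the death year itself counts as before death.
    """
    after = [r for r in rs if r.question_year > r.death_year]
    return sum(r.abstained for r in after) / len(after) if after else float("nan")


def answer_rate(rs: Sequence[Response]) -> float:
    """Share of before-death questions the model answers (ideal: 1.0)."""
    before = [r for r in rs if r.question_year <= r.death_year]
    return sum(not r.abstained for r in before) / len(before) if before else float("nan")


def conditional_accuracy(rs: Sequence[Response]) -> float:
    """Accuracy restricted to responses where the model did not abstain."""
    answered = [r for r in rs if not r.abstained]
    return sum(r.correct for r in answered) / len(answered) if answered else float("nan")
```

For example, for a role who died in 1962, a toy log with two answered pre-death questions, plus one abstention and one answer about later years, gives an abstention rate of 0.5 and an answer rate of 1.0. Each function returns NaN when its subset is empty rather than raising.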

Abstract

Consider the prompt "Draw a unicorn with two horns." Should large language models (LLMs) recognize that a unicorn, by definition, has only one horn and ask the user for clarification, or proceed to generate something anyway? We introduce concept incongruence to capture such phenomena where concept boundaries clash with each other, either in user prompts or in model representations, often leading to under-specified or mis-specified behaviors. In this work, we take the first step towards defining and analyzing model behavior under concept incongruence. Focusing on temporal boundaries in the Role-Play setting, we propose three behavioral metrics (abstention rate, conditional accuracy, and answer rate) to quantify model behavior under incongruence due to the role's death. We show that models fail to abstain after death and suffer from an accuracy drop compared to the Non-Role-Play setting. Through probing experiments, we identify two main causes: (i) unreliable encoding of the "death" state across different years, leading to unsatisfactory abstention behavior, and (ii) role-playing shifts the model's temporal representations, resulting in accuracy drops. We leverage these insights to improve consistency in the model's abstention and answer behaviors. Our findings suggest that concept incongruence leads to unexpected model behaviors and point to future directions for improving model behavior under concept incongruence.
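Probing experiments of the kind mentioned above typically train a simple classifier on hidden states. The sketch below assumes a logistic-regression probe that predicts the death state (1 if the queried year is after the role's death) from one hidden-state vector per prompt; the feature extraction, the function names `train_linear_probe` and `probe_accuracy`, and the hyperparameters are hypothetical, and the paper's actual probe setup may differ. Comparing probe accuracy on Role-Play versus Non-Role-Play hidden states is one way to ask whether the death state is linearly decodable in each setting.

```python
import numpy as np


def train_linear_probe(h: np.ndarray, y: np.ndarray, lr: float = 0.1,
                       steps: int = 500) -> tuple[np.ndarray, float]:
    """Fit a logistic-regression probe sigmoid(h @ w + b) ~ P(dead) by full-batch gradient descent.

    h: (n, d) hidden states (hypothetical: one vector per prompt, e.g. a last-token state)
    y: (n,) binary labels, 1 if the queried year is after the role's death
    """
    n, d = h.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(h @ w + b)))  # predicted P(dead) per example
        grad = p - y                             # per-example gradient of log-loss w.r.t. the logit
        w -= lr * (h.T @ grad) / n               # mean gradient step for the weights
        b -= lr * float(grad.mean())             # mean gradient step for the bias
    return w, b


def probe_accuracy(h: np.ndarray, y: np.ndarray, w: np.ndarray, b: float) -> float:
    """Fraction of examples where the thresholded probe (logit > 0) matches the label."""
    return float((((h @ w + b) > 0).astype(int) == y).mean())
```

On real data, `h` would come from a chosen layer of the model under each prompting mode, and accuracy should be measured on held-out prompts; for a quick check, any `(n, d)` array works, such as synthetic Gaussian features labeled by a random direction.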

Paper Structure

This paper contains 15 sections, 1 equation, 10 figures, 8 tables.

Figures (10)

  • Figure 1: An illustration of three levels of incongruence. (A): Impossible to complete without resolving the clash (mis-specification), although ChatGPT proceeds to generate an image; (B): Possible to complete but challenging for the models. The incongruence is relatively easy to trace because it appears in the prompt, and the task could benefit from specification, as in the Marilyn Monroe example (under-specification); (C): The incongruence is hard to trace because it does not appear in the prompt, and the desirable behavior is also hard to specify (under-specification).
  • Figure 2: In the Role-Play setting, after-death abstention and answer patterns deviate substantially from the expected behavior: Llama deviates from the expected abstention rate by 81.3% and Claude by 90.4%. Llama, Gemma, and GPT-4.1 also show a drop in accuracy. All differences are significant at $p < 0.001$ ($t$-test with Bonferroni correction), except Claude’s accuracy and Gemma’s before-death answer rate, which are significant at $p < 0.05$.
  • Figure 2: Correlation and RMSE worsen in the Role-Play mode ($p < 0.001$ for Corr. and RMSE using $t$-test).
  • Figure 3: Instead of the expected sharp shift at death, abstention rate and answer rate change gradually around the death year for Llama and Claude.
  • Figure 4: (a) The death state is not reliably encoded in the Role-Play mode. (b) The death year is not precisely encoded in either Role-Play or Non-Role-Play.
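Figure 3 contrasts an expected sharp shift at death with the gradual change observed for Llama and Claude. A minimal way to quantify that contrast, assuming per-year abstention rates are available, is the mean absolute gap from an idealized step function. The `step_deviation` measure below is a hypothetical illustration, not the deviation statistic reported in Figure 2.

```python
from typing import Mapping


def expected_abstention(year: int, death_year: int) -> float:
    """Idealized step: answer (0.0) up to and including the death year, abstain (1.0) afterwards."""
    return 1.0 if year > death_year else 0.0


def step_deviation(observed: Mapping[int, float], death_year: int) -> float:
    """Mean absolute gap between observed per-year abstention rates and the ideal step.

    observed maps a calendar year to the fraction of questions about that year
    on which the model abstained (hypothetical input format).
    """
    if not observed:
        raise ValueError("observed must contain at least one year")
    gaps = [abs(rate - expected_abstention(year, death_year))
            for year, rate in observed.items()]
    return sum(gaps) / len(gaps)
```

A perfectly sharp curve (0.0 through 1962, 1.0 afterwards, for a 1962 death year) scores 0.0, while a linear ramp from 0.0 in 1960 to 1.0 in 1965 scores 0.2, reflecting the gradual transition the figure describes.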