Competence-Based Analysis of Language Models
Adam Davies, Jize Jiang, ChengXiang Zhai
TL;DR
Competence-Based Analysis of Language Models (CALM) is a causality-inspired framework for quantifying how linguistically interpretable representations underpin LLM behavior. CALM formalizes each task as a structural causal model, applies gradient-based interventions (GBIs) to the model's internal representations, and measures linguistic competence as the model's alignment with the ground-truth causal structure under those interventions. The paper defines a new metric in the style of interchange interventions, C_T(M|G_T), and applies GBIs to BERT and RoBERTa across 14 lexical-inference tasks from the LAMA ConceptNet suite, showing that competence, not just accuracy, helps explain model brittleness and generalization under distribution shift. These results offer a principled way to diagnose robustness, guide model design, and anticipate behavior under prompt variation, with implications for interpretable AI and reliable language understanding.
Abstract
Despite the recent successes of large, pretrained neural language models (LLMs), comparatively little is known about the representations of linguistic structure they learn during pretraining, which can lead to unexpected behaviors in response to prompt variation or distribution shift. To better understand these models and behaviors, we introduce a general model analysis framework to study LLMs with respect to their representation and use of human-interpretable linguistic properties. Our framework, CALM (Competence-based Analysis of Language Models), is designed to investigate LLM competence in the context of specific tasks by intervening on models' internal representations of different linguistic properties using causal probing, and measuring models' alignment under these interventions with a given ground-truth causal model of the task. We also develop a new approach for performing causal probing interventions using gradient-based adversarial attacks, which can target a broader range of properties and representations than prior techniques. Finally, we carry out a case study of CALM using these interventions to analyze and compare LLM competence across a variety of lexical inference tasks, showing that CALM can be used to explain behaviors across these tasks.
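To make the idea of a gradient-based intervention concrete, the following is a minimal sketch, not the authors' implementation. It assumes a binary linear probe (hypothetical weights `w` and bias `b`) that reads a property from a representation vector `h`, and it uses a projected, signed-gradient attack inside an L-infinity ball of radius `epsilon` to push the probe's prediction toward a chosen target value. The probe form, the attack variant, and every parameter value here are illustrative assumptions; this section of the paper does not specify them.

```python
import numpy as np


def sigmoid(z):
    """Logistic function used by the (assumed) linear probe."""
    return 1.0 / (1.0 + np.exp(-z))


def probe_predict(h, w, b):
    """Return the probe's binary prediction (0 or 1) for representation h."""
    return int(sigmoid(h @ w + b) > 0.5)


def gradient_based_intervention(h, w, b, target, epsilon=0.5, steps=20):
    """Perturb h so that the probe predicts `target`, staying within epsilon of h.

    Hypothetical helper: a projected signed-gradient attack on a linear probe.
    """
    step_size = 2.5 * epsilon / steps  # illustrative step-size heuristic
    h_adv = h.copy()
    for _ in range(steps):
        p = sigmoid(h_adv @ w + b)
        # Gradient of the binary cross-entropy loss (w.r.t. target) in h.
        grad = (p - target) * w
        # Step against the gradient to lower the loss for the target label.
        h_adv = h_adv - step_size * np.sign(grad)
        # Project back into the L-infinity ball around the original h.
        h_adv = np.clip(h_adv, h - epsilon, h + epsilon)
    return h_adv


# Toy usage with made-up numbers (not data from the paper).
w = np.array([1.0, -2.0, 0.5, 0.0])
b = 0.0
h = np.array([-0.2, 0.1, -0.1, 0.3])
h_new = gradient_based_intervention(h, w, b, target=1)
print(probe_predict(h, w, b), probe_predict(h_new, w, b))
```

In a CALM-style analysis, the intervened representation would then be passed through the rest of the language model, and the change in the model's output would be compared with what the task's causal model predicts for the same change in the property.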
