Causal Strengths and Leaky Beliefs: Interpreting LLM Reasoning via Noisy-OR Causal Bayes Nets
Authors
Hanna Dettki
Abstract
The nature of intelligence in both humans and machines is a longstanding question. While there is no universally accepted definition, the ability to reason causally is often regarded as a pivotal aspect of intelligence (Lake et al., 2017). Evaluating causal reasoning in LLMs and humans on the same tasks hence provides a more comprehensive understanding of their respective strengths and weaknesses. Our study asks: (Q1) Are LLMs aligned with humans when given the \emph{same} reasoning tasks? (Q2) Do LLMs and humans reason consistently at the task level? (Q3) Do they exhibit distinct reasoning signatures?
We answer these questions by evaluating 20+ LLMs on eleven semantically meaningful causal tasks formalized over a collider graph (two independent causes converging on a common effect) under two prompting regimes: \emph{Direct} (a one-shot numeric response giving the judged probability that the query node equals one) and \emph{Chain of Thought} (CoT; reason first, then provide the answer).
Judgments are modeled with a leaky noisy-OR causal Bayes net (CBN) whose parameters include a shared prior; we select the winning model via AIC, comparing a 3-parameter variant with a symmetric causal strength (one strength shared by both causes) against a 4-parameter asymmetric variant (a separate strength per cause).
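The model-comparison step can be sketched as follows. This is a minimal illustration, not the paper's fitting pipeline: it assumes Gaussian judgment noise, fits only causal strengths and a leak term (omitting the shared prior) by coarse grid search, and uses hypothetical probability judgments for the four cause configurations of a collider.

```python
import numpy as np
from itertools import product

def leaky_noisy_or(c1, c2, w1, w2, leak):
    # P(E=1 | C1=c1, C2=c2): the effect occurs unless the background leak
    # and every present cause all independently fail to produce it.
    return 1.0 - (1.0 - leak) * (1.0 - w1) ** c1 * (1.0 - w2) ** c2

# Hypothetical probability judgments for the four cause configurations
# (c1, c2) -- illustrative numbers only, not data from the paper.
configs = [(0, 0), (0, 1), (1, 0), (1, 1)]
judgments = np.array([0.10, 0.55, 0.80, 0.90])

grid = np.linspace(0.0, 1.0, 21)  # coarse parameter grid

def fit(symmetric):
    """Grid-search fit; returns (residual sum of squares, best params)."""
    best_rss, best_params = np.inf, None
    for w1, leak in product(grid, grid):
        # Symmetric variant ties the two causal strengths: w2 = w1.
        for w2 in ([w1] if symmetric else grid):
            pred = np.array([leaky_noisy_or(c1, c2, w1, w2, leak)
                             for c1, c2 in configs])
            rss = float(np.sum((judgments - pred) ** 2))
            if rss < best_rss:
                best_rss, best_params = rss, (w1, w2, leak)
    return best_rss, best_params

def aic(rss, k, n):
    # Gaussian-error AIC up to an additive constant: 2k + n*log(RSS/n).
    return 2 * k + n * np.log(max(rss, 1e-12) / n)

n = len(judgments)
rss_sym, _ = fit(symmetric=True)    # 2 free parameters here (w, leak)
rss_asym, _ = fit(symmetric=False)  # 3 free parameters (w1, w2, leak)
print("AIC symmetric: ", round(aic(rss_sym, 2, n), 2))
print("AIC asymmetric:", round(aic(rss_asym, 3, n), 2))
```

Because the symmetric model is nested in the asymmetric one, the asymmetric fit can never have a larger residual; AIC asks whether the fit improvement justifies the extra parameter.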