Shades of Zero: Distinguishing Impossibility from Inconceivability

Jennifer Hu, Felix Sosa, Tomer Ullman

TL;DR

The paper investigates whether impossibility and inconceivability are distinct modal categories or points on a single graded axis. Across three experiments, it shows that people reliably distinguish impossible from inconceivable event descriptions in a categorization task, yet rate both as near zero in likelihood, so the distinction is not explained by subjective likelihood alone. Language models also separate the two categories, via the string probabilities they assign, and those probabilities track human likelihood ratings across modal categories overall; the linear trend, however, does not capture the distinction between impossible and inconceivable items within the near-zero region. The findings suggest that fine-grained knowledge about exceedingly rare events may be learnable from linguistic statistics, while leaving open whether people represent the distinction as a difference of kind rather than of degree.

Abstract

Some things are impossible, but some things may be even more impossible than impossible. Levitating a feather using one's mind is impossible in our world, but fits into our intuitive theories of possible worlds, whereas levitating a feather using the number five cannot be conceived in any possible world ("inconceivable"). While prior work has examined the distinction between improbable and impossible events, there has been little empirical research on inconceivability. Here, we investigate whether people maintain a distinction between impossibility and inconceivability, and how such distinctions might be made. We find that people can readily distinguish the impossible from the inconceivable, using categorization studies similar to those used to investigate the differences between impossible and improbable (Experiment 1). However, this distinction is not explained by people's subjective ratings of event likelihood, which are near zero and indistinguishable between impossible and inconceivable event descriptions (Experiment 2). Finally, we ask whether the probabilities assigned to event descriptions by statistical language models (LMs) can be used to separate modal categories, and whether these probabilities align with people's ratings (Experiment 3). We find high-level similarities between people and LMs: both distinguish among impossible and inconceivable event descriptions, and LM-derived string probabilities predict people's ratings of event likelihood across modal categories. Our findings suggest that fine-grained knowledge about exceedingly rare events (i.e., the impossible and inconceivable) may be learned via statistical learning over linguistic forms, yet leave open the question of whether people represent the distinction between impossible and inconceivable as a difference not of degree, but of kind.

Paper Structure

This paper contains 23 sections, 6 equations, 6 figures, 3 tables.

Figures (6)

  • Figure 1: Classification results from Experiment 1. (A) Proportion of trials where human responses matched the underlying condition coding. Dashed line indicates chance performance (25%). Error bars indicate bootstrapped 95% CI. (B) Distribution of responses within each condition. Highlighted cells indicate proportion of trials where responses match the condition coding (as shown in (A)).
  • Figure 2: Subjective ratings of event likelihood from Experiment 2. (A) Raw and (B) normalized (within-item) ratings, averaged over items in each condition. Error bars denote bootstrapped 95% CIs.
  • Figure 3: (A) Mean surprisal values (i.e., negative log probability averaged over tokens) assigned by our tested language models to continuations in each condition. (B) Log $n$-gram count of continuations in each condition, estimated from the Llama-2 tokenization of Dolma-v1.7 (Liu et al., 2024). Error bars denote bootstrapped 95% CIs.
  • Figure 4: Mean surprisal values assigned to continuations in each condition by 10 intermediate checkpoints of OLMo-7B. Error bands indicate bootstrapped 95% CIs.
  • Figure 5: Human ratings (y-axis) versus surprisal assigned by language models to continuations in each condition (x-axis). Subplots are annotated with Pearson $r$ correlation coefficients. While the linear correlation captures the overall trend, it does not capture the relevant distinction between impossible (yellow x's) and inconceivable (red diamonds).
  • ...and 1 more figure
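
The Figure 3 caption defines surprisal as negative log probability averaged over tokens. As a minimal sketch of that quantity only, and not the authors' pipeline, the snippet below computes mean surprisal from a list of per-token log probabilities. The function name `mean_surprisal`, the use of natural logarithms, and the token probabilities are all assumptions made for illustration; the paper's actual models, tokenization, and values are not reproduced here.

```python
import math


def mean_surprisal(token_logprobs):
    """Return the negative log probability averaged over tokens.

    token_logprobs: per-token natural-log probabilities of a continuation
    (hypothetical inputs; a real language model would supply these).
    """
    if not token_logprobs:
        raise ValueError("need at least one token log probability")
    return -sum(token_logprobs) / len(token_logprobs)


# Made-up token probabilities for two continuations (illustration only).
impossible = [math.log(0.5), math.log(0.25)]
inconceivable = [math.log(0.5), math.log(0.125)]

print(mean_surprisal(impossible))
print(mean_surprisal(inconceivable))
```

Averaging over tokens keeps continuations of different lengths comparable; a higher value means the model finds the continuation less expected.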