The Role of the Availability Heuristic in Multiple-Choice Answering Behaviour

Leonidas Zotos, Hedderik van Rijn, Malvina Nissim

Abstract

When students are unsure of the correct answer to a multiple-choice question (MCQ), guessing is common practice. The availability heuristic, proposed by A. Tversky and D. Kahneman in 1973, suggests that the ease with which relevant instances come to mind, typically operationalised by the mere frequency of exposure, can offer a mental shortcut for problems in which the test-taker does not know the exact answer. Is simply choosing the option that comes most readily to mind a good strategy for answering MCQs? We propose a computational method of assessing the cognitive availability of MCQ options, operationalised by concepts' prevalence in large corpora. The key finding, across three large question sets, is that correct answers, independently of the question stem, are significantly more available than incorrect MCQ options. Specifically, using Wikipedia as the retrieval corpus, we find that always selecting the most available option leads to scores 13.5% to 32.9% above the random-guess baseline. We further find that LLM-generated MCQ options show patterns of availability similar to those of expert-created options, despite the LLMs' frequentist nature and their training on large collections of textual data. Our findings suggest that availability should be considered in current and future work when computationally modelling student behaviour.

Paper Structure (17 sections, 4 figures, 4 tables)

Figures (4)

  • Figure 1: Overview of our approach to evaluating the out-of-context availability of an MCQ option. First, we compute the textual embedding of the combined MCQ options. Then, a pre-determined number of relevant passages are retrieved from one of two large corpora. Afterwards, the textual embedding of each option is computed separately. Finally, the per-option availability is determined by the proportion of retrieved passages that are most similar to each option.
  • Figure 2: LLM instruction used to generate n alternative distractors. The model is instructed to output the distractors in a "boxed" environment, separated by vertical bars to facilitate automatic extraction. We empirically find that repetition in the instruction leads to better instruction-following, especially for the less-capable Qwen3-8b.
  • Figure 3: Average out-of-context availability per option. Distractors are ordered by student selection rate, except for SciQ, where student selection rates are not available. For each set of options, 20 relevant passages are retrieved, and each passage is assigned to the option it is most similar to. Passages are retrieved either from Wikipedia or the BEIR corpus. Statistically significant differences are marked with asterisks (* $p < 0.01$, ** $p < 0.005$, *** $p < 0.001$) and displayed alongside effect sizes.
  • Figure 4: Average out-of-context availability per distractor generation method. Distractors are either human- or LLM-generated. Passages are retrieved from the Wikipedia corpus. Error bars represent 95% High Density Intervals. Statistically significant differences are marked based on the probability of direction, representing the certainty that the correct answer is more available than the distractors (* $pd > 95\%$, *** $pd > 99\%$).
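
The availability procedure outlined in the Figure 1 caption (embed the combined options, retrieve passages, embed each option separately, assign each passage to its most similar option) can be illustrated with a minimal sketch. This is not the authors' implementation: the bag-of-words `embed` function is a toy stand-in for a real text-embedding model, the options are joined with spaces as an assumed way of "combining" them, and the names `embed`, `cosine`, and `availability` as well as the tiny corpus are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for a text-embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def availability(options, corpus, k=20):
    """Return, per option, the share of retrieved passages closest to it."""
    # Step 1: embed the combined options (joining with spaces is an assumption).
    query = embed(" ".join(options))
    # Step 2: retrieve the k passages most similar to the combined query.
    ranked = sorted(corpus, key=lambda p: cosine(query, embed(p)), reverse=True)
    retrieved = ranked[:k]
    # Step 3: embed each option separately.
    option_vecs = [embed(o) for o in options]
    # Step 4: assign every retrieved passage to its most similar option.
    # Ties (including all-zero similarities) go to the earliest option,
    # a simplification of this sketch.
    counts = [0] * len(options)
    for passage in retrieved:
        pvec = embed(passage)
        sims = [cosine(pvec, ov) for ov in option_vecs]
        counts[sims.index(max(sims))] += 1
    total = len(retrieved)
    return [c / total for c in counts] if total else [0.0] * len(options)


# Hypothetical example data, not taken from the paper.
options = ["mitochondria", "ribosome", "nucleus"]
corpus = [
    "mitochondria produce energy for the cell",
    "the mitochondria are the powerhouse",
    "the nucleus stores dna",
    "plants need sunlight",
]
scores = availability(options, corpus, k=3)
most_available = options[scores.index(max(scores))]
```

In this toy run, two of the three retrieved passages are closest to "mitochondria", so it receives the highest availability and would be the option chosen by an always-pick-the-most-available strategy. A realistic setup would swap `embed` for a dense embedding model and `corpus` for an indexed collection such as Wikipedia, with 20 passages retrieved per question as stated in the Figure 3 caption.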