Possible, Yes; Ignorant, Perhaps: A Scorecard for Possibilistic Forecasts

John R. Lawson

Abstract

Probabilistic forecasts must sum to unity and cannot express ``I don't know.'' Possibility theory relaxes this constraint: a subnormal distribution explicitly measures how much of the plausibility budget remains unassigned, an ignorance signal that probability cannot represent. This paper develops a verification framework for such forecasts, centred on a five-number scorecard that separately diagnoses whether the forecast pointed at the right outcome (depth-of-truth), how sharply (diffuseness, support margin), how confidently (ignorance), and how dominantly (conditional necessity). A possibility-to-probability conversion preserves ignorance for familiar frequency-based scoring; categorical threshold scores (POD, FAR, CSI, etc.) connect to operational practice. Together, these three complementary facets -- possibilistic, probabilistic, and categorical -- expose failure modes invisible to any single metric. Storm Prediction Center convective outlook categories serve as the running example throughout; a synthetic reforecast demonstrates diagnostic visualisations and scorecard interpretation. Ignorance is better expressed than repressed.

Paper Structure

This paper contains 35 sections, 20 equations, 11 figures, and 6 tables.

Figures (11)

  • Figure 1: Anatomy of a subnormal possibility distribution over SPC categories. Purple bars show raw possibility $\pi(\omega)$. The dashed line marks $\Pi_{\max} = \max(\pi)$; the gap from $\Pi_{\max}$ to $1.0$ is the ignorance $H_{\Pi}$ (Eq. \ref{eq:ignorance}). Conditional necessity $\mathrm{N}_{\mathrm{c}}$ for the peak category measures how strongly that category dominates the runner-up after normalisation (Eq. \ref{eq:cond_nec}); the annotated value $0.733$ reflects $1 - 0.20/0.75$, where $0.20$ is the runner-up (ENH) and $0.75$ is the peak (MDT). A minimal code sketch of both quantities appears after this list.
  • Figure 2: Possibility-to-probability conversion applied to a subnormal distribution. (a) Raw possibility values $\pi(\omega)$ for six SPC categories. (b) Resulting $(n{+}1)$-category probability vector, with the grey bar representing the explicit ignorance outcome $p_{\mathrm{ign}} = H_{\Pi}$. Arrows show proportional redistribution of the remaining $(1 - H_{\Pi})$ mass. The ignorance outcome absorbs probability mass that simple normalisation would have spread across all categories, preserving the subnormality signal. A sketch of this conversion also appears after this list.
  • Figure 3: Information-gain decomposition for five forecast archetypes (illustrative values). Purple bars show gross discrimination (DSC); semi-transparent green overlays show the reliability penalty (REL) consuming DSC from the top. Net $\mathrm{IG} = \mathrm{DSC} - \mathrm{REL}$ is marked by the horizontal tick on each bar; the zero baseline represents climatological performance. When REL exceeds DSC (Sharp Wrong, Hedged Wrong), the green extends below zero: the calibration tax consumed more skill than discrimination added. DSC and REL are sample-aggregated quantities, each decomposed from a verification subset dominated by the corresponding forecast archetype.
  • Figure 4: Contingency-table schematic. (a) Binary $2 \times 2$ table at a single severity threshold, annotated with the five categorical scores (Eqs. \ref{eq:pod}--\ref{eq:hss}). (b) Extension to the $K \times K$ multi-category setting (SPC categories). Green boxes mark the diagonal (correct peak category); shaded cells mark $\pm 1$ near-misses. HSS (Eq. \ref{eq:hss_kxk}) summarises full-table agreement beyond chance. The dashed bracket illustrates how each severity threshold $t$ collapses the $K \times K$ table into a binary problem. A sketch of the binary scores appears after this list.
  • Figure 5: Possibilistic performance diagram for the 800-day synthetic reforecast, inspired by \citet{Roebber2009-rv}. Hexagonal bins in Cartesian space are coloured by mean $H_{\Pi}$ (dark purple = confident, pale = uncertain). Contours trace constant values of the joint skill $S = \alpha^* \times (1 - \eta)$, the possibilistic analogue of CSI, which is maximised in the upper right (sharp and truthful). The dashed diagonal traces $\delta = 0$, the break-even threshold above which the forecast discriminated and offered more support than a categorical average. Green circles mark category means (dot size $\propto$ sample count), tracing forecast quality as a function of observed category, from easy categories (upper right) to rare ones (lower left). Stars mark the worked scenarios of Section \ref{sec:worked_examples} (green edge = hit, red = miss). The lower right is the worst failure mode: sharp and wrong.
  • ...and 6 more figures
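
To make the Figure 1 annotations concrete, here is a minimal Python sketch (assuming NumPy) of the two annotated quantities: the ignorance $H_{\Pi} = 1 - \max(\pi)$ and the conditional necessity of the peak category. Only the MDT peak ($0.75$) and the ENH runner-up ($0.20$) come from the caption; the remaining possibility values and the variable names are hypothetical.

import numpy as np

# Hypothetical possibility distribution over the six SPC categories;
# only the MDT peak (0.75) and ENH runner-up (0.20) are taken from
# the Figure 1 caption.
categories = ["TSTM", "MRGL", "SLGT", "ENH", "MDT", "HIGH"]
pi = np.array([0.05, 0.10, 0.15, 0.20, 0.75, 0.02])

pi_max = pi.max()            # Pi_max: height of the tallest bar
ignorance = 1.0 - pi_max     # H_Pi: unassigned plausibility budget

# Conditional necessity of the peak category: dominance of the peak
# over the runner-up after normalising by the peak.
peak, runner_up = np.sort(pi)[-1], np.sort(pi)[-2]
cond_necessity = 1.0 - runner_up / peak

print(f"H_Pi = {ignorance:.3f}")         # 0.250
print(f"N_c  = {cond_necessity:.3f}")    # 0.733 = 1 - 0.20/0.75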
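The Figure 2 conversion admits an equally short sketch, assuming the proportional redistribution described in the caption; the function name possibility_to_probability is illustrative, not an API from the paper.

import numpy as np

def possibility_to_probability(pi):
    # Convert a subnormal possibility vector over n outcomes into an
    # (n+1)-outcome probability vector whose final entry is the explicit
    # ignorance outcome p_ign = H_Pi.
    pi = np.asarray(pi, dtype=float)
    h_pi = 1.0 - pi.max()                 # ignorance H_Pi
    # Spread the committed mass (1 - H_Pi) across the n outcomes in
    # proportion to their possibility, so subnormality is preserved
    # rather than erased by simple normalisation.
    p = (1.0 - h_pi) * pi / pi.sum()
    return np.append(p, h_pi)

p = possibility_to_probability([0.05, 0.10, 0.15, 0.20, 0.75, 0.02])
assert abs(p.sum() - 1.0) < 1e-12
print(p[-1])   # p_ign = 0.25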
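Finally, the binary scores of Figure 4a follow the textbook $2 \times 2$ contingency-table definitions. The caption names POD, FAR, CSI, and HSS but leaves the fifth score unspecified, so frequency bias is included below purely as an assumed placeholder; the counts are invented for illustration.

def categorical_scores(a, b, c, d):
    # a = hits, b = false alarms, c = misses, d = correct negatives.
    pod = a / (a + c)                     # probability of detection
    far = b / (a + b)                     # false-alarm ratio
    csi = a / (a + b + c)                 # critical success index
    bias = (a + b) / (a + c)              # frequency bias (assumed fifth score)
    hss = 2 * (a * d - b * c) / ((a + c) * (c + d) + (a + b) * (b + d))
    return pod, far, csi, bias, hss

print(categorical_scores(a=30, b=10, c=20, d=140))
# -> (0.6, 0.25, 0.5, 0.8, 0.571...)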