Reconciling common source, specific source, feature based and score based likelihood ratios

Aafko Boonstra; Ronald Meester; Klaas Slooten

TL;DR

This paper formalizes a general Bayesian decision framework showing that the expected cost of decisions, given priors, cannot increase when additional information is incorporated via likelihood ratios. It provides a direct, information-theoretic proof that more information (full data or richer processing) improves decision quality, applicable to both score-based and feature-based LRs and to common-source versus specific-source contexts. Through a DNA kinship example, it demonstrates that LR systems differ only in the information processed, and that scores can be informative when full data are unavailable. The work argues against discarding score-based methods, emphasizes proper conditioning on source parameters, and offers a unified view that reconciles debates about LR systems with practical forensic decision-making. Key contributions include (i) a general inequality ${\mathbb{E}}[c(\pi(E))]\le c(\pi)$ for Bayes costs, (ii) a detailed two-hypotheses/two-actions illustration, (iii) extension to the general case, (iv) an empirical DNA kinship example, and (v) a critical analysis clarifying misunderstandings in the score-based versus common/specific-source debates, all within an information-theoretic framework.

Abstract

We show that the incorporation of any new piece of information allows for improved decision making, in the sense that the expected costs of an optimal decision decrease (or, in boundary cases where no or not enough new information is incorporated, stay the same) whenever this is done by the appropriate update of the probabilities of the hypotheses. Versions of this result have been stated before. However, previous proofs rely on auxiliary constructions with proper scoring rules. We instead offer a direct and completely general proof that uses only elementary properties of likelihood ratios. We apply our results to contribute to the debates about the use of score based/feature based and common/specific source likelihood ratios. In the literature these are often presented as different "LR-systems". We argue that the difference between them is simply a matter of which information is processed. There is therefore no such thing as different "LR-systems"; there are only differences in the processed information. In particular, despite claims to the contrary, scores can very well be used in forensic practice, and we illustrate this with an extensive example in a DNA kinship context.


Paper Structure

This paper contains 9 sections, 1 theorem, 30 equations, and 5 figures.

Introduction, context and background
Bayes Decisions improve with more information
Two hypotheses and two actions
The general case
Example: DNA kinship LRs
An analysis of some arguments against score-based methods in a toy example
A toy model
Lack of coherence?
Discussion and conclusions

Key Result

Theorem 2.1

Let $c=(c_{ij})$ be a cost function as described above for mutually exclusive and exhaustive hypotheses $H_1,\dots,H_n$ and actions $A_1,\dots,A_m$. Let $\pi$ be the prior probability distribution on the $H_i$ and let $\pi(E)$ be the (random) posterior probability vector obtained from a Bayesian update with the evidence $E$. Then $\mathbb{E}[c(\pi(E))]\le c(\pi)$, where the expectation is over the evidence we obtain.
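The inequality ${\mathbb{E}}[c(\pi(E))]\le c(\pi)$ for Bayes costs can be checked numerically on a small discrete model. The sketch below is not taken from the paper: the likelihood table, the 0-1 cost matrix, the score function, and the helper names (`bayes_cost`, `expected_bayes_cost`, `coarsen`) are all illustrative assumptions, as is the convention that $c_{ij}$ is the cost of action $A_j$ when $H_i$ is true. It compares the Bayes cost with no evidence, with a score $g(E)$ that merges several outcomes, and with the full evidence $E$.

```python
import numpy as np

def bayes_cost(probs, cost):
    """Minimal expected cost over actions for a probability vector on the H_i.
    Assumed convention: cost[i, j] = cost of action A_j when H_i is true."""
    return float(np.min(probs @ cost))

def expected_bayes_cost(prior, lik, cost):
    """E[c(pi(E))] for discrete evidence, with lik[i, e] = P(E = e | H_i).
    Uses P(e) * min_j sum_i P(H_i | e) c_ij = min_j sum_i P(H_i, e) c_ij."""
    joint = prior[:, None] * lik                  # joint[i, e] = P(H_i, E = e)
    return float(np.sum(np.min(joint.T @ cost, axis=1)))

def coarsen(lik, score):
    """Likelihoods of a score g(E): add up P(E = e | H_i) over all e with g(e) = s."""
    score = np.asarray(score)
    return np.stack([lik[:, score == s].sum(axis=1) for s in np.unique(score)], axis=1)

# Hypothetical toy model: two hypotheses, four possible evidence outcomes.
prior = np.array([0.5, 0.5])
lik = np.array([[0.50, 0.30, 0.15, 0.05],         # P(E = e | H_1)
                [0.05, 0.15, 0.30, 0.50]])        # P(E = e | H_2)
cost = np.array([[0.0, 1.0],                      # 0-1 loss: a wrong decision costs 1
                 [1.0, 0.0]])
score = [0, 1, 1, 1]                              # g merges the last three outcomes

c_prior = bayes_cost(prior, cost)                            # no evidence
c_score = expected_bayes_cost(prior, coarsen(lik, score), cost)  # score only
c_full = expected_bayes_cost(prior, lik, cost)               # full evidence

print(c_prior, c_score, c_full)
```

In this toy model the three costs are ordered as the inequality suggests: the score already lowers the expected cost relative to the prior, and the full evidence lowers it further. Only the information processed changes between the three cases; the update rule is the same.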

Figures (5)

Figure 2.1: $\log_{10}(LR(e_x,e_y))$ (based on DNA profiles) versus $\log_{10}(LR(g(e_x,e_y)))$ (based on number of shared alleles). Each violin plot represents the $LR(e_x,e_y)$ for profiles whose number of shared alleles is displayed in the plot.
Figure 2.2: Cumulative distribution function of $\log_{10}(LR(e_x,e_y))-\log_{10}(LR(g(e_x,e_y)))$ (black), and theoretical bound (dashed).
Figure 2.3: Difference in $\log_{10}(LR(e_x,e_y))$ on 15 versus 10 loci.
Figure 2.4: Difference in $\log_{10}(LR)$ based on profiles, with or without parents of $X$.
Figure 3.1: Comparison of common source likelihood ratios $LR_{CS}$ and specific source likelihood ratios $LR_{SS}$, obtained with the toy model.

Theorems & Definitions (1)

Theorem 2.1
