ODC and ROC curves, comparison curves, and stochastic dominance

Teresa Ledwina, Adam Zagdański

Abstract

We discuss two novel approaches to the classical two-sample problem. Our starting point is the ordinal dominance and receiver operating characteristic curves, denoted by ODC and ROC, respectively, which are very popular in several areas of statistics and data analysis and which we properly standardize and combine. The proposed new curves are termed the comparison curves. Their estimates, being weighted rank processes on (0,1), form the basis of inference. These weighted processes are intuitive, well suited for visual inspection of the data at hand, and useful for constructing formal inferential procedures. They can be applied to several variants of the two-sample problem. Their use can help to improve some existing procedures, both in terms of power and in the ability to identify the sources of departures from the postulated model. To simplify the interpretation of finite-sample results, we restrict attention to the values of the processes on a finite grid of points. This results in the so-called bar plots (B-plots), which readably summarize the information contained in the data. Moreover, we show that B-plots, along with adjusted simultaneous acceptance regions, provide principled information about where the model departs from the data. This leads to a framework that facilitates the identification of regions with locally significant differences. We apply the considered techniques to a standard stochastic dominance testing problem. Some min-type statistics are introduced and investigated. A simulation study compares two tests based on the comparison curves with well-established tests from the literature and demonstrates the strong and competitive performance of the former in many typical situations. Some real data applications illustrate the simplicity and practical usefulness of the proposed approaches. A range of other applications of the considered weighted processes is also briefly discussed.
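
As a rough illustration of the starting point only, the sketch below evaluates the textbook empirical ODC, $p \mapsto G_n(F_m^{-1}(p))$, and the textbook empirical ROC, $p \mapsto 1 - G_n(F_m^{-1}(1-p))$, on a finite grid. It does not implement the standardized comparison curves or the weighted processes of the paper, whose definitions are not given in the abstract. The function names, the quantile convention, and the equally spaced grid $j/128$, $j=1,\ldots,127$, are assumptions made for this example.

```python
import numpy as np


def empirical_odc(x, y, grid):
    """Textbook empirical ODC: p -> G_n(F_m^{-1}(p)).

    F_m and G_n are the empirical CDFs of x and y. The quantile
    convention F_m^{-1}(p) = X_(ceil(m * p)) is an assumption of this sketch.
    """
    x_sorted = np.sort(np.asarray(x, dtype=float))
    y_sorted = np.sort(np.asarray(y, dtype=float))
    m = x_sorted.size
    p = np.asarray(grid, dtype=float)
    # Index of the order statistic X_(ceil(m * p)), clipped to a valid range.
    idx = np.clip(np.ceil(m * p).astype(int) - 1, 0, m - 1)
    quantiles = x_sorted[idx]
    # G_n(t) = (number of y_i <= t) / n.
    return np.searchsorted(y_sorted, quantiles, side="right") / y_sorted.size


def empirical_roc(x, y, grid):
    """Textbook empirical ROC: p -> 1 - G_n(F_m^{-1}(1 - p))."""
    p = np.asarray(grid, dtype=float)
    return 1.0 - empirical_odc(x, y, 1.0 - p)


# Hypothetical grid of 127 equally spaced points in (0, 1).
grid = np.arange(1, 128) / 128.0

# Simulated data for illustration only: the second sample is shifted upward.
rng = np.random.default_rng(0)
x = rng.normal(loc=0.0, size=200)
y = rng.normal(loc=0.5, size=300)

odc_values = empirical_odc(x, y, grid)
roc_values = empirical_roc(x, y, grid)
```

When the two samples come from the same distribution, both curves stay close to the diagonal $p \mapsto p$, so departures from the diagonal over the grid indicate where the samples differ.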

Paper Structure (49 equations, 5 figures, 7 tables)

This paper contains 49 equations, 5 figures, 7 tables.

Figures (5)

  • Figure 1: Graphical representation of alternatives ${\mathbb A}_1 - {\mathbb A}_9$. MC estimated CC -- red dots, and MC estimated CCC -- blue squares, over the grid of 127 points.
  • Figure 2: Results obtained for the comparison of Canadian after-tax family income in 1978 versus 1986, with $m=9\;470,\;n=8\;526$ and $D(N)=127$. The figure shows: rescaled difference in empirical CDFs (top-left plot), realizations of empirical processes ${\sf U}_N(p)$ and ${\sf P}_N(p)$ over the 127-point grid (top-right plot), B-plots along with 95% one-sided simultaneous acceptance regions pertaining to ${\sf U}_N(p)$ and ${\sf P}_N(p)$ (middle row), and corresponding box plots (bottom row).
  • Figure 3: Results obtained for the comparison of cholesterol levels in groups of obese men in Puerto Rico versus Honolulu, with $m=160, n=628$ and $D(N)=127$. The figure shows: rescaled difference in empirical CDFs (top-left plot), realizations of empirical processes ${\sf U}_N(p)$ and ${\sf P}_N(p)$ over the 127-point grid (top-right plot), B-plots along with 95% one-sided simultaneous acceptance regions pertaining to ${\sf U}_N(p)$ and ${\sf P}_N(p)$ (middle row), and corresponding box plots (bottom row).
  • Figure C1: Results for Canadian after-tax family income in 1978 versus 1986 comparison with $m = 9\; 470, n = 8\; 526$ and $D(N) = 16\; 383$. The figure shows box plots obtained for distributions of the barriers $L({\sf U}_N,I_k)$ and $L({\sf P}_N,I_k),\; k=1,...,10$, under $F = G$.
  • Figure C2: Results for comparison of Puerto Rico versus Honolulu cholesterol levels in obese men groups with $m=160, n=628$ and $D(N)=511$. The figure shows box plots obtained for distributions of the barriers $L({\sf U}_N,I_k)$ and $L({\sf P}_N,I_k),\; k=1,...,10$, under $F = G$.