PAGER: A Framework for Failure Analysis of Deep Regression Models
Jayaraman J. Thiagarajan, Vivek Narayanaswamy, Puja Trivedi, Rushil Anirudh
TL;DR
PAGER tackles the problem of failure detection in deep regression by challenging the sufficiency of epistemic uncertainty alone for risk characterization. The framework blends forward anchoring (uncertainty) with reverse anchoring (manifold non-conformity) to yield Score_1 and Score_2, organizing test samples into risk regimes (ID, Low Risk, Moderate Risk, High Risk). Across 1D benchmarks, high-dimensional regression, and image regression, PAGER consistently outperforms baselines on false negatives, false positives, and regime-confusion metrics, even under distribution shifts. The approach enhances safety for regression deployments by providing a practical, calibration-free method to detect and categorize failures, with potential impact across healthcare, physical sciences, and robotics.
Abstract
Safe deployment of AI models requires proactive detection of failures to prevent costly errors. To this end, we study the important problem of detecting failures in deep regression models. Existing approaches rely on epistemic uncertainty estimates or inconsistency with respect to the training data to identify failures. Interestingly, we find that while uncertainties are necessary, they are insufficient to accurately characterize failure in practice. Hence, we introduce PAGER (Principled Analysis of Generalization Errors in Regressors), a framework to systematically detect and characterize failures in deep regressors. Built upon the principle of anchored training in deep models, PAGER unifies both epistemic uncertainty and complementary manifold non-conformity scores to accurately organize samples into different risk regimes.
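To make the idea of organizing samples into risk regimes concrete, the following is a minimal sketch of how two per-sample scores (an uncertainty score and a non-conformity score) could be combined into the four regimes named in the TL;DR. Everything specific here is an assumption for illustration and not the rule used by PAGER: the function names, the premise that both scores are scaled to [0, 1], the thresholds 0.25 and 0.75, and the combination rule itself are hypothetical.

```python
import numpy as np

REGIMES = ("ID", "Low Risk", "Moderate Risk", "High Risk")


def bin_scores(scores, low, high):
    """Map each score to 0 (below low), 1 (in between), or 2 (at or above high)."""
    return np.digitize(scores, [low, high])


def assign_regimes(uncertainty, nonconformity, low=0.25, high=0.75):
    """Assign one regime label per sample from two scores.

    Hypothetical rule (an assumption, not the published one):
      - either score in the top bin      -> High Risk
      - both scores in the middle bin    -> Moderate Risk
      - exactly one score in the middle  -> Low Risk
      - both scores in the bottom bin    -> ID
    """
    u = bin_scores(np.asarray(uncertainty, dtype=float), low, high)
    c = bin_scores(np.asarray(nonconformity, dtype=float), low, high)
    worst = np.maximum(u, c)
    total = u + c
    idx = np.where(worst == 2, 3,
                   np.where(total == 2, 2,
                            np.where(total == 1, 1, 0)))
    return [REGIMES[i] for i in idx]


if __name__ == "__main__":
    # Four made-up samples, one per regime under the rule above.
    labels = assign_regimes([0.1, 0.1, 0.5, 0.9], [0.1, 0.5, 0.5, 0.1])
    print(labels)
```

The sketch shows why a second score matters: a sample with low uncertainty but high non-conformity is still flagged as High Risk, which an uncertainty-only detector would miss.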
