Foundations of Unknown-aware Machine Learning

Xuefeng Du

TL;DR

This thesis tackles reliability under distributional uncertainty and unknown classes by formulating unknown-aware learning. It introduces three core strands: (i) VOS and DREAM-OOD for tractable, interpretable synthetic outliers that regularize decision boundaries and improve OOD detection, (ii) SIREN for shaping object-level representations into compact, class-specific von Mises-Fisher distributions with effective test-time OOD scores, and (iii) SAL and HaloScope for leveraging unlabeled wild data and unlabeled LLM generations to improve OOD detection and hallucination safety. Theoretical analysis provides separability and learnability guarantees for the unlabeled-data approaches, while extensive empirical evaluations demonstrate state-of-the-art performance in OOD detection for object detection and image classification, robust hallucination detection across LLMs, and scalable reliability improvements for foundation models. Together, these results establish unknown-aware learning as a practical paradigm that improves AI safety and reliability with minimal human supervision, spanning vision, language, and multimodal models. The work demonstrates formal reliability guarantees, interpretable outlier generation, and scalable strategies for deploying trustworthy foundation models in the wild.

Abstract

Ensuring the reliability and safety of machine learning models in open-world deployment is a central challenge in AI safety. This thesis develops both algorithmic and theoretical foundations to address key reliability issues arising from distributional uncertainty and unknown classes, from standard neural networks to modern foundation models like large language models (LLMs). Traditional learning paradigms, such as empirical risk minimization (ERM), assume no distribution shift between training and inference, often leading to overconfident predictions on out-of-distribution (OOD) inputs. This thesis introduces novel frameworks that jointly optimize for in-distribution accuracy and reliability on unseen data. A core contribution is the development of an unknown-aware learning framework that enables models to recognize and handle novel inputs without labeled OOD data. We propose new outlier synthesis methods, VOS, NPOS, and DREAM-OOD, to generate informative unknowns during training. Building on this, we present SAL, a theoretical and algorithmic framework that leverages unlabeled in-the-wild data to enhance OOD detection under realistic deployment conditions. These methods demonstrate that abundant unlabeled data can be harnessed to recognize and adapt to unforeseen inputs, providing formal reliability guarantees. The thesis also extends reliable learning to foundation models. We develop HaloScope for hallucination detection in LLMs, MLLMGuard for defending against malicious prompts in multimodal models, and data cleaning methods to denoise human feedback used for better alignment. These tools target failure modes that threaten the safety of large-scale models in deployment. Overall, these contributions promote unknown-aware learning as a new paradigm, and we hope it can advance the reliability of AI systems with minimal human effort.

Paper Structure

This paper contains 181 sections, 25 theorems, 196 equations, 38 figures, 55 tables, 3 algorithms.

Key Result

Theorem 8.1

(Informal). Under mild conditions, if $\ell(\mathbf{h}_{\mathbf{w}}(\mathbf{x}),y)$ is $\beta_1$-smooth w.r.t. $\mathbf{w}$, $\mathbb{P}_{\text{wild}}$ has $(\gamma,\zeta)$-discrepancy w.r.t. $\mathbb{P}_{\mathcal{X}\mathcal{Y}}$ (cf. Appendices sec:definition_app, sec:assumption_app), and there is where $R^{*}_{\text{in}}$ is the optimal ID risk, i.e., $R^{*}_{\text{in}}=\min_{\mathbf{w}\in

Figures (38)

  • Figure 1: (a) An object detection model trained on the BDD-100k dataset (Yu et al., 2020) produces overconfident predictions for OOD objects (e.g., helicopter), highlighting reliability concerns in ML models during deployment. Test images are sampled from MS-COCO (Lin et al., 2014). (b) Overview of my proposed outlier synthesis framework for unknown-aware learning.
  • Figure 2: (a) A Faster-RCNN (Ren et al., 2015) model trained on the BDD-100k dataset (Yu et al., 2020) produces overconfident predictions for an OOD object (e.g., moose). (b)-(c) Uncertainty measurement without (b) and with (c) virtual outlier training. The in-distribution data $\mathbf{x}\in \mathcal{X}=\mathbb{R}^2$ are sampled from a Gaussian mixture model. Regularizing the model with virtual outliers (c) better captures the OOD uncertainty than without (b).
  • Figure 3: The framework of VOS. We model the feature representation of ID objects as class-conditional Gaussians, and sample virtual outliers $\mathbf{v}$ from the low-likelihood region. The virtual outliers, along with the ID objects, are used to produce the uncertainty loss for regularization. The uncertainty estimation branch ($\mathcal{L}_{\mathrm{uncertainty}}$) is jointly trained with the object detection loss ($\mathcal{L}_{\mathrm{loc}},\mathcal{L}_{\mathrm{cls}}$).
  • Figure 4: UMAP visualization of feature embeddings of PASCAL-VOC (on a subset of 10 classes).
  • Figure 5: Visualization of detected objects on OOD images (from MS-COCO) by a vanilla Faster-RCNN (top) and VOS (bottom). The in-distribution dataset is BDD-100k. Blue: objects detected and classified as one of the ID classes. Green: OOD objects detected by VOS, which reduces false positives among detected objects.
  • ...and 33 more figures
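
The sampling step described in the Figure 3 caption (class-conditional Gaussians over ID features, with virtual outliers drawn from the low-likelihood region) can be sketched roughly as follows. This is a minimal illustration and not the thesis's implementation: the function name `sample_virtual_outliers`, the covariance shared across classes, the small ridge term, and the "draw many candidates, keep the least likely" selection rule are all assumptions made for this example.

```python
import numpy as np

def sample_virtual_outliers(features, labels, n_candidates=1000, n_keep=10, seed=0):
    """Hypothetical helper: per-class low-likelihood samples in feature space."""
    rng = np.random.default_rng(seed)
    classes = np.unique(labels)
    dim = features.shape[1]

    # Class-conditional means; one covariance shared by all classes (assumption).
    means = {c: features[labels == c].mean(axis=0) for c in classes}
    centered = np.concatenate([features[labels == c] - means[c] for c in classes])
    cov = centered.T @ centered / len(centered) + 1e-4 * np.eye(dim)  # ridge for stability
    precision = np.linalg.inv(cov)

    outliers = {}
    for c in classes:
        # Draw candidates from the fitted class-conditional Gaussian.
        candidates = rng.multivariate_normal(means[c], cov, size=n_candidates)
        diff = candidates - means[c]
        # Squared Mahalanobis distance: larger means lower Gaussian likelihood.
        maha = np.einsum("ij,jk,ik->i", diff, precision, diff)
        # Keep the n_keep least likely candidates as virtual outliers.
        outliers[c] = candidates[np.argsort(maha)[-n_keep:]]
    return outliers

# Toy usage: two well-separated 2-D classes standing in for ID object features.
rng = np.random.default_rng(1)
feats = np.concatenate([rng.normal(0, 1, (200, 2)), rng.normal(5, 1, (200, 2))])
labs = np.array([0] * 200 + [1] * 200)
virtual = sample_virtual_outliers(feats, labs, n_candidates=500, n_keep=5)
```

In a training loop, the returned points would play the role of the virtual outliers $\mathbf{v}$ fed to the uncertainty loss alongside the ID features; that loss is not sketched here.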

Theorems & Definitions (53)

  • Remark 1
  • Theorem 8.1
  • Theorem 8.2
  • Theorem 8.3
  • Definition 10.1: LLM generation
  • Definition 10.2: Hallucination detection
  • Definition 10.3: Unlabeled data distribution
  • Definition 10.4: Empirical dataset
  • Definition 1: $\beta$-smooth
  • Definition 2: Gradient-based Distribution Discrepancy
  • ...and 43 more
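
For readers unfamiliar with the smoothness condition named in Theorem 8.1 and Definition 1, the usual form of $\beta$-smoothness is sketched below in the theorem's notation. This is the standard textbook statement (the gradient of the loss is $\beta$-Lipschitz in the parameters); that the thesis's Definition 1 matches it exactly is an assumption.

```latex
\left\| \nabla_{\mathbf{w}} \ell(\mathbf{h}_{\mathbf{w}}(\mathbf{x}), y)
      - \nabla_{\mathbf{w}'} \ell(\mathbf{h}_{\mathbf{w}'}(\mathbf{x}), y) \right\|_2
\le \beta \, \left\| \mathbf{w} - \mathbf{w}' \right\|_2
\quad \text{for all } \mathbf{w}, \mathbf{w}'.
```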