Contributions to the Decision Theoretic Foundations of Machine Learning and Robust Statistics under Weakly Structured Information
Christoph Jansen
TL;DR
This habilitation addresses the challenge of building decision-theoretic foundations for machine learning and robust statistics under weak and non-standard information. It develops a unifying framework around preference systems $\mathcal{A}=[A,R_1,R_2]$ and generalized stochastic dominance (GSD) $R_{(\mathcal{A},\mathcal{M})}$, which integrates ordinal and partial cardinal information with credal uncertainty encoded by the set $\mathcal{M}$. The work comprises ten contributions across three parts: A (Decision-Theoretic Foundations) introduces elicitation, state-dependent preferences, and multi-target decision rules; B (Machine Learning under Weakly Structured Information) transfers these ideas to ML, enabling mixed-scale benchmarking and robust pseudo-label selection; C (Robust Statistics under Non-Standard Scales of Measurement) develops scale-robust orders and permutation-based tests for poset-valued data. Methodologically, it uses linear programs to compute GSD-based comparisons, permutation tests with regularization, and depth-based models for posets, enabling information-efficient inference and robust benchmarking under non-standard data and uncertainty. Its practical impact lies in providing principled, scalable tools for robust decision-making and reliable ML benchmarking when data do not conform to standard numerical scales or to precise probabilistic assumptions.
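To give a concrete feel for what a dominance comparison checks, here is a minimal sketch of the purely ordinal special case only. It assumes $R_2 = \emptyset$ (no cardinal information) and a single precise probability instead of a credal set $\mathcal{M}$. In that case, $X$ dominates $Y$ for every isotone utility on a finite poset if and only if $P(X \in U) \geq P(Y \in U)$ for every up-set $U$. The sketch checks this textbook characterization by brute-force enumeration of up-sets. It is a swapped-in technique, not the linear program used in the thesis, and the function names `upsets` and `ordinally_dominates` are hypothetical.

```python
from itertools import combinations


def upsets(elements, geq):
    """Yield every up-set of the finite poset generated by geq.

    geq is a set of pairs (a, b) read as "a is at least as good as b".
    A subset U is an up-set if b in U and (a, b) in geq imply a in U;
    closure under the generating pairs implies closure under their
    transitive closure, so geq need not be transitively closed.
    """
    elems = list(elements)
    for r in range(len(elems) + 1):
        for subset in combinations(elems, r):
            u = set(subset)
            if all(a in u for (a, b) in geq if b in u):
                yield frozenset(u)


def ordinally_dominates(elements, geq, p, q, tol=1e-12):
    """Return True if E_p[u] >= E_q[u] for every isotone utility u.

    p and q map elements to probabilities (missing keys count as 0).
    On a finite poset this holds iff p(U) >= q(U) for every up-set U,
    because every isotone u is a constant plus a nonnegative
    combination of up-set indicators. Enumeration is exponential in
    len(elements), so this is only meant for small toy examples.
    """
    for u in upsets(elements, geq):
        p_mass = sum(p.get(a, 0.0) for a in u)
        q_mass = sum(q.get(a, 0.0) for a in u)
        if p_mass < q_mass - tol:
            return False
    return True


# Usage: a chain good > ok > bad, where p shifts mass upward relative to q.
A = {"good", "ok", "bad"}
chain = {("good", "ok"), ("ok", "bad")}
p = {"good": 0.5, "ok": 0.3, "bad": 0.2}
q = {"good": 0.3, "ok": 0.4, "bad": 0.3}
print(ordinally_dominates(A, chain, p, q))  # True
print(ordinally_dominates(A, chain, q, p))  # False
```

With a genuinely partial order, for example two incomparable outcomes `x` and `y` both above `z`, neither point mass dominates the other. This is the kind of incomparability that the additional cardinal information in $R_2$ and the linear-programming formulation are meant to resolve or quantify.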
Abstract
This habilitation thesis is cumulative and therefore collects and connects research that I (together with several co-authors) have conducted over the last few years. Its core is formed by the ten publications listed on page 5 as Contributions 1 to 10. This list also contains the references to the complete versions of these articles, making them as easily accessible as possible for readers wishing to dive deeper into the individual research projects. The chapters that follow, namely Parts A to C and the concluding remarks, serve to place the articles in a broader scientific context, to explain their respective content briefly and on a less formal level, and to highlight interesting perspectives for future research in each of these contexts. Naturally, the following presentation therefore has neither the level of detail nor the formal rigor that can (hopefully) be found in the papers. Its purpose is to give the reader easy, high-level access to this interesting and important research field as a whole, and thereby to advertise it to a broader audience.
