Model Selection over Partially Ordered Sets
Armeen Taeb, Peter Bühlmann, Venkat Chandrasekaran
TL;DR
The paper presents a general poset-based framework for model selection that extends beyond Boolean structures by organizing a collection of models as a partially ordered set with a least element and a rank function that captures model complexity. It defines true discoveries via a symmetric similarity valuation $\rho$, which yields generalized notions of true discoveries ($\mathrm{TD}$), false discoveries ($\mathrm{FD}$), and false discovery proportion ($\mathrm{FDP}$). It introduces two generic procedures for controlling false discoveries: a stability-based approach and a testing-based approach. The methods apply across diverse domains, including variable selection, clustering, ranking, causal structure learning, changepoint estimation, and blind source separation, and come with theoretical guarantees on false discovery control and practical algorithms. Empirical results on synthetic and real data show that false discoveries are controlled while meaningful discoveries are still made, and the authors provide open-source code for implementation.
Abstract
In problems such as variable selection and graph estimation, models are characterized by Boolean logical structure such as presence or absence of a variable or an edge. Consequently, false positive error or false negative error can be specified as the number of variables/edges that are incorrectly included or excluded in an estimated model. However, there are several other problems such as ranking, clustering, and causal inference in which the associated model classes do not admit transparent notions of false positive and false negative errors due to the lack of an underlying Boolean logical structure. In this paper, we present a generic approach to endow a collection of models with partial order structure, which leads to a hierarchical organization of model classes as well as natural analogs of false positive and false negative errors. We describe model selection procedures that provide false positive error control in our general setting and we illustrate their utility with numerical experiments.
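To make the Boolean special case concrete, the following minimal sketch treats variable selection as a poset of subsets ordered by inclusion, with the empty set as least element and cardinality as rank. The similarity valuation is taken to be the size of the intersection, and false discoveries are computed as rank minus similarity. These specific formulas, and every function name below, are illustrative assumptions chosen to match the familiar count of incorrectly included variables; they are not taken from the authors' code.

```python
from typing import FrozenSet

# A model is the set of selected variables (Boolean case only).
Model = FrozenSet[str]


def leq(a: Model, b: Model) -> bool:
    """Partial order: a is below b when a is a subset of b."""
    return a <= b


def rank(model: Model) -> int:
    """Assumed rank: number of selected variables (empty set has rank 0)."""
    return len(model)


def rho(a: Model, b: Model) -> int:
    """Assumed similarity valuation: size of the intersection (symmetric)."""
    return len(a & b)


def true_discoveries(estimate: Model, truth: Model) -> int:
    """TD: similarity between the estimate and the true model."""
    return rho(estimate, truth)


def false_discoveries(estimate: Model, truth: Model) -> int:
    """FD, assumed here to be rank(estimate) - rho(estimate, truth)."""
    return rank(estimate) - rho(estimate, truth)


def false_discovery_proportion(estimate: Model, truth: Model) -> float:
    """FDP: FD divided by rank, defined as 0.0 for the least element."""
    r = rank(estimate)
    return false_discoveries(estimate, truth) / r if r > 0 else 0.0


# Hypothetical example: three true variables, one spurious selection.
truth: Model = frozenset({"x1", "x2", "x3"})
estimate: Model = frozenset({"x1", "x2", "x5"})

td = true_discoveries(estimate, truth)
fd = false_discoveries(estimate, truth)
fdp = false_discovery_proportion(estimate, truth)
```

In this toy instance the estimate shares two variables with the truth and includes one spurious variable, so FD reduces to the usual count of incorrectly included variables. Settings such as ranking or clustering would need a different order, rank, and similarity valuation, which this sketch does not attempt.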
