OTLP: Output Thresholding Using Mixed Integer Linear Programming
Baran Koseoglu, Luca Traverso, Mohammed Topiwalla, Egor Kraev, Zoltan Szopory
TL;DR
The paper addresses the problem of selecting optimal inference-time thresholds for probability-based classifiers, particularly in imbalanced settings. It introduces OTLP, a model-agnostic framework that uses mixed-integer linear programming to search for a single threshold (or one threshold per subspace) by optimizing a user-defined objective under customizable constraints, based on validation data. In experiments on a Credit Card Fraud Detection dataset across multiple settings, OTLP finds thresholds that outperform the default threshold and adapt to complex problem structures, including global, local, and subspace constraints. The work provides a principled, constraint-aware thresholding approach that can be integrated into standard ML pipelines, with potential for scalable extensions in future work.
Abstract
Output thresholding is the technique of searching for the best threshold to use during inference for any classifier that can produce probability estimates on train and testing datasets. It is particularly useful in highly imbalanced classification problems, where the default threshold does not account for the imbalance in class distributions and fails to give the best performance. This paper proposes OTLP, a thresholding framework based on mixed integer linear programming that is model agnostic and supports different objective functions and different sets of constraints for a diverse set of problems, including both balanced and imbalanced classification problems. It is particularly useful in real-world applications where theoretical thresholding techniques cannot address the product-related requirements and complexity of the applications that use machine learning models. We evaluate the usefulness of the framework on the Credit Card Fraud Detection dataset.
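To make the idea concrete, the following is a minimal sketch of threshold selection posed as a mixed integer linear program. It is not the formulation used in the paper: the function name `milp_threshold`, the objective (true positives minus a hypothetical `fp_cost` per false positive), the single recall constraint, and the margin `eps` are all assumptions chosen for illustration, and SciPy's `milp` solver is used as a stand-in for whichever solver the framework relies on.

```python
import numpy as np
from scipy.optimize import Bounds, LinearConstraint, milp


def milp_threshold(probs, labels, fp_cost=1.0, min_recall=0.0, eps=1e-4):
    """Illustrative sketch (not the paper's formulation): pick one threshold
    that maximises TP - fp_cost * FP subject to recall >= min_recall."""
    p = np.asarray(probs, dtype=float)
    y = np.asarray(labels, dtype=int)
    n = p.size
    is_pos = (y == 1).astype(float)

    # Decision vector x = [t, z_1, ..., z_n]: t is the threshold and
    # z_i = 1 means validation sample i is predicted positive.
    # milp minimises, so positives get cost -1 and negatives get +fp_cost.
    c = np.concatenate(([0.0], np.where(y == 1, -1.0, fp_cost)))

    eye = np.eye(n)
    ones = np.ones((n, 1))
    # z_i = 0 forces p_i + eps <= t:  -t - (1 + eps) * z_i <= -(p_i + eps)
    below = LinearConstraint(np.hstack([-ones, -(1 + eps) * eye]), -np.inf, -(p + eps))
    # z_i = 1 forces t <= p_i:  t + z_i <= 1 + p_i
    above = LinearConstraint(np.hstack([ones, eye]), -np.inf, 1.0 + p)
    # Recall constraint: predicted positives among the true positives.
    recall_row = np.concatenate(([0.0], is_pos))[None, :]
    recall = LinearConstraint(recall_row, min_recall * is_pos.sum(), np.inf)

    integrality = np.concatenate(([0], np.ones(n)))  # t continuous, z binary
    bounds = Bounds(np.zeros(n + 1), np.ones(n + 1))
    res = milp(c, constraints=[below, above, recall],
               integrality=integrality, bounds=bounds)
    if not res.success:
        raise RuntimeError(res.message)

    z = np.round(res.x[1:]).astype(int)
    # Snap the threshold to the lowest score predicted positive so that
    # (p >= threshold) reproduces z despite solver tolerances.
    threshold = float(p[z == 1].min()) if z.any() else float(res.x[0])
    return threshold, z


# Toy validation scores and labels (made up for this sketch).
probs = [0.1, 0.2, 0.35, 0.4, 0.8, 0.9]
labels = [0, 0, 1, 0, 1, 1]
threshold, preds = milp_threshold(probs, labels, fp_cost=2.0, min_recall=1.0)
print(threshold, preds.tolist())
```

In this toy run the recall constraint forces every positive to be caught, so the solver accepts one false positive instead of choosing the higher threshold that the unconstrained objective would prefer. Per-subspace thresholds could be sketched the same way by adding one threshold variable per subspace, although the details would depend on the constraints involved.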
