On Finding Bi-objective Pareto-optimal Fraud Prevention Rule Sets for Fintech Applications
Chengyao Wen, Yin Lou
TL;DR
The paper tackles interpretable fraud-prevention rule mining in Fintech by introducing SpectralRules, a diversity-focused Stage-1 rule generator, and PORS, a Pareto-front expansion framework that serves as an intermediate stage between Stage 1 and Stage 2. It formalizes the bi-objective space of precision and recall, uses the hypervolume ($HV$) as a quality metric, and provides a taxonomy for solution selection on the front (SSF) with nine sampling methods. Empirical results on public and proprietary Alipay datasets show that SpectralRules yields more diverse, high-quality initial rule pools, and that PORS with the $hvc$-ss SSF method consistently achieves the highest $HV$, which correlates with better final $F_\beta$ or recall performance under precision constraints. The work demonstrates practical benefits in real deployments (e.g., Alipay’s Fanglue system) by enabling flexible, efficient rule-set refinement and reducing the need for multiple Stage-2 trials.
Abstract
Rules are widely used in Fintech institutions to make fraud prevention decisions, since rules are highly interpretable thanks to their intuitive if-then structure. In practice, large Fintech institutions usually employ a two-stage framework for mining fraud prevention decision rule sets: Stage 1 generates a potentially large pool of rules, and Stage 2 aims to produce a refined rule subset according to some criteria (typically based on precision and recall). This paper focuses on improving the flexibility and efficacy of this two-stage framework, and is concerned with finding high-quality rule subsets in a bi-objective space (such as precision and recall). To this end, we first introduce a novel algorithm called SpectralRules that directly generates a compact pool of rules in Stage 1 with high diversity. We empirically find that such diversity improves the quality of the final rule subset. In addition, we introduce an intermediate stage between Stages 1 and 2 that adopts the concept of Pareto optimality and aims to find a set of non-dominated rule subsets, which constitute a Pareto front. This intermediate stage greatly simplifies the selection criteria and increases the flexibility of Stage 2. For this intermediate stage, we propose a heuristic-based framework called PORS, and we identify the problem of solution selection on the front (SSF) as the core of PORS. We provide a systematic categorization of the SSF problem and a thorough empirical evaluation of various SSF methods on both public and proprietary datasets. In two real application scenarios within Alipay, we demonstrate the advantages of our proposed methodology over existing work.
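
To make the notions of non-dominated rule subsets and hypervolume concrete, the following is a minimal sketch and not the PORS implementation. It assumes each candidate rule subset has already been scored as a (precision, recall) pair, that both objectives are maximized, and that the hypervolume is measured against a reference point of (0, 0). The function names `pareto_front` and `hypervolume_2d` and the reference point are illustrative assumptions, not taken from the paper.

```python
from typing import List, Tuple

# Hypothetical representation: one (precision, recall) pair per rule subset.
Point = Tuple[float, float]


def pareto_front(points: List[Point]) -> List[Point]:
    """Return the non-dominated points when both objectives are maximized.

    The result is ordered by precision descending (hence recall ascending).
    """
    # Visit points from highest to lowest precision; ties are broken so the
    # highest recall comes first.
    ordered = sorted(set(points), key=lambda p: (-p[0], -p[1]))
    front: List[Point] = []
    best_recall = float("-inf")
    for precision, recall in ordered:
        # A point survives only if it beats the recall of every point
        # whose precision is at least as high.
        if recall > best_recall:
            front.append((precision, recall))
            best_recall = recall
    return front


def hypervolume_2d(points: List[Point], ref: Point = (0.0, 0.0)) -> float:
    """Area dominated by the points and bounded below by `ref` (assumed)."""
    # Keep only points that strictly improve on the reference in both axes.
    inside = [p for p in points if p[0] > ref[0] and p[1] > ref[1]]
    area = 0.0
    prev_recall = ref[1]
    # Sweep the front in order of increasing recall, adding one
    # horizontal strip per front point.
    for precision, recall in pareto_front(inside):
        area += (precision - ref[0]) * (recall - prev_recall)
        prev_recall = recall
    return area


if __name__ == "__main__":
    candidates = [(0.9, 0.2), (0.7, 0.5), (0.4, 0.8), (0.6, 0.4), (0.4, 0.3)]
    print(pareto_front(candidates))
    print(hypervolume_2d(candidates))
```

In this sketch, a front with a larger hypervolume covers more of the precision-recall space, which is the sense in which a Stage 2 selection criterion has more options to choose from.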
