On Finding Bi-objective Pareto-optimal Fraud Prevention Rule Sets for Fintech Applications

Chengyao Wen; Yin Lou

TL;DR

The paper tackles interpretable fraud-prevention rule mining in Fintech by introducing SpectralRules, a diversity-focused Stage-1 rule generator, and PORS, a Pareto-front expansion framework that serves as an intermediate stage between Stage 1 and Stage 2. It formalizes the bi-objective space of precision and recall, uses the hypervolume ($HV$) as a quality metric, and provides a taxonomy for solution selection on the Pareto front (SSF) with nine sampling methods. Empirical results on public and proprietary Alipay datasets show that SpectralRules yields more diverse, high-quality initial rule pools, and that PORS with the $hvc$-ss SSF method consistently achieves the highest $HV$, which correlates with better final $F_\beta$ or recall performance under precision constraints. The work demonstrates practical benefits in real deployments (e.g., Alipay’s Fanglue system) by enabling flexible, efficient rule-set refinement and reducing the need for multiple Stage-2 trials.

Abstract

Rules are widely used in Fintech institutions to make fraud prevention decisions, since rules are highly interpretable thanks to their intuitive if-then structure. In practice, large Fintech institutions usually employ a two-stage framework for mining fraud prevention decision rule sets: Stage 1 generates a potentially large pool of rules, and Stage 2 aims to produce a refined rule subset according to some criteria (typically based on precision and recall). This paper focuses on improving the flexibility and efficacy of this two-stage framework, and is concerned with finding high-quality rule subsets in a bi-objective space (such as precision and recall). To this end, we first introduce a novel algorithm called SpectralRules that directly generates a compact pool of rules in Stage 1 with high diversity. We empirically find that such diversity improves the quality of the final rule subset. In addition, we introduce an intermediate stage between Stages 1 and 2 that adopts the concept of Pareto optimality and aims to find a set of non-dominated rule subsets, which constitutes a Pareto front. This intermediate stage greatly simplifies the selection criteria and increases the flexibility of Stage 2. For this intermediate stage, we propose a heuristic-based framework called PORS, and we identify that the core of PORS is the problem of solution selection on the front (SSF). We provide a systematic categorization of the SSF problem and a thorough empirical evaluation of various SSF methods on both public and proprietary datasets. In two real application scenarios within Alipay, we demonstrate the advantages of our proposed methodology over existing work.
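To make the bi-objective view concrete, here is a minimal Python sketch of scoring candidate rule subsets by precision and recall and keeping only the non-dominated ones. It is an illustration under stated assumptions, not the paper's PORS implementation: the function names (`precision_recall`, `dominates`, `pareto_front`) and the sample points are hypothetical, and each candidate is represented only by the boolean predictions it produces.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (precision, recall), both to be maximized


def precision_recall(pred: Sequence[bool], label: Sequence[bool]) -> Point:
    """Precision and recall of boolean predictions against boolean fraud labels."""
    tp = sum(1 for p, y in zip(pred, label) if p and y)
    fp = sum(1 for p, y in zip(pred, label) if p and not y)
    fn = sum(1 for p, y in zip(pred, label) if not p and y)
    precision = tp / (tp + fp) if tp + fp else 0.0  # convention: 0 when nothing is flagged
    recall = tp / (tp + fn) if tp + fn else 0.0     # convention: 0 when there is no fraud
    return precision, recall


def dominates(a: Point, b: Point) -> bool:
    """a dominates b: no worse in both objectives and strictly better in one."""
    return a[0] >= b[0] and a[1] >= b[1] and (a[0] > b[0] or a[1] > b[1])


def pareto_front(points: Sequence[Point]) -> List[Point]:
    """Return the non-dominated points, deduplicated and sorted by precision."""
    front = {p for p in points if not any(dominates(q, p) for q in points)}
    return sorted(front)


# Hypothetical (precision, recall) scores of five candidate rule subsets.
candidates = [(0.9, 0.2), (0.8, 0.5), (0.7, 0.4), (0.5, 0.8), (0.5, 0.7)]
front = pareto_front(candidates)
```

In this toy input, `(0.7, 0.4)` is dominated by `(0.8, 0.5)` and `(0.5, 0.7)` by `(0.5, 0.8)`, so three candidates remain on the front. A Stage-2 criterion such as "maximize recall subject to a precision floor" then only needs to scan these survivors.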

Paper Structure (23 sections, 4 equations, 4 figures, 8 tables, 3 algorithms)


Introduction
Related Work
Rule Set Mining
Evolutionary Multi-objective Optimization
Subset Selection of Pareto-optimal Solutions
Preliminaries
Our Approach
SpectralRules
The PORS Framework
SSF Methods
Uniform Sampling
Non-uniform Sampling
Experiments
Datasets
Pareto-optimal Rule Subsets (Q1 & Q2)
...and 8 more sections

Figures (4)

Figure 1: Illustration of Pareto dominance. A set of 5 non-dominated solutions (green squares) constitutes the Pareto front (red line). The hypervolume of those 5 solutions is the area of the light yellow region. The hypervolume contribution of the solution (red square) to those 5 solutions is the area of the light grey region.
Figure 2: HV of the PORS algorithm on the Bank dataset when using SpectralRules or TreeEns to produce the rule set in Stage 1.
Figure 3: Efficiency of NSGA-II vs. PORS on the Bank dataset.
Figure 4: Running time of Greedy and PORS on the Bank dataset.
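The two quantities pictured in Figure 1 can be computed directly in two dimensions. The sketch below is a hedged illustration rather than the authors' code: it assumes both objectives are maximized and that the reference point is the origin `(0, 0)`, and the names `hypervolume_2d` and `hv_contribution` and the sample points are hypothetical.

```python
from typing import Iterable, Tuple

Point = Tuple[float, float]  # (precision, recall), both to be maximized


def hypervolume_2d(points: Iterable[Point], ref: Point = (0.0, 0.0)) -> float:
    """Area dominated by `points` and bounded below by `ref` (assumed reference point)."""
    # Only points strictly better than the reference point in both objectives add area.
    pts = {p for p in points if p[0] > ref[0] and p[1] > ref[1]}
    hv = 0.0
    best_y = ref[1]
    # Sweep from the largest first objective downward; each point adds the
    # horizontal strip between the best second objective seen so far and its own.
    for x, y in sorted(pts, key=lambda p: (-p[0], -p[1])):
        if y > best_y:
            hv += (x - ref[0]) * (y - best_y)
            best_y = y
    return hv


def hv_contribution(candidate: Point, points: Iterable[Point],
                    ref: Point = (0.0, 0.0)) -> float:
    """Hypervolume gained by adding `candidate` to the existing `points`."""
    pts = list(points)
    return hypervolume_2d(pts + [candidate], ref) - hypervolume_2d(pts, ref)


# Hypothetical front of three non-dominated (precision, recall) points.
front = [(0.5, 0.8), (0.8, 0.5), (0.9, 0.2)]
hv = hypervolume_2d(front)
gain = hv_contribution((0.7, 0.7), front)
dominated_gain = hv_contribution((0.4, 0.4), front)
```

A candidate that is already dominated by the front contributes zero area, which is why a hypervolume-contribution score can rank candidate solutions by how much they would extend the front.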

Theorems & Definitions (7)

Definition 1: Precision
Definition 2: Recall
Definition 3: $F_\beta$ score
Definition 4: Pareto Dominance
Definition 5: Pareto Optimality
Definition 6: Hypervolume Indicator
Definition 7: Hypervolume Contribution
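For orientation, the first five of these notions have standard textbook forms, sketched below in LaTeX. The symbols ($TP$, $FP$, $FN$, $P$, $R$, $\mathcal{S}$, and $\succ$) are assumptions chosen for illustration, and the paper's own notation may differ.

```latex
% Precision and recall of a rule subset S, from its confusion counts
P(S) = \frac{TP}{TP + FP}, \qquad R(S) = \frac{TP}{TP + FN}

% F_beta score: beta > 1 weights recall more heavily, beta < 1 weights precision
F_\beta(S) = \frac{(1 + \beta^2)\, P(S)\, R(S)}{\beta^2\, P(S) + R(S)}

% Pareto dominance when both objectives are maximized
S_1 \succ S_2 \iff P(S_1) \ge P(S_2) \,\wedge\, R(S_1) \ge R(S_2)
  \,\wedge\, \bigl( P(S_1) > P(S_2) \,\vee\, R(S_1) > R(S_2) \bigr)

% Pareto optimality within a candidate collection \mathcal{S}
S^\ast \text{ is Pareto-optimal} \iff \nexists\, S \in \mathcal{S} : S \succ S^\ast
```

As a quick arithmetic check, $P = 0.8$ and $R = 0.5$ with $\beta = 1$ give $F_1 = 2 \cdot 0.8 \cdot 0.5 / (0.8 + 0.5) \approx 0.615$.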
