On Finding Bi-objective Pareto-optimal Fraud Prevention Rule Sets for Fintech Applications

Chengyao Wen, Yin Lou

TL;DR

The paper tackles interpretable fraud-prevention rule mining in Fintech with two contributions: SpectralRules, a diversity-focused Stage-1 rule generator, and PORS, a Pareto-front expansion framework that sits between Stage 1 and Stage 2. It formalizes the bi-objective space of precision and recall, uses the hypervolume ($HV$) as the quality metric, and provides a taxonomy of solution selection on the front (SSF) covering nine sampling methods. Empirical results on public datasets and proprietary Alipay datasets show that SpectralRules yields more diverse, higher-quality initial rule pools, and that PORS with the $hvc$-ss SSF method consistently achieves the highest $HV$, which correlates with a better final $F_\beta$ score or better recall under a precision constraint. The work demonstrates practical benefits in real deployments (e.g., Alipay’s Fanglue system) by enabling flexible, efficient rule-set refinement and reducing the need for multiple Stage-2 trials.

Abstract

Rules are widely used in Fintech institutions to make fraud prevention decisions, since rules are highly interpretable thanks to their intuitive if-then structure. In practice, a two-stage framework of fraud prevention decision rule set mining is usually employed in large Fintech institutions; Stage 1 generates a potentially large pool of rules and Stage 2 aims to produce a refined rule subset according to some criteria (typically based on precision and recall). This paper focuses on improving the flexibility and efficacy of this two-stage framework, and is concerned with finding high-quality rule subsets in a bi-objective space (such as precision and recall). To this end, we first introduce a novel algorithm called SpectralRules that directly generates a compact pool of rules in Stage 1 with high diversity. We empirically find such diversity improves the quality of the final rule subset. In addition, we introduce an intermediate stage between Stage 1 and 2 that adopts the concept of Pareto optimality and aims to find a set of non-dominated rule subsets, which constitutes a Pareto front. This intermediate stage greatly simplifies the selection criteria and increases the flexibility of Stage 2. For this intermediate stage, we propose a heuristic-based framework called PORS and we identify that the core of PORS is the problem of solution selection on the front (SSF). We provide a systematic categorization of the SSF problem and a thorough empirical evaluation of various SSF methods on both public and proprietary datasets. On two real application scenarios within Alipay, we demonstrate the advantages of our proposed methodology over existing work.

Paper Structure (23 sections, 4 equations, 4 figures, 8 tables, 3 algorithms)


Figures (4)

  • Figure 1: Illustration of Pareto dominance. A set of 5 non-dominated solutions (green squares) constitutes the Pareto front (red line). The hypervolume of those 5 solutions is the area of the light yellow region. The hypervolume contribution of an additional solution (red square) with respect to those 5 solutions is the area of the light grey region.
  • Figure 2: HV of the PORS algorithm on the Bank dataset when SpectralRules or TreeEns produces the rule set in Stage 1.
  • Figure 3: Efficiency of NSGA-II vs. PORS on the Bank dataset.
  • Figure 4: Running time of Greedy and PORS on the Bank dataset.
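
The quantities described in the Figure 1 caption can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: it assumes both objectives are maximized, that each solution is a (recall, precision) pair, and that the reference point is the origin. The function names `dominates`, `pareto_front`, `hypervolume`, and `hv_contribution` are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # assumed layout: (recall, precision), both maximized


def dominates(a: Point, b: Point) -> bool:
    """a dominates b if a is no worse in both objectives and strictly better in one."""
    return a[0] >= b[0] and a[1] >= b[1] and (a[0] > b[0] or a[1] > b[1])


def pareto_front(points: List[Point]) -> List[Point]:
    """Return the non-dominated points, sorted by the first objective."""
    unique = set(points)
    front = [p for p in unique if not any(dominates(q, p) for q in unique)]
    return sorted(front)


def hypervolume(points: List[Point], ref: Point = (0.0, 0.0)) -> float:
    """Area dominated by the points and bounded by the reference point (assumed origin)."""
    area = 0.0
    best_y = ref[1]
    # Sweep from the largest first objective down; each front point adds one strip.
    for x, y in sorted(pareto_front(points), reverse=True):
        if x > ref[0] and y > best_y:
            area += (x - ref[0]) * (y - best_y)
            best_y = y
    return area


def hv_contribution(candidate: Point, points: List[Point]) -> float:
    """Hypervolume gained by adding the candidate to the existing solutions."""
    return hypervolume(points + [candidate]) - hypervolume(points)


# Made-up example values, not taken from the paper.
front = [(0.2, 0.9), (0.5, 0.6), (0.8, 0.3)]
gain = hv_contribution((0.6, 0.7), front)
```

A candidate that is dominated by an existing solution adds no area, so its contribution is zero. This is why a contribution-based selection rule tends to favor candidates that open up new regions of the precision-recall plane.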

Theorems & Definitions (7)

  • Definition 1: Precision
  • Definition 2: Recall
  • Definition 3: $F_\beta$ score
  • Definition 4: Pareto Dominance
  • Definition 5: Pareto Optimality
  • Definition 6: Hypervolume Indicator
  • Definition 7: Hypervolume Contribution
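
As a rough illustration of Definitions 1 to 3, the sketch below evaluates a rule set on labeled samples. It uses the standard textbook formulas, which may differ in notation from the paper's. It also assumes that a rule set flags a sample when any one of its rules fires, and that label 1 means fraud. The feature names (`amount`, `new_device`), the two example rules, and the data are all hypothetical.

```python
from typing import Callable, Dict, Sequence, Tuple

Sample = Dict[str, float]
Rule = Callable[[Sample], bool]


def rule_set_fires(rules: Sequence[Rule], x: Sample) -> bool:
    """Assumption: a rule set flags a sample if any of its rules fires."""
    return any(rule(x) for rule in rules)


def precision_recall(
    rules: Sequence[Rule], samples: Sequence[Sample], labels: Sequence[int]
) -> Tuple[float, float]:
    """Precision and recall of the rule set, with label 1 meaning fraud."""
    flagged = [rule_set_fires(rules, x) for x in samples]
    tp = sum(1 for f, y in zip(flagged, labels) if f and y == 1)
    n_flagged = sum(flagged)
    n_fraud = sum(labels)
    precision = tp / n_flagged if n_flagged else 0.0
    recall = tp / n_fraud if n_fraud else 0.0
    return precision, recall


def f_beta(precision: float, recall: float, beta: float = 1.0) -> float:
    """Standard F-beta score; beta > 1 weights recall more heavily."""
    denom = beta ** 2 * precision + recall
    return (1 + beta ** 2) * precision * recall / denom if denom else 0.0


# Hypothetical toy data: two if-then rules over two made-up features.
samples = [
    {"amount": 900, "new_device": 1},
    {"amount": 50, "new_device": 1},
    {"amount": 700, "new_device": 0},
    {"amount": 20, "new_device": 0},
    {"amount": 30, "new_device": 0},
    {"amount": 10, "new_device": 0},
]
labels = [1, 1, 0, 0, 1, 1]
rules = [lambda x: x["amount"] > 500, lambda x: x["new_device"] == 1]

p, r = precision_recall(rules, samples, labels)
score = f_beta(p, r, beta=1.0)
```

Each candidate rule subset maps to one (precision, recall) pair in this way, which is the point that the dominance and hypervolume definitions then operate on.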