Interdisciplinary Fairness in Imbalanced Research Proposal Topic Inference: A Hierarchical Transformer-based Method with Selective Interpolation

Meng Xiao; Min Wu; Ziyue Qiao; Yanjie Fu; Zhiyuan Ning; Yi Du; Yuanchun Zhou

TL;DR

This work tackles fairness in automatic topic inference for interdisciplinary research proposals within a hierarchical discipline structure. It introduces TIPIN, a Transformer-based architecture that models heterogeneous proposal documents with separate word- and document-level transformers, and employs selective interpolation to generate high-quality pseudo-interdisciplinary samples for balanced training. By adaptively aggregating semantic information along historical prediction paths and using level-wise predictions with a stop mechanism, TIPIN improves both accuracy (F1) and fairness (Disp-Recall) on real-world RP and RP-IR datasets. The results demonstrate substantial gains over baselines and offer practical implications for improving reviewer assignment in grant review processes, especially under interdisciplinary-non-interdisciplinary data imbalance.
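The level-wise prediction with a stop mechanism described above can be illustrated with a minimal sketch. This is not the TIPIN implementation: it follows a single greedy path rather than predicting a label set per level, and the names `infer_topic_path`, `children`, `score`, and `STOP` are hypothetical, introduced only for illustration. The `score` callable stands in for whatever model scores a candidate label given the path predicted so far.

```python
STOP = "<stop>"  # hypothetical stop label; the name is an assumption


def infer_topic_path(children, score, root="root", max_depth=4):
    """Greedy level-wise decoding over a discipline tree.

    children: dict mapping a node to the list of its child topics.
    score:    callable (path_so_far, candidate_label) -> float; a stand-in
              for a model that conditions on the historical prediction path.
    Returns the list of topics chosen from coarse to fine.
    """
    path = []
    node = root
    for _ in range(max_depth):
        options = children.get(node, [])
        if not options:
            break  # reached a leaf of the hierarchy
        # The stop label competes with the real children at every level.
        best = max(options + [STOP], key=lambda label: score(path, label))
        if best == STOP:
            break  # the model prefers to stop at the current granularity
        path.append(best)
        node = best
    return path


# Toy hierarchy and fixed scores, purely illustrative.
toy_children = {"root": ["A", "B"], "A": ["A01", "A02"], "A01": ["A0101"]}
toy_scores = {"A": 0.9, "B": 0.2, "A01": 0.7, "A02": 0.1, "A0101": 0.3, STOP: 0.5}
toy_path = infer_topic_path(toy_children, lambda path, label: toy_scores[label])
```

In this toy run the decoder descends two levels and then stops, because the stop label outscores the only remaining child at the third level.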

Abstract

Topic inference for research proposals aims to identify the most suitable disciplinary division within the discipline system defined by a funding agency. The agency then uses this division to find appropriate peer-review experts in its database. Automated topic inference can reduce human errors caused by manual topic filling, bridge the knowledge gap between funding agencies and project applicants, and improve system efficiency. Existing methods model this as a hierarchical multi-label classification problem and use generative models to iteratively infer the most appropriate topic information. However, these methods overlook the gap in scale between interdisciplinary and non-interdisciplinary research proposals. As a result, the automated inference system tends to categorize interdisciplinary proposals as non-interdisciplinary, which causes unfairness during expert assignment. How can we address this data imbalance under a complex discipline system and thereby resolve this unfairness? In this paper, we implement a topic label inference system based on a Transformer encoder-decoder architecture. Furthermore, during training we use interpolation techniques to create a series of pseudo-interdisciplinary proposals from non-interdisciplinary ones, guided by non-parametric indicators such as cross-topic probabilities and topic occurrence probabilities. This approach aims to reduce the bias of the system during model training. Finally, we conduct extensive experiments on a real-world dataset to verify the effectiveness of the proposed method. The experimental results demonstrate that our training strategy can significantly mitigate the unfairness generated in the topic inference task.
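As a rough illustration of the interpolation idea in the abstract, the sketch below mixes two non-interdisciplinary samples into one pseudo-interdisciplinary sample using the standard Mixup convex combination. The partner-selection rule is an assumption: the abstract says only that selection relies on cross-topic and topic occurrence probabilities, so `pick_partner` and its `cross_topic_prob` table are hypothetical stand-ins, not the paper's formulation.

```python
import random


def mixup(x_a, y_a, x_b, y_b, lam):
    """Convexly combine two feature vectors and their multi-hot label vectors."""
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]
    return x, y


def pick_partner(anchor_topic, candidate_topics, cross_topic_prob, rng):
    """Sample a partner index weighted by a hypothetical cross-topic table.

    cross_topic_prob maps (topic_a, topic_b) to how often the two topics
    co-occur in real interdisciplinary proposals (an assumed statistic).
    Returns None when no candidate ever co-occurs with the anchor topic.
    """
    weights = [cross_topic_prob.get((anchor_topic, t), 0.0) for t in candidate_topics]
    if sum(weights) == 0:
        return None
    return rng.choices(range(len(candidate_topics)), weights=weights, k=1)[0]


rng = random.Random(0)
cross = {("CS", "Bio"): 0.3, ("CS", "Math"): 0.0}  # toy values
partner = pick_partner("CS", ["Bio", "Math"], cross, rng)

# Mixing coefficient drawn from Beta(alpha, alpha), as in standard Mixup.
lam = rng.betavariate(0.4, 0.4)

# With lam fixed at 0.5 the result is the midpoint of both samples.
mixed_x, mixed_y = mixup([1.0, 0.0], [1, 0, 0], [0.0, 1.0], [0, 1, 0], 0.5)
```

Weighting the partner choice by co-occurrence keeps the synthetic label pairs close to combinations that plausibly appear in real interdisciplinary proposals, which is one reading of "selective" interpolation.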

Paper Structure (22 sections, 15 equations, 9 figures, 3 tables)

Introduction
Preliminary
Methodology
Modeling Heterogeneous Research Proposal
Selective Interpolation for Enhancing the Fairness
Topic Inference upon Hierarchical Discipline Structure
Optimize the whole pipeline
Experiment Settings
Experiment Results
Overall Comparison
Discussion on Model Component
Discussion on Interpolation Formulation
Discussion on Level-wise Performance Comparison with Different Interpolation Formulations
Discussion on Error Case
Discussion on Training and Inference Time Cost
...and 7 more sections

Figures (9)

Figure 1: An illustration of the consequence of the interdisciplinary-non-interdisciplinary imbalance issue, showing the performance of HIRPCN on the interdisciplinary and non-interdisciplinary test sets. (1) The model performs well on non-interdisciplinary samples, where most test samples are predicted correctly (blue shade). (2) On interdisciplinary samples, most cases are only partially correct because of the imbalance (green shade).
Figure 2: An overview of TIPIN. 1) On the left side, Selective Interpolation selects two high-quality candidate samples for MixUp. 2) The grey shade illustrates the architecture of TIPIN, which generates the current step's prediction. 3) TIPIN follows the iterative process shown at the bottom of the figure to either continue predicting or stop at the current level. 4) The right side demonstrates the interpolation strategy variants.
Figure 3: The study of model component ablation and interpolation formulation impact.
Figure 4: Level-wise prediction results of each TIPIN variant.
Figure 5: The details of the error-case occurrence rate on each dataset.
...and 4 more figures

Theorems & Definitions (3)

Definition 1: Research Proposal
Definition 2: Hierarchical Discipline Structure
Definition 3: Topic Inference Problem
