Federated Item Response Models: A Gradient-driven Privacy-preserving Framework for Distributed Psychometric Estimation

Biying Zhou, Nanyu Luo, Feng Ji

Abstract

Item Response Theory (IRT) models are widely used to estimate respondents' latent abilities and calibrate item characteristics such as difficulty. Traditional IRT estimation typically requires centralizing all raw responses, which raises privacy and governance concerns. We introduce Federated Item Response Theory (FedIRT), a framework that enables distributed calibration of standard IRT models without transferring individual-level data, thereby preserving confidentiality while retaining statistical efficiency. To provide formal protection, we further develop FedIRT-DP, a user-level differentially private extension. Each site computes per-student gradients, clips them to a fixed norm, and shares only masked sums of the clipped gradients; the server adds calibrated Gaussian noise and performs maximum a posteriori (MAP) updates. This yields an auditable $(\varepsilon,\delta)$ guarantee at the student level and a single, tunable privacy-utility trade-off governed by the clipping bound and the noise scale. The same mechanism improves robustness to extreme response rows (e.g., all-zero or all-one responses). Across simulations, FedIRT matches the accuracy of centralized estimators from popular $\texttt{R}$ packages while avoiding data pooling; FedIRT-DP achieves comparable accuracy under stronger privacy and is more robust to contamination. An empirical study on a real exam dataset demonstrates practical viability and yields consistent item and site-effect estimates. To facilitate adoption, we release an open-source $\texttt{R}$ package, $\texttt{FedIRT}$, implementing the two-parameter logistic (2PL) model and the partial credit model (PCM) with federated and differentially private training.
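The per-round FedIRT-DP aggregation described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the $\texttt{FedIRT}$ package's implementation. It assumes the 2PL parameterization $P(y_j=1\mid\theta)=\sigma(\alpha_j(\theta-\beta_j))$, a fixed 13-node quadrature grid for a standard-normal ability distribution, toy pairwise additive masks standing in for a real secure-aggregation protocol, and hypothetical $N(1,1)$ and $N(0,1)$ priors on $\alpha$ and $\beta$. School effects $\bm{s}$ are omitted, and every function name below is hypothetical.

```python
import math
import random

# Hypothetical quadrature grid for theta ~ N(0, 1); weights are normalized densities.
GRID = [-3.0 + 0.5 * k for k in range(13)]
_dens = [math.exp(-0.5 * t * t) for t in GRID]
_z = sum(_dens)
WEIGHTS = [d / _z for d in _dens]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def per_student_grad(y, alpha, beta):
    """Gradient of one student's marginal 2PL log-likelihood.

    Assumes P(y_j = 1 | theta) = sigmoid(alpha_j * (theta - beta_j)).
    Returns a flat list [d/d alpha_1..J, d/d beta_1..J].
    """
    J = len(alpha)
    lik = []  # likelihood of the whole response row at each quadrature node
    for t in GRID:
        L = 1.0
        for j in range(J):
            p = sigmoid(alpha[j] * (t - beta[j]))
            L *= p if y[j] == 1 else 1.0 - p
        lik.append(L)
    total = sum(w * l for w, l in zip(WEIGHTS, lik))
    post = [w * l / total for w, l in zip(WEIGHTS, lik)]  # posterior node weights
    g_a, g_b = [0.0] * J, [0.0] * J
    for q, t in enumerate(GRID):
        for j in range(J):
            resid = y[j] - sigmoid(alpha[j] * (t - beta[j]))
            g_a[j] += post[q] * resid * (t - beta[j])
            g_b[j] += post[q] * resid * (-alpha[j])
    return g_a + g_b


def clip(g, C):
    """Rescale g so that its L2 norm is at most C."""
    norm = math.sqrt(sum(v * v for v in g))
    scale = min(1.0, C / norm) if norm > 0 else 1.0
    return [v * scale for v in g]


def site_clipped_sum(responses, alpha, beta, C):
    """Sum of per-student clipped gradients, computed locally at one site."""
    total = [0.0] * (2 * len(alpha))
    for y in responses:
        g = clip(per_student_grad(y, alpha, beta), C)
        total = [a + b for a, b in zip(total, g)]
    return total


def pairwise_masks(n_sites, dim, seed=0):
    """Toy additive masks: site a adds r_ab and site b subtracts it, so they cancel
    in the sum. Real secure aggregation derives r_ab from pairwise shared secrets."""
    rng = random.Random(seed)
    masks = [[0.0] * dim for _ in range(n_sites)]
    for a in range(n_sites):
        for b in range(a + 1, n_sites):
            r = [rng.uniform(-100.0, 100.0) for _ in range(dim)]
            masks[a] = [m + v for m, v in zip(masks[a], r)]
            masks[b] = [m - v for m, v in zip(masks[b], r)]
    return masks


def server_dp_map_step(masked_sums, alpha, beta, C, sigma, lr=0.01, seed=1):
    """Unmask by summing, add Gaussian noise (std sigma * C), take one MAP ascent step."""
    rng = random.Random(seed)
    J = len(alpha)
    agg = [sum(s[k] for s in masked_sums) for k in range(2 * J)]  # masks cancel here
    noisy = [v + rng.gauss(0.0, sigma * C) for v in agg]
    # Gradients of the hypothetical log-priors: -(alpha - 1) and -beta.
    new_alpha = [a + lr * (noisy[j] - (a - 1.0)) for j, a in enumerate(alpha)]
    new_beta = [b + lr * (noisy[J + j] - b) for j, b in enumerate(beta)]
    return new_alpha, new_beta


# Usage: three sites, three items.
alpha, beta, C = [1.0, 1.2, 0.8], [0.0, -0.5, 0.5], 1.0
sites = [[[1, 0, 1], [1, 1, 1]], [[0, 0, 0], [1, 0, 0]], [[1, 1, 0]]]
sums = [site_clipped_sum(r, alpha, beta, C) for r in sites]
masks = pairwise_masks(len(sites), 2 * len(alpha))
masked = [[s + m for s, m in zip(sv, mv)] for sv, mv in zip(sums, masks)]
alpha, beta = server_dp_map_step(masked, alpha, beta, C, sigma=1.0)
```

Because adding or removing one student changes the aggregate by at most $C$ in $L_2$ norm, noise with standard deviation $\sigma C$ is an instance of the Gaussian mechanism and gives each round an $(\varepsilon,\delta)$ guarantee; the total budget over many rounds must be tracked with a composition accountant, which this sketch does not do. Clipping also caps the influence of an extreme row such as all zeros. The toy masks cancel only if every site reports, whereas production secure-aggregation protocols also handle dropouts.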

Paper Structure

This paper contains 29 sections, 25 equations, 8 figures, 1 table, and 4 algorithms.

Figures (8)

  • Figure 1: Flow chart for the estimation of item parameters $\bm{\alpha}$ and $\bm{\beta}$ and school effect $\bm{s}$ in the 2PL model via the FedIRT and FedIRT-DP framework.
  • Figure 2: Comparison of MSE for FedIRT, FedIRT-DP, and centralized estimation by ltm and mirt across different sample sizes and levels of item discrimination and difficulty.
  • Figure 3: Comparison of MSE for FedIRT, FedIRT-DP, and the meta-analysis approach using mirt across different sample sizes and levels of item discrimination and difficulty.
  • Figure 4: Comparison of MSE and bias for item parameter estimation between fixed and random school effect approaches.
  • Figure 5: Comparison of MSE and bias for school effect estimation between fixed and random school effect approaches.
  • ...and 3 more figures