Composable Cross-prompt Essay Scoring by Merging Models
Sanwoo Lee, Kun Liang, Yunfang Wu
TL;DR
The paper tackles cross-prompt automated essay scoring (AES) under privacy constraints by proposing a source-free adaptation method that merges individually trained source models rather than re-training on all source data. It represents each source model's update as a LoRA task vector and tunes the mixing coefficients with a Prior-encoded Information Maximization (PIM) objective, which uses Beta-derived priors from the source score distributions; Bayesian optimization selects the coefficients. Empirically, the approach matches or exceeds joint training on all sources, remains robust under cross-dataset shifts, and is substantially more efficient than joint adaptation methods. The method thus enables scalable, privacy-preserving adaptation for cross-prompt AES.
Abstract
Recent advances in cross-prompt automated essay scoring (AES) typically train models jointly on all source prompts, often also requiring simultaneous access to unlabeled target-prompt essays. However, our pilot study shows that using all sources is suboptimal, and re-accessing source datasets during adaptation raises privacy concerns. We propose a source-free adaptation approach that selectively merges the parameters of individually trained source models instead of merging their datasets. In particular, we simulate joint training through linear combinations of task vectors -- the parameter updates from fine-tuning. To optimize the combination's coefficients, we propose Prior-encoded Information Maximization (PIM), an unsupervised objective that promotes the model's score discriminability, regularized by priors pre-computed from the sources. We employ Bayesian optimization as an efficient optimizer of PIM. Experimental results with LLMs on in-dataset and cross-dataset adaptation show that our method (1) consistently outperforms training jointly on all sources, (2) maintains superior robustness compared to other merging methods, and (3) excels under severe distribution shifts where recent leading cross-prompt methods struggle, all while retaining computational efficiency.
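The merging step described in the abstract can be illustrated with a minimal sketch. It assumes model parameters are stored as dictionaries of NumPy arrays, and the names `task_vector`, `merge`, and `info_max_with_prior` are hypothetical. The objective shown is a generic information-maximization score with a KL penalty toward a prior over score classes; it is an assumption for illustration, not the paper's exact PIM formulation, and the Bayesian optimization loop over the coefficients is omitted.

```python
import numpy as np

def task_vector(finetuned, base):
    """Parameter update from fine-tuning: theta_finetuned - theta_base."""
    return {k: finetuned[k] - base[k] for k in base}

def merge(base, task_vectors, coeffs):
    """Linear combination: theta = theta_base + sum_i lambda_i * tau_i."""
    merged = {k: v.copy() for k, v in base.items()}
    for lam, tau in zip(coeffs, task_vectors):
        for k in merged:
            merged[k] += lam * tau[k]
    return merged

def info_max_with_prior(probs, prior, eps=1e-12):
    """Illustrative unsupervised score (higher is better), NOT the paper's PIM.

    Rewards confident per-essay predictions (low conditional entropy) and
    penalizes a mean prediction that drifts from the given prior (KL term).
    probs: (n_essays, n_scores) predicted distributions; prior: (n_scores,).
    """
    probs = np.asarray(probs, dtype=float)
    prior = np.asarray(prior, dtype=float)
    cond_entropy = -np.mean(np.sum(probs * np.log(probs + eps), axis=1))
    marginal = probs.mean(axis=0)
    kl = np.sum(marginal * np.log((marginal + eps) / (prior + eps)))
    return -cond_entropy - kl

# Toy usage: two source models fine-tuned from a shared base.
base = {"w": np.zeros(3)}
src1 = {"w": np.array([1.0, 0.0, 0.0])}
src2 = {"w": np.array([0.0, 2.0, 0.0])}
taus = [task_vector(src1, base), task_vector(src2, base)]
merged = merge(base, taus, coeffs=[0.5, 0.25])
```

In a full pipeline, a black-box optimizer would propose `coeffs`, the merged model would score unlabeled target essays, and an objective of this kind would be evaluated on the resulting predictions; no source data is touched at that stage, only the source models' task vectors and pre-computed priors.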
