Composable Cross-prompt Essay Scoring by Merging Models

Sanwoo Lee, Kun Liang, Yunfang Wu

TL;DR

The paper tackles cross-prompt AES under privacy constraints by proposing a source-free adaptation method that merges pre-trained, per-source models rather than re-training on all data. It represents source updates as LoRA task vectors and optimizes their mixing via a Prior-encoded Information Maximization (PIM) objective, using Beta-derived priors from source score distributions and Bayesian optimization to select coefficients. Empirically, the approach matches or exceeds joint training on all sources and demonstrates strong robustness under cross-dataset shifts, while offering substantial efficiency gains compared to joint adaptation methods. The method advances practical cross-prompt AES by enabling scalable, privacy-preserving adaptation with strong performance and robustness in real-world settings.

Abstract

Recent advances in cross-prompt automated essay scoring (AES) typically train models jointly on all source prompts, often requiring additional access to unlabeled target prompt essays simultaneously. However, using all sources is suboptimal in our pilot study, and re-accessing source datasets during adaptation raises privacy concerns. We propose a source-free adaptation approach that selectively merges individually trained source models' parameters instead of datasets. In particular, we simulate joint training through linear combinations of task vectors -- the parameter updates from fine-tuning. To optimize the combination's coefficients, we propose Prior-encoded Information Maximization (PIM), an unsupervised objective which promotes the model's score discriminability regularized by priors pre-computed from the sources. We employ Bayesian optimization as an efficient optimizer of PIM. Experimental results with LLMs on in-dataset and cross-dataset adaptation show that our method (1) consistently outperforms training jointly on all sources, (2) maintains superior robustness compared to other merging methods, (3) excels under severe distribution shifts where recent leading cross-prompt methods struggle, all while retaining computational efficiency.
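The merging step described in the abstract, a linear combination of task vectors added to the base parameters, can be sketched as follows. This is a minimal illustration, not the authors' implementation: it uses plain NumPy arrays in place of LoRA weights, the function names `task_vector` and `merge_models` are hypothetical, and the coefficients are fixed by hand rather than chosen by Bayesian optimization of PIM.

```python
import numpy as np

def task_vector(finetuned, base):
    """Parameter update from fine-tuning: theta_finetuned - theta_base."""
    return {name: finetuned[name] - base[name] for name in base}

def merge_models(base, task_vectors, coeffs):
    """Return theta_base + sum_i coeffs[i] * task_vectors[i]."""
    if len(task_vectors) != len(coeffs):
        raise ValueError("need one coefficient per task vector")
    merged = {name: value.copy() for name, value in base.items()}
    for lam, tv in zip(coeffs, task_vectors):
        for name in merged:
            merged[name] += lam * tv[name]
    return merged

# Toy example: one shared base model and two source models (assumed values).
base = {"w": np.zeros(3)}
source_a = {"w": np.array([1.0, 1.0, 1.0])}
source_b = {"w": np.array([2.0, 0.0, -2.0])}

vectors = [task_vector(source_a, base), task_vector(source_b, base)]
merged = merge_models(base, vectors, coeffs=[0.5, 0.25])
```

Because only the per-source updates and a handful of scalar coefficients are needed, a merge of this kind does not require access to the source essays themselves, which is the property the source-free setting relies on.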

Paper Structure

This paper contains 38 sections, 18 equations, 6 figures, 6 tables.

Figures (6)

  • Figure 1: Agreement with human raters (QWK) on the target prompt (P7 of ASAP), using BERT (Devlin et al., 2019) fine-tuned jointly on a varying number of source prompt datasets. Notably, training on all sources leads to suboptimal performance. Similar trends are observed on other target prompts; see the appendix on the full pilot study.
  • Figure 2: An illustration of our method for source-free cross-prompt AES. Left: Source models and statistics are pre-trained before adaptation. Right: During source-free adaptation, merging coefficients are optimized via Bayesian optimization to enhance our prior-encoded information maximization (PIM) criterion (the objective equation).
  • Figure 3: Comparison of PIM (phi-4-mini-it) with top-performing cross-prompt methods (PAES and PMAES). Similar trends hold for llama-3.1-8b-it (see the appendix on comparison with state-of-the-art methods).
  • Figure 4: Log-scale time (y-axis) for pre-training source models on ASAP (top), followed by adaptation and inference on PERSUADE2.0 (middle), along with the time ratios between adaptation and inference (bottom). Results are from a single run on an NVIDIA A40 GPU.
  • Figure 5: Agreement with human raters (QWK) on target domains using BERT (Devlin et al., 2019) trained jointly on a varying number of source prompt datasets.
  • ...and 1 more figure
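
The figure captions report agreement with human raters as QWK (quadratic weighted kappa). For readers unfamiliar with the metric, a minimal sketch of the standard definition follows. It assumes integer scores within a known range; the function name and arguments are hypothetical and are not taken from the paper's code.

```python
import numpy as np

def quadratic_weighted_kappa(rater_a, rater_b, min_score, max_score):
    """Standard QWK between two lists of integer scores in [min_score, max_score].

    Assumes at least two score levels and that the raters are not both
    constant, since the expected-disagreement term would then be zero.
    """
    n = max_score - min_score + 1
    # Observed confusion matrix between the two raters.
    observed = np.zeros((n, n))
    for a, b in zip(rater_a, rater_b):
        observed[a - min_score, b - min_score] += 1
    # Quadratic disagreement weights, 0 on the diagonal and 1 at the extremes.
    idx = np.arange(n)
    weights = (idx[:, None] - idx[None, :]) ** 2 / (n - 1) ** 2
    # Expected matrix under independence, scaled to the same total count.
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / observed.sum()
    return 1.0 - (weights * observed).sum() / (weights * expected).sum()

perfect = quadratic_weighted_kappa([1, 2, 3], [1, 2, 3], 1, 3)
reversed_scores = quadratic_weighted_kappa([1, 2, 3, 4], [4, 3, 2, 1], 1, 4)
```

A value of 1 indicates perfect agreement, 0 indicates chance-level agreement, and negative values indicate systematic disagreement.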