Two-stage Incomplete Utterance Rewriting on Editing Operation
Zhiyu Cao, Peifeng Li, Qiaoming Zhu, Yaxin Fan
TL;DR
This work tackles incomplete utterance rewriting (IUR) by explicitly modeling editing operations (insertion for ellipsis and replacement for coreference) within a two-stage framework called TEO. The first stage predicts editing operations $\boldsymbol{\text{E}}$ from $\text{Hist}$ and $U_n$, and the second stage rewrites $U_n$ using $\text{Hist}$, $U_n$, and $\boldsymbol{\text{E}}$, with the objective $P(Y \mid \text{Hist}, U_n) = P(\boldsymbol{\text{E}} \mid \text{Hist}, U_n) \times P(Y \mid \text{Hist}, U_n, \boldsymbol{\text{E}})$. A novel adversarial perturbation strategy is employed to mitigate exposure bias and cascading errors, improving robustness when editing operations are imperfect. Across the English TASK dataset and the Chinese REWRITE and RES200K datasets, TEO achieves state-of-the-art performance, demonstrating that an editing-pivot approach yields finer-grained control over coreference and ellipsis resolution with practical impact on downstream dialogue understanding.
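To make the two kinds of editing operation concrete, the following is a minimal sketch of how a set of operations could be applied to an incomplete utterance. It is an illustration only, not the TEO implementation: in TEO both stages are learned models, whereas here the operations are written by hand and applied deterministically. The `EditOp` structure, its field names, the `apply_edits` helper, and the toy dialogue are all assumptions introduced for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class EditOp:
    """Hypothetical editing operation (not the paper's data format)."""
    kind: str        # "insert" (ellipsis) or "replace" (coreference)
    start: int       # token index in the incomplete utterance
    end: int         # exclusive end index; equals start for an insertion
    span: List[str]  # tokens taken from the dialogue history


def apply_edits(utterance: List[str], ops: List[EditOp]) -> List[str]:
    """Apply non-overlapping editing operations to a tokenized utterance."""
    out = list(utterance)
    # Work from right to left so that earlier indices stay valid
    # after each splice changes the length of the list.
    for op in sorted(ops, key=lambda o: o.start, reverse=True):
        if op.kind == "insert":
            out[op.start:op.start] = op.span
        elif op.kind == "replace":
            out[op.start:op.end] = op.span
        else:
            raise ValueError(f"unknown editing operation: {op.kind}")
    return out


# Toy dialogue history: "is Beijing cloudy today ?"
incomplete = ["why", "is", "it", "always", "?"]
ops = [
    EditOp("replace", 2, 3, ["Beijing"]),  # coreference: "it" -> "Beijing"
    EditOp("insert", 4, 4, ["cloudy"]),    # ellipsis: restore "cloudy"
]
print(" ".join(apply_edits(incomplete, ops)))
```

In this toy setting the operations fully determine the rewrite. In the two-stage formulation the predicted operations instead act as an intermediate signal that the second stage conditions on, which is why errors in them can cascade.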
Abstract
Previous work on Incomplete Utterance Rewriting (IUR) has primarily focused on generating rewritten utterances based solely on dialogue context, ignoring the widespread phenomenon of coreference and ellipsis in dialogues. To address this issue, we propose a novel framework called TEO (Two-stage approach on Editing Operation) for IUR, in which the first stage generates editing operations and the second stage rewrites incomplete utterances utilizing the generated editing operations and the dialogue context. Furthermore, an adversarial perturbation strategy is proposed to mitigate cascading errors and exposure bias caused by the inconsistency between training and inference in the second stage. Experimental results on three IUR datasets show that our TEO outperforms the SOTA models significantly.
