
Bandit Structured Prediction for Learning from Partial Feedback in Statistical Machine Translation

Artem Sokolov, Stefan Riezler, Tanguy Urvoy

TL;DR

The paper tackles learning structured prediction models when only single-point bandit feedback is available, focusing on discriminative reranking in statistical machine translation. It introduces Bandit Structured Prediction, which samples an output from a Gibbs distribution over structures, uses the loss observed for that single output to form an unbiased estimate of the gradient of the expected loss, and updates the weights by stochastic gradient descent. A convergence analysis in the pseudogradient framework accompanies the algorithm. The authors validate the approach on SMT domain adaptation tasks, where it improves over out-of-domain baselines and performs competitively with structured dueling bandits, which rely on two-point feedback. The work shows that learning from partial feedback is practical in interactive NLP settings where full references or multiple feedback signals are impractical to obtain.
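The sampling-based update summarized above can be sketched in Python for the reranking setting. This is a minimal sketch under stated assumptions, not the authors' implementation: each source sentence comes with a fixed n-best list whose candidates are represented by feature vectors, the model is a log-linear (Gibbs) distribution over that list, and the step size is constant. The names `gibbs_probs` and `bandit_step` are hypothetical. The estimate Δ(ỹ)(φ(x, ỹ) − E_p[φ(x, y)]) is unbiased because, for ỹ drawn from p_w(·|x), its expectation equals Σ_y Δ(y) ∇p_w(y|x), the gradient of the expected loss.

```python
import numpy as np

def gibbs_probs(w, feats):
    """Gibbs (softmax) distribution p_w(y|x) over one n-best list.

    feats has shape (n_candidates, n_features): row k is phi(x, y_k).
    """
    scores = feats @ w
    scores = scores - scores.max()  # shift by the max score for numerical stability
    p = np.exp(scores)
    return p / p.sum()

def bandit_step(w, feats, loss_fn, lr, rng):
    """One bandit update: sample a candidate, observe only its loss, step along the estimate."""
    p = gibbs_probs(w, feats)
    k = rng.choice(len(p), p=p)                # sample y~ from p_w(.|x)
    loss = loss_fn(k)                          # bandit feedback: one scalar for the sampled output
    expected_feats = p @ feats                 # E_{p_w}[phi(x, y)]
    grad = loss * (feats[k] - expected_feats)  # unbiased estimate of grad of E_{p_w}[loss]
    return w - lr * grad, loss

# Toy run: one-hot features, so w acts as one logit per candidate.
rng = np.random.default_rng(0)
feats = np.eye(3)
losses = np.array([0.9, 0.2, 0.6])  # hidden per-candidate losses (e.g. 1 - BLEU)
w = np.zeros(3)
for _ in range(2000):
    w, _ = bandit_step(w, feats, lambda k: losses[k], lr=0.5, rng=rng)
p = gibbs_probs(w, feats)
print("most probable candidate:", p.argmax(), "expected loss:", p @ losses)
```

Only the loss of the sampled candidate enters each update; the `losses` array stands in for a feedback source that the learner queries one point at a time and never inspects as a whole.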

Abstract

We present an approach to structured prediction from bandit feedback, called Bandit Structured Prediction, where only the value of a task loss function at a single predicted point, instead of a correct structure, is observed in learning. We present an application to discriminative reranking in Statistical Machine Translation (SMT) where the learning algorithm only has access to a 1 − BLEU loss evaluation of a predicted translation instead of a gold-standard reference translation. In our experiments, bandit feedback is obtained by evaluating BLEU against reference translations without revealing them to the algorithm. This can be thought of as a simulation of interactive machine translation, where an SMT system is personalized by a user who provides single-point feedback on predicted translations. Our experiments show that our approach improves translation quality and is comparable to approaches that employ more informative feedback in learning.
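The simulated feedback described in the abstract can be illustrated with a small oracle that keeps the references to itself and returns only a scalar loss. This is a sketch under assumptions: the class `SimulatedBanditFeedback` and the helpers are hypothetical, and the sentence-level BLEU variant (add-one smoothing of the n-gram precisions for n > 1) is one common choice, not necessarily the one used in the paper.

```python
import math
from collections import Counter

def ngram_counts(tokens, n):
    """Multiset of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def sentence_bleu(hyp, ref, max_n=4):
    """Sentence BLEU with add-one smoothing of the n-gram precisions for n > 1 (assumed variant)."""
    if not hyp:
        return 0.0
    log_prec = 0.0
    for n in range(1, max_n + 1):
        h, r = ngram_counts(hyp, n), ngram_counts(ref, n)
        match = sum(min(c, r[g]) for g, c in h.items())  # clipped n-gram matches
        total = max(len(hyp) - n + 1, 0)
        if n > 1:
            match, total = match + 1, total + 1          # add-one smoothing
        if match == 0:
            return 0.0                                   # no unigram overlap at all
        log_prec += math.log(match / total) / max_n
    log_bp = min(0.0, 1.0 - len(ref) / len(hyp))         # log of the brevity penalty
    return math.exp(log_prec + log_bp)

class SimulatedBanditFeedback:
    """Hypothetical oracle: stores references, reveals only 1 - BLEU for a queried hypothesis."""

    def __init__(self, references):
        self._references = references  # tokenized references, never returned to the learner

    def loss(self, sent_id, hyp_tokens):
        return 1.0 - sentence_bleu(hyp_tokens, self._references[sent_id])

oracle = SimulatedBanditFeedback(["the cat sat on the mat".split()])
print(oracle.loss(0, "the cat sat on the mat".split()))  # 0.0: exact match
print(oracle.loss(0, "the cat sat".split()))             # short hypothesis, penalized by brevity
```

The leading underscore only signals by Python convention that the references are private; the point of the sketch is the interface, in which the learner receives one number per query and never a reference string.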


Paper Structure

This paper contains 13 sections, 1 theorem, 16 equations, 1 figure, 1 table, 2 algorithms.

Key Result

Theorem 1

Under conditions (eq:lipschitz) through (eq:learningrate), for any $w_0$ in process (eq:process):

Figures (1)

  • Figure 1: Corpus-BLEU on test set for early stopping at different iterations for the SMT task.

Theorems & Definitions (1)

  • Theorem 1: Polyak and Tsypkin (1973), Thm. 1