Differentiable Scheduled Sampling for Credit Assignment
Kartik Goyal, Chris Dyer, Taylor Berg-Kirkpatrick
TL;DR
This work tackles exposure bias in seq2seq training with differentiable relaxations of greedy decoding, which let gradients propagate back through earlier decoding decisions. It proposes a soft-argmax relaxation and a Gumbel-based reparameterization for sample-based training, both used as differentiable relaxed decoders within scheduled sampling. Empirical results on German-English MT and German NER show consistent improvements over cross-entropy training and conventional scheduled sampling, suggesting improved credit assignment and potentially lower gradient variance. The approach keeps training cost comparable to standard seq2seq training and offers a scalable path to more informative training signals in sequence prediction tasks.
Abstract
We demonstrate that a continuous relaxation of the argmax operation can be used to create a differentiable approximation to greedy decoding for sequence-to-sequence (seq2seq) models. By incorporating this approximation into the scheduled sampling training procedure (Bengio et al., 2015), a well-known technique for correcting exposure bias, we introduce a new training objective that is continuous and differentiable everywhere and that can provide informative gradients near points where previous decoding decisions change their value. In addition, by using a related approximation, we demonstrate a similar approach to sample-based training. Finally, we show that our approach outperforms cross-entropy training and scheduled sampling procedures in two sequence prediction tasks: named entity recognition and machine translation.
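The two relaxations named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the sharpness parameter `alpha`, and the toy scores and embeddings are all assumptions made for illustration. The idea shown is that a hard argmax over decoder scores is replaced by a softmax-weighted average of token embeddings, and that adding Gumbel noise to the scores first turns the same relaxation into an approximation of sampling.

```python
import math
import random


def peaked_softmax(scores, alpha=1.0):
    """Softmax over alpha-scaled scores; larger alpha gives a sharper distribution."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(alpha * (s - m)) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def soft_argmax_embedding(scores, embeddings, alpha=1.0):
    """Differentiable stand-in for embeddings[argmax(scores)].

    Returns the softmax-weighted average of the embedding vectors. As alpha
    grows, the result approaches the embedding of the highest-scoring token.
    """
    weights = peaked_softmax(scores, alpha)
    dim = len(embeddings[0])
    return [sum(w * emb[d] for w, emb in zip(weights, embeddings))
            for d in range(dim)]


def gumbel_perturb(scores, rng):
    """Add independent Gumbel(0, 1) noise to each score.

    The argmax of the perturbed scores is distributed as a sample from
    softmax(scores), so feeding them to soft_argmax_embedding relaxes sampling.
    """
    noisy = []
    for s in scores:
        # Clamp u away from 0 and 1 so both logarithms are defined.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        noisy.append(s - math.log(-math.log(u)))
    return noisy


if __name__ == "__main__":
    scores = [2.0, 1.0, 0.1]                            # hypothetical decoder scores
    embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # hypothetical token embeddings
    rng = random.Random(0)

    print(soft_argmax_embedding(scores, embeddings, alpha=1.0))
    print(soft_argmax_embedding(scores, embeddings, alpha=50.0))
    print(soft_argmax_embedding(gumbel_perturb(scores, rng), embeddings, alpha=50.0))
```

In a scheduled sampling setup, the averaged embedding would be fed to the next decoder step in place of the embedding of the model's hard prediction, which is what allows gradients to reach earlier decisions. A real model would compute this with an automatic differentiation library rather than plain Python lists.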
