Zhouhan Lin, Minwei Feng, Cicero Nogueira dos Santos, Mo Yu, Bing Xiang, Bowen Zhou, Yoshua Bengio
Abstract
This paper proposes a new model for extracting an interpretable sentence
embedding by introducing self-attention. Instead of using a vector, we use a
2-D matrix to represent the embedding, with each row of the matrix attending on
a different part of the sentence. We also propose a self-attention mechanism
and a special regularization term for the model. As a side effect, the
embedding comes with an easy way of visualizing what specific parts of the
sentence are encoded into the embedding. We evaluate our model on 3 different
tasks: author profiling, sentiment classification, and textual entailment.
Results show that our model yields a significant performance gain compared to
other sentence embedding methods in all of the 3 tasks.