Words or Characters? Fine-grained Gating for Reading Comprehension
Zhilin Yang, Bhuwan Dhingra, Ye Yuan, Junjie Hu, William W. Cohen, Ruslan Salakhutdinov
TL;DR
This paper addresses how to combine word-level and character-level token representations for reading comprehension. It introduces a fine-grained gating mechanism conditioned on token properties and extends the same idea to modeling document-query interactions within a gated attention framework. The approach improves performance across datasets, achieving state-of-the-art results on the Children's Book Test and strong performance on SQuAD and Who Did What, and it also improves Twitter tag prediction. Visualizations of the learned gates show intuitive behavior: rare or morphologically rich tokens draw more on character-level information, while frequent tokens rely more on word-level representations, which makes the gating interpretable.
Abstract
Previous work combines word-level and character-level representations using concatenation or scalar weighting, which is suboptimal for high-level tasks like reading comprehension. We present a fine-grained gating mechanism to dynamically combine word-level and character-level representations based on properties of the words. We also extend the idea of fine-grained gating to modeling the interaction between questions and paragraphs for reading comprehension. Experiments show that our approach improves performance on reading comprehension tasks, achieving new state-of-the-art results on the Children's Book Test dataset. To demonstrate the generality of our gating mechanism, we also show improved results on a social media tag prediction task.
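To make the contrast with concatenation and scalar weighting concrete, the following is a minimal sketch of a per-dimension gate, written under stated assumptions and not taken from the paper's implementation. It assumes the gate is a sigmoid of a linear function of a token feature vector, and that the gate value weights the character-level vector while its complement weights the word-level vector. The function name `fine_grained_gate` and the parameters `features`, `gate_weights`, and `gate_bias` are hypothetical names introduced for illustration.

```python
import math
from typing import List, Sequence


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def fine_grained_gate(
    word_vec: Sequence[float],
    char_vec: Sequence[float],
    features: Sequence[float],
    gate_weights: Sequence[Sequence[float]],
    gate_bias: Sequence[float],
) -> List[float]:
    """Mix word- and character-level vectors with one gate value per dimension.

    Assumption (illustrative only): gate g = sigmoid(W @ features + b), and the
    output is g * char + (1 - g) * word, computed elementwise.
    """
    dim = len(word_vec)
    if len(char_vec) != dim:
        raise ValueError("word_vec and char_vec must have the same length")
    if len(gate_weights) != dim or len(gate_bias) != dim:
        raise ValueError("gate parameters must have one row/entry per dimension")

    mixed = []
    for i in range(dim):
        # Gate pre-activation for dimension i depends on the token's features.
        pre = gate_bias[i] + sum(w * f for w, f in zip(gate_weights[i], features))
        g = sigmoid(pre)
        # g near 1 favors the character-level value, g near 0 the word-level one.
        mixed.append(g * char_vec[i] + (1.0 - g) * word_vec[i])
    return mixed


# Usage: a 2-dimensional token with a 2-dimensional (hypothetical) feature vector.
token_repr = fine_grained_gate(
    word_vec=[1.0, 3.0],
    char_vec=[3.0, 5.0],
    features=[0.2, 0.7],
    gate_weights=[[0.0, 0.0], [0.0, 0.0]],
    gate_bias=[0.0, 0.0],
)
```

Unlike a single scalar weight, each dimension here receives its own gate value, and because the gate depends on token features, different tokens can mix the two representations differently. Concatenation, by contrast, applies no token-dependent weighting at all.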
