The Death of Feature Engineering? BERT with Linguistic Features on SQuAD 2.0
Jiawei Li, Yue Zhang
TL;DR
This work investigates augmenting a BERT-based QA model with spaCy-derived linguistic features (NER, POS, DEP, STOP) to improve performance on SQuAD 2.0, which includes unanswerable questions. The proposed architecture fuses BERT representations with a linguistic feature projection to enhance start/end span predictions. On development data, the base model with linguistic features improves EM by 2.17 and F1 by 2.14 over BERT(base), and the best single model reaches a test EM of 76.55 and F1 of 79.97. Gains are smaller for BERT(large) because of its already strong performance. Error analysis shows that the approach helps with complex linguistic contexts and with locating correct spans, but determining whether an answer exists remains challenging, which points to future work on no-answer handling and training objectives. Overall, feature engineering remains beneficial when computational cost limits model scale.
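The TL;DR does not spell out how the feature projection is fused with BERT, so the pure-Python sketch below assumes one plausible design: each token's NER, POS, and DEP tags are one-hot encoded, a binary STOP flag is appended, the result is projected to a small dense vector, concatenated with the token's BERT hidden state, and scored by separate start and end linear layers. The truncated tag inventories, the toy dimensions, the use of position 0 as a [CLS]-style null score for "No Answer", and all function names are hypothetical illustrations, not the authors' implementation.

```python
import random
from typing import List, Optional, Tuple

# Hypothetical, heavily truncated tag inventories. Index 0 is the catch-all
# bucket for any label not listed; spaCy's real label sets are much larger.
NER_TAGS = ["O", "PERSON", "ORG", "DATE"]
POS_TAGS = ["X", "NOUN", "PROPN", "VERB", "ADP", "DET"]
DEP_TAGS = ["dep", "nsubj", "dobj", "prep", "pobj", "det", "ROOT"]


def one_hot(tag: str, inventory: List[str]) -> List[float]:
    """Encode one tag as a one-hot vector, falling back to index 0."""
    vec = [0.0] * len(inventory)
    vec[inventory.index(tag) if tag in inventory else 0] = 1.0
    return vec


def token_features(ner: str, pos: str, dep: str, is_stop: bool) -> List[float]:
    """Concatenate NER, POS, DEP one-hots and a binary STOP flag."""
    return (one_hot(ner, NER_TAGS) + one_hot(pos, POS_TAGS)
            + one_hot(dep, DEP_TAGS) + [1.0 if is_stop else 0.0])


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def span_logits(bert_states, feats, proj_w, start_w, end_w):
    """Fuse each BERT vector with its projected features, then score it.

    Assumed fusion: concatenate [h ; W_f f] and apply one linear scorer
    each for start and end positions (biases omitted for brevity).
    """
    starts, ends = [], []
    for h, f in zip(bert_states, feats):
        projected = [dot(row, f) for row in proj_w]  # dense feature projection
        fused = h + projected  # list concatenation, not element-wise addition
        starts.append(dot(start_w, fused))
        ends.append(dot(end_w, fused))
    return starts, ends


def decode(starts, ends, max_len: int = 15) -> Optional[Tuple[int, int]]:
    """Return the best (start, end) span, or None for "No Answer".

    Position 0 plays the role of [CLS]; predicting no answer whenever the
    null score is at least the best span score is an assumed decision rule.
    """
    null_score = starts[0] + ends[0]
    best, best_score = None, float("-inf")
    for i in range(1, len(starts)):
        for j in range(i, min(i + max_len, len(ends))):
            if starts[i] + ends[j] > best_score:
                best, best_score = (i, j), starts[i] + ends[j]
    return None if best_score <= null_score else best


if __name__ == "__main__":
    random.seed(0)
    hidden, proj = 8, 4  # toy sizes; BERT(base) hidden states are 768-d
    feat_dim = len(NER_TAGS) + len(POS_TAGS) + len(DEP_TAGS) + 1
    tokens = [("O", "X", "dep", False), ("PERSON", "PROPN", "nsubj", False),
              ("O", "VERB", "ROOT", False), ("O", "DET", "det", True)]
    states = [[random.gauss(0, 1) for _ in range(hidden)] for _ in tokens]
    feats = [token_features(*t) for t in tokens]
    proj_w = [[random.gauss(0, 0.1) for _ in range(feat_dim)] for _ in range(proj)]
    start_w = [random.gauss(0, 0.1) for _ in range(hidden + proj)]
    end_w = [random.gauss(0, 0.1) for _ in range(hidden + proj)]
    print(decode(*span_logits(states, feats, proj_w, start_w, end_w)))
```

Concatenating rather than adding the projected features keeps the BERT representation intact and lets the scorers learn how much weight the sparse linguistic signal deserves; a trained model would learn `proj_w`, `start_w`, and `end_w` jointly with BERT instead of sampling them at random.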
Abstract
Machine reading comprehension is an essential natural language processing task that takes a context passage and a query as input and predicts the answer to the query. In this project, we developed an end-to-end question answering model that incorporates BERT and additional linguistic features. We conclude that incorporating these features improves the BERT base model: on the development set, EM and F1 improve by 2.17 and 2.14, respectively, over BERT(base). Our best single model reaches an EM score of 76.55 and an F1 score of 79.97 on the hidden test set. Our error analysis also shows that the linguistic architecture helps the model understand the context better: it can locate answers for which the BERT-only model wrongly predicted "No Answer".
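EM and F1 on SQuAD 2.0 are computed per question and averaged over the dataset. The sketch below is modeled on the logic of the official SQuAD 2.0 evaluation script as we understand it: answers are lowercased and stripped of punctuation and the articles "a", "an", and "the"; F1 is token-overlap F1; an empty string stands for "No Answer" and earns credit only when gold and prediction are both empty; and each prediction is scored against its best-matching gold answer. The function names are illustrative, and this is not a substitute for the official script when reporting numbers.

```python
import collections
import re
import string
from typing import List, Tuple

PUNCT = set(string.punctuation)


def normalize_answer(s: str) -> str:
    """Lowercase, strip punctuation and articles, collapse whitespace."""
    s = "".join(ch for ch in s.lower() if ch not in PUNCT)
    s = re.sub(r"\b(a|an|the)\b", " ", s)
    return " ".join(s.split())


def exact_match(gold: str, pred: str) -> int:
    return int(normalize_answer(gold) == normalize_answer(pred))


def f1(gold: str, pred: str) -> float:
    gold_toks = normalize_answer(gold).split()
    pred_toks = normalize_answer(pred).split()
    # An empty string encodes "No Answer": full credit only if both are empty.
    if not gold_toks or not pred_toks:
        return float(gold_toks == pred_toks)
    common = collections.Counter(gold_toks) & collections.Counter(pred_toks)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_toks)
    recall = overlap / len(gold_toks)
    return 2 * precision * recall / (precision + recall)


def score(golds: List[List[str]], preds: List[str]) -> Tuple[float, float]:
    """Average the best EM/F1 over each question's gold answers (0-100 scale).

    Unanswerable questions are represented by the gold list [""].
    """
    em = sum(max(exact_match(g, p) for g in gs) for gs, p in zip(golds, preds))
    f = sum(max(f1(g, p) for g in gs) for gs, p in zip(golds, preds))
    n = len(preds)
    return 100 * em / n, 100 * f / n
```

Because a wrong "No Answer" scores zero on both metrics, the error pattern described above (the BERT-only model predicting "No Answer" where a span exists) costs EM and F1 equally, which is consistent with the two metrics moving by similar amounts.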
