Improving QA Efficiency with DistilBERT: Fine-Tuning and Inference on mobile Intel CPUs

Ngeyen Yinkfu

TL;DR

This work tackles real-time QA deployment on resource-constrained CPUs by fine-tuning DistilBERT on SQuAD v1.1. It combines exploratory data analysis, WordNet-based paraphrasing augmentation, and tuned hyperparameters to achieve a validation F1 of $0.6536$ with an average CPU inference time of $0.1208$ s per question on a 13th Gen Intel i7-1355U, outperforming a rule-based baseline. Although it does not match state-of-the-art full-size BERT variants, the approach demonstrates a favorable accuracy-efficiency trade-off suitable for on-device QA. The study highlights practical lessons for CPU-centric transformer deployment and outlines future work on quantization, pruning, and larger datasets to improve accuracy while maintaining real-time speed.

Abstract

This study presents an efficient transformer-based question-answering (QA) model optimized for deployment on a 13th Gen Intel i7-1355U CPU, using the Stanford Question Answering Dataset (SQuAD) v1.1. Leveraging exploratory data analysis, data augmentation, and fine-tuning of a DistilBERT architecture, the model achieves a validation F1 score of 0.6536 with an average inference time of 0.1208 seconds per question. Compared to a rule-based baseline (F1: 0.3124) and full BERT-based models, our approach offers a favorable trade-off between accuracy and computational efficiency. This makes it well-suited for real-time applications on resource-constrained systems. The study includes systematic evaluation of data augmentation strategies and hyperparameter configurations, providing practical insights into optimizing transformer models for CPU-based inference.
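The two headline numbers above are a token-level F1 score and an average per-question latency. The following is a minimal sketch of how such quantities are commonly computed, assuming the standard SQuAD v1.1 answer normalization (lowercasing, stripping punctuation and English articles). The helper names `normalize_answer`, `token_f1`, and `average_latency` are hypothetical, and the timing protocol (one warm-up call, wall-clock averaging) is an assumption rather than the procedure used in the paper.

```python
import re
import string
import time
from collections import Counter


def normalize_answer(text):
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def token_f1(prediction, reference):
    """Token-overlap F1 between one predicted and one reference answer."""
    pred_tokens = normalize_answer(prediction).split()
    ref_tokens = normalize_answer(reference).split()
    if not pred_tokens or not ref_tokens:
        # Both empty counts as a match; exactly one empty counts as a miss.
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def average_latency(answer_fn, examples, warmup=1):
    """Mean wall-clock seconds per question for a callable QA function.

    answer_fn is any callable taking (question, context); the warm-up calls
    are excluded from the timing (an assumed protocol, not the paper's).
    """
    if not examples:
        raise ValueError("examples must not be empty")
    for question, context in examples[:warmup]:
        answer_fn(question, context)
    start = time.perf_counter()
    for question, context in examples:
        answer_fn(question, context)
    return (time.perf_counter() - start) / len(examples)
```

A dataset-level F1 would typically take the maximum `token_f1` over all reference answers for each question and then average across questions. Any model wrapper that accepts a question and a context can be passed as `answer_fn`.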

Paper Structure

This paper contains 22 sections, 3 figures, and 2 tables.

Figures (3)

  • Figure 1: Histogram of question lengths (left) with a mode around 8–10 words, and context lengths (right) with a mode around 100–120 words.
  • Figure 2: Histogram of answer lengths (left) showing most answers are 1–5 words, and answer start positions (right) indicating a near-uniform distribution.
  • Figure 3: Histogram of question-context word overlap (left) peaking at 0.5–0.6, and scatter plot of context vs. answer length (right), showing no correlation (Pearson correlation coefficient $\approx$ 0.03).
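The statistics summarized in Figure 3 can be illustrated with a short sketch. The overlap definition used here (the fraction of distinct lowercased question words that also appear in the context) is an assumption, since the exact formula is not given in this excerpt. The function names `word_overlap` and `pearson` are hypothetical.

```python
import math


def word_overlap(question, context):
    """Fraction of distinct question words that also occur in the context.

    Uses whitespace tokenization on lowercased text (an assumed definition).
    """
    question_words = set(question.lower().split())
    context_words = set(context.lower().split())
    if not question_words:
        return 0.0
    return len(question_words & context_words) / len(question_words)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)
```

Applied to per-example context lengths and answer lengths, a value of `pearson` near zero, such as the 0.03 reported in the caption, would indicate no meaningful linear relationship between the two.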