Text Difficulty Study: Do machines behave the same as humans regarding text difficulty?
Bowen Chen, Xiao Ding, Li Du, Qin Bing, Ting Liu
TL;DR
This study asks whether NLP models learn in a human-like way with respect to text difficulty, and introduces the Human Learning Matching Index (HLM Index) to quantify that alignment across models, tasks, and difficulty criteria. It compares LSTM and BERT on nine tasks under four text difficulty criteria, finding that LSTM shows more human-like learning behavior than BERT and that UID-SuperLinear gives the best evaluation of text difficulty. Sensitivity varies across tasks: some, such as SQuAD and WT2, are strongly affected by text difficulty, while others, such as CoNLL2003, are less so. Models trained on easy data perform best on easy and medium data, and training from easy to hard speeds up convergence. The findings suggest that difficulty-aware training strategies and the choice of difficulty criterion can meaningfully influence training efficiency and transfer, with practical implications for curriculum design in NLP.
Abstract
Given a task, humans learn from easy to hard, whereas models learn in random order. Undeniably, difficulty-insensitive learning has led to great success in NLP, but little attention has been paid to the effect of text difficulty. In this research, we propose the Human Learning Matching Index (HLM Index) to investigate the effect of text difficulty. Experimental results show: (1) LSTM has more human-like learning behavior than BERT. (2) UID-SuperLinear gives the best evaluation of text difficulty among four text difficulty criteria. (3) Among nine tasks, performance on some tasks is related to text difficulty, whereas performance on others is not. (4) A model trained on easy data performs best on easy and medium data, whereas a model trained on hard data performs well only on hard data. (5) Training the model from easy to hard leads to fast convergence.
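To make the easy-to-hard idea concrete, the following is a minimal sketch of ordering training examples by a UID-SuperLinear-style difficulty score. It is not the authors' implementation. It assumes that per-token probabilities from some language model are already available, and that difficulty is the mean per-token surprisal raised to a power k > 1, which is one common formulation in the uniform information density literature; the exact definition used in this paper may differ. The function names, the default k = 1.25, and the three-level split are all hypothetical choices made for illustration.

```python
import math


def superlinear_difficulty(token_probs, k=1.25):
    """Mean per-token surprisal raised to the power k.

    Assumption: this mirrors a UID super-linear score; k > 1 penalizes
    tokens with high surprisal more than proportionally.
    """
    if not token_probs:
        raise ValueError("need at least one token probability")
    if any(p <= 0.0 or p > 1.0 for p in token_probs):
        raise ValueError("probabilities must lie in (0, 1]")
    # Surprisal of a token is -log p, written here as log(1 / p).
    surprisals = [math.log(1.0 / p) for p in token_probs]
    return sum(s ** k for s in surprisals) / len(surprisals)


def easy_to_hard(examples, k=1.25):
    """Sort (text, token_probs) pairs from lowest to highest difficulty."""
    return sorted(examples, key=lambda ex: superlinear_difficulty(ex[1], k))


def split_levels(ordered, n_levels=3):
    """Split an already ordered list into contiguous difficulty buckets."""
    size = max(1, math.ceil(len(ordered) / n_levels))
    return [ordered[i:i + size] for i in range(0, len(ordered), size)]


# Hypothetical toy data: each text is paired with made-up token probabilities.
examples = [
    ("hard sentence", [0.10, 0.05, 0.20]),
    ("easy sentence", [0.90, 0.80, 0.90]),
    ("medium sentence", [0.50, 0.40, 0.50]),
]
ordered = easy_to_hard(examples)
levels = split_levels(ordered, n_levels=3)
```

Under these assumptions, feeding the buckets in `levels` to a training loop in order would give an easy-to-hard schedule, while shuffling `examples` would correspond to the difficulty-insensitive baseline.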
