Learning Action Conditions from Instructional Manuals for Instruction Understanding
Te-Lin Wu, Caiqi Zhang, Qingyuan Hu, Alex Spangher, Nanyun Peng
TL;DR
The paper defines and tackles action-condition inference in real-world instructional manuals, highlighting the need to extract preconditions and postconditions to support autonomous and assistive task execution. It builds a densely annotated evaluation dataset from WikiHow and Instructables, and proposes a weakly supervised learning approach that combines linguistic heuristics (entity tracing, keywords, temporal reasoning) with two transformer-based model variants (non-contextualized and contextualized). The study shows that leveraging full instruction context yields substantial improvements over context-free baselines, and that the designed heuristics plus self-training provide additional gains in low-resource settings, though human performance remains higher by a sizable margin. These findings advance procedural text understanding and offer practical resources and directions for end-to-end action-condition extraction and integration of external knowledge to improve real-world instruction comprehension.
Abstract
The ability to infer pre- and postconditions of an action is vital for comprehending complex instructions, and is essential for applications such as autonomous instruction-guided agents and assistive AI that supports humans in performing physical tasks. In this work, we propose a task dubbed action condition inference and collect a high-quality, human-annotated dataset of preconditions and postconditions of actions in instructional manuals. We propose a weakly supervised approach that automatically constructs large-scale training instances from online instructional manuals, and curate a densely human-annotated and validated dataset to study how well current NLP models can infer action-condition dependencies in instruction texts. We design two types of models that differ in whether contextualized and global information is leveraged, as well as various combinations of heuristics to construct the weak supervision. Our experimental results show a >20% F1-score improvement when the entire instruction context is considered and a >6% F1-score benefit from the proposed heuristics.
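To make the idea of keyword-based weak supervision concrete, the following is a minimal sketch of how a cue word might turn a raw instruction sentence into a weakly labeled (action, condition) pair. It is an illustration only, not the authors' implementation: the cue list `CUE_LABELS`, the function name `weak_label`, and the clause-splitting rule are all assumptions made for this example, and the heuristics described in the paper (entity tracing, keywords, temporal reasoning) are likely richer than this.

```python
import re

# Hypothetical cue words mapped to the role of the clause they introduce.
# This list is an assumption for illustration, not taken from the paper.
CUE_LABELS = {
    "once": "precondition",
    "after": "precondition",
    "until": "postcondition",
    "so that": "postcondition",
}

# Longer cues first so that multi-word cues are matched as a whole.
_CUE_RE = re.compile(
    r"\b(" + "|".join(sorted(CUE_LABELS, key=len, reverse=True)) + r")\b",
    re.IGNORECASE,
)


def weak_label(sentence):
    """Return (action_text, condition_text, label), or None if no cue applies."""
    match = _CUE_RE.search(sentence)
    if match is None:
        return None
    cue = match.group(1).lower()
    before = sentence[:match.start()].strip(" ,.")
    after = sentence[match.end():].strip(" ,.")
    if before:
        # Cue in the middle: "<action> until <condition>"
        action, condition = before, after
    else:
        # Cue at the start: "Once <condition>, <action>"
        condition, _, action = after.partition(",")
        action, condition = action.strip(), condition.strip()
    if not action or not condition:
        return None
    return action, condition, CUE_LABELS[cue]


if __name__ == "__main__":
    for text in (
        "Stir the sauce until it thickens.",
        "Once the oven is hot, put the tray in.",
        "Mix the flour and sugar.",
    ):
        print(text, "->", weak_label(text))
```

Pairs produced this way would be noisy, which is why the abstract frames them as weak supervision for training rather than as ground truth; the human-annotated dataset remains the evaluation reference.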
