COMET-ATOMIC 2020: On Symbolic and Neural Commonsense Knowledge Graphs
Jena D. Hwang, Chandra Bhagavatula, Ronan Le Bras, Jeff Da, Keisuke Sakaguchi, Antoine Bosselut, Yejin Choi
TL;DR
This work introduces ATOMIC 2020, a large commonsense knowledge graph (CSKG) with 1.33 million tuples across 23 relations, designed to capture knowledge that is not readily available in pretrained language models. It formalizes a transfer-learning framework (COMET) that adapts pretrained LMs to generate high-quality commonsense tuples on demand, and it compares the new graph head-to-head against ConceptNet, ATOMIC, and TransOMCS. Empirical results show that ATOMIC 2020 offers higher accuracy and coverage than the other CSKGs, and that COMET models trained on ATOMIC 2020 outperform GPT-3 used in a few-shot setting while having far fewer parameters. The findings argue for designing CSKGs around non-obvious, defeasible knowledge, and for using them both as static resources and as a means of adapting LMs so that they generalize to unseen entities and events.
Abstract
Recent years have brought about a renewed interest in commonsense representation and reasoning in the field of natural language understanding. The development of new commonsense knowledge graphs (CSKG) has been central to these advances as their diverse facts can be used and referenced by machine learning models for tackling new and challenging tasks. At the same time, there remain questions about the quality and coverage of these resources due to the massive scale required to comprehensively encompass general commonsense knowledge. In this work, we posit that manually constructed CSKGs will never achieve the coverage necessary to be applicable in all situations encountered by NLP agents. Therefore, we propose a new evaluation framework for testing the utility of KGs based on how effectively implicit knowledge representations can be learned from them. With this new goal, we propose ATOMIC 2020, a new CSKG of general-purpose commonsense knowledge containing knowledge that is not readily available in pretrained language models. We evaluate its properties in comparison with other leading CSKGs, performing the first large-scale pairwise study of commonsense knowledge resources. Next, we show that ATOMIC 2020 is better suited for training knowledge models that can generate accurate, representative knowledge for new, unseen entities and events. Finally, through human evaluation, we show that the few-shot performance of GPT-3 (175B parameters), while impressive, remains ~12 absolute points lower than a BART-based knowledge model trained on ATOMIC 2020 despite using over 430x fewer parameters.
