CANDLE: Iterative Conceptualization and Instantiation Distillation from Large Language Models for Commonsense Reasoning

Weiqi Wang; Tianqing Fang; Chunyang Li; Haochen Shi; Wenxuan Ding; Baixuan Xu; Zhaowei Wang; Jiaxin Bai; Xin Liu; Jiayang Cheng; Chunkit Chan; Yangqiu Song

TL;DR

CANDLE addresses the cost and limited scalability of collecting conceptualization and instantiation knowledge for generalizable commonsense reasoning by distilling both types of knowledge from LLMs over a commonsense knowledge base (CSKB), using critic filtering to maintain quality. Its three-stage pipeline (contextualized conceptualization with ChatGPT, contextualized instantiation with LLAMA2, and critic-filtered iteration) expands ATOMIC to about $6.18$ million knowledge triples and improves performance on CSKB conceptualization, generative inference, and zero-shot QA. Intrinsic evaluations show high plausibility and diversity, and extrinsic evaluations show gains on downstream tasks for student models distilled from CANDLE. The work shows that iterative, critic-filtered distillation can substantially improve the coverage and utility of commonsense knowledge bases. Code and data are publicly available.

Abstract

The sequential process of conceptualization and instantiation is essential to generalizable commonsense reasoning as it allows the application of existing knowledge to unfamiliar scenarios. However, existing works tend to undervalue the step of instantiation and heavily rely on pre-built concept taxonomies and human annotations to collect both types of knowledge, resulting in a lack of instantiated knowledge to complete reasoning, high cost, and limited scalability. To tackle these challenges, we introduce CANDLE, a distillation framework that iteratively performs contextualized conceptualization and instantiation over commonsense knowledge bases by instructing large language models to generate both types of knowledge with critic filtering. By applying CANDLE to ATOMIC, we construct a comprehensive knowledge base comprising six million conceptualizations and instantiated commonsense knowledge triples. Both types of knowledge are firmly rooted in the original ATOMIC dataset, and intrinsic evaluations demonstrate their exceptional quality and diversity. Empirical results indicate that distilling CANDLE on student models provides benefits across four downstream tasks. Our code, data, and models are publicly available at https://github.com/HKUST-KnowComp/CANDLE.
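The iterative loop described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the names `Triple`, `candle_loop`, `conceptualize`, `instantiate`, and `critic` are hypothetical, the generators and the critic are passed in as plain callables rather than real LLM or discriminator calls, and the default threshold of 0.5 and the two rounds are assumed values.

```python
from typing import Callable, List, NamedTuple, Set, Tuple


class Triple(NamedTuple):
    head: str      # event, e.g. "PersonX arrives at the bar"
    relation: str  # commonsense relation, e.g. "xWant"
    tail: str      # inference, e.g. "to relax"


# Hypothetical interfaces (assumptions, not the paper's API):
# a generator maps a triple to candidate head events,
# a critic maps a triple to a plausibility score in [0, 1].
Generator = Callable[[Triple], List[str]]
Critic = Callable[[Triple], float]


def candle_loop(
    seed: List[Triple],
    conceptualize: Generator,
    instantiate: Generator,
    critic: Critic,
    threshold: float = 0.5,  # assumed value
    rounds: int = 2,         # assumed value
) -> Tuple[Set[Triple], Set[Triple]]:
    """Return (abstract triples, concrete triples including the seed)."""
    known: Set[Triple] = set(seed)
    abstract_kb: Set[Triple] = set()
    frontier = list(seed)
    for _ in range(rounds):
        new_triples: List[Triple] = []
        for triple in frontier:
            # Stage 1: abstract an instance in the head event into a concept.
            for abstract_head in conceptualize(triple):
                abstract = triple._replace(head=abstract_head)
                if critic(abstract) < threshold:
                    continue  # Stage 3: drop implausible conceptualizations
                abstract_kb.add(abstract)
                # Stage 2: ground the concept back into new concrete events.
                for concrete_head in instantiate(abstract):
                    candidate = triple._replace(head=concrete_head)
                    if candidate in known or critic(candidate) < threshold:
                        continue  # Stage 3: drop duplicates and implausible triples
                    known.add(candidate)
                    new_triples.append(candidate)
        # Accepted instantiations become the input of the next round.
        frontier = new_triples
        if not frontier:
            break
    return abstract_kb, known


# Toy stand-ins for the LLM generators and the critic (purely illustrative).
def toy_conceptualize(t: Triple) -> List[str]:
    if "the bar" in t.head:
        return [t.head.replace("the bar", "[social venue]")]
    return []


def toy_instantiate(t: Triple) -> List[str]:
    return [t.head.replace("[social venue]", v) for v in ("the pub", "the library")]


def toy_critic(t: Triple) -> float:
    return 0.1 if "library" in t.head else 0.9


if __name__ == "__main__":
    seed = [Triple("PersonX arrives at the bar", "xWant", "to relax")]
    abstract_kb, expanded = candle_loop(seed, toy_conceptualize, toy_instantiate, toy_critic)
    for triple in sorted(expanded):
        print(triple)
```

With the toy callables, the seed event is abstracted once, the "pub" instantiation passes the critic, and the "library" instantiation is filtered out. In a real setting the callables would wrap LLM prompts and trained critic models, and the threshold would need tuning.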

Paper Structure (46 sections, 4 figures, 14 tables)

This paper contains 46 sections, 4 figures, 14 tables.

Introduction
Related Works
Conceptualization and Instantiation
Commonsense Knowledge Distillation
Definitions and Datasets
CANDLE
Contextualized Conceptualization
Contextualized Instantiation
Iterating with Critic Filtering
Evaluations and Analysis
Distillation Evaluations
Statistics and Quality
Conceptualization Diversity
Downstream Applications
CSKB Conceptualization
...and 31 more sections

Figures (4)

Figure 1: Examples showing several chains of conceptualization and instantiation over the event "PersonX enjoys exercising in the gym". New inferential commonsense knowledge can be induced when placing the instantiation back into the original context.
Figure 2: Overview of our CANDLE framework. The figure shows a running example, "PersonX arrives at the bar, as a result, PersonX wants to relax", where "bar" is first conceptualized and then instantiated by LLMs. The instantiations can be integrated back into the original CSKB and become input for the framework again.
Figure 3: Hypernym distribution of the top 10,000 popular conceptualizations distilled from CANDLE.
Figure 4: Ablation results examining the impact of different threshold values in CANDLE's critic filtering.
