Innovative Thinking, Infinite Humor: Humor Research of Large Language Models through Structured Thought Leaps
Han Wang, Yilin Zhao, Dian Li, Xiaohan Wang, Gang Liu, Xuguang Lan, Hui Wang
TL;DR
This work tackles the challenge of humor generation by large language models, which requires multi-hop reasoning and access to broad knowledge. It introduces LoL, a two-stage framework that combines supervised fine-tuning on judgment-oriented data with direct preference optimization, augmented by automatic instruction evolution (AIE) through a three-agent system and guided explorative self-improvement tuning (GESIT). External knowledge injection and structured thought processes are used to deepen humor understanding and improve generation, with rationale extraction via GPT-4o guiding preference data. Experiments on English and Chinese humor benchmarks and the Divergent Association Task (DAT) demonstrate state-of-the-art judgment and enhanced creative-generation capabilities, suggesting LoL’s potential to boost cross-domain creative applications of LLMs.
Abstract
Humor has long been regarded as a gift exclusive to humans: it is a culturally nuanced aspect of language that is challenging both to understand and to generate. Humor generation requires a multi-hop reasoning process in which each hop is grounded in a proper rationale. Although many studies, such as those related to GPT-o1, focus on logical reasoning with reflection and correction, they still fall short in humor generation. Because the knowledge graph underlying creative thinking is sparse, multi-hop reasoning is difficult to achieve. In this paper, we therefore propose LoL, a more robust framework for the humor reasoning task. LoL injects external information to mitigate the sparsity of the knowledge graph, thereby enabling multi-hop reasoning. In the first stage of LoL, we introduce an automatic instruction-evolution method that incorporates the deeper and broader thinking processes underlying humor. Judgment-oriented instructions are designed to enhance the model's judgment capability, dynamically supplementing and updating the sparse knowledge graph. In the second stage, through reinforcement learning, the reasoning logic behind each online-generated response is extracted using GPT-4o, and external knowledge is re-introduced to aid the model in logical reasoning and in learning human preferences. Experimental results indicate that combining these two stages enhances both the model's judgment ability and its generative capacity. These findings deepen our understanding of the creative capabilities of large language models (LLMs) and offer approaches for boosting those capabilities in cross-domain innovative applications.
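
The TL;DR names direct preference optimization (DPO) as the objective of the preference-learning stage. As a rough illustration only, the standard per-pair DPO loss can be sketched as below. This is not the authors' implementation: the function names, the default `beta`, and the log-probability values in the usage note are all hypothetical, and the sketch assumes the sequence log-probabilities of the preferred and rejected responses have already been computed under both the trained policy and a frozen reference model.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # hypothetical default; the source does not state a value
) -> float:
    """Standard DPO loss for one (chosen, rejected) response pair.

    Each argument is the summed log-probability of a full response given
    the prompt, under either the policy or the frozen reference model.
    """
    # How much more the policy favors each response than the reference does.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    # The loss shrinks as the policy favors the chosen response more strongly
    # than the rejected one, relative to the reference model.
    return -log_sigmoid(beta * (chosen_margin - rejected_margin))
```

For example, `dpo_loss(-4.0, -6.0, -5.0, -5.0)` is lower than `dpo_loss(-5.0, -5.0, -5.0, -5.0)`, because in the first call the policy has shifted probability toward the preferred response relative to the reference. In the framework described here, the chosen and rejected responses would presumably be humor candidates ranked with the help of the GPT-4o-extracted rationales, but how those pairs are built is not specified in this section.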
