Ex3: Automatic Novel Writing by Extracting, Excelsior and Expanding

Lei Huang, Jiaming Guo, Guanhua He, Xishan Zhang, Rui Zhang, Shaohui Peng, Shaoli Liu, Tianshi Chen

TL;DR

Ex3 tackles long-form novel generation by learning from raw novels rather than relying solely on prompt engineering. It introduces a three-stage framework (Extracting, Excelsior, and Expanding) that first derives hierarchical structure and entity information from texts, then fine-tunes an instruction-following LLM on a structure-informed corpus, and finally expands a premise into an arbitrarily long narrative via a depth-first, tree-like generation process. In human evaluations and on automatic metrics, the approach yields higher-quality long-form novels than prior hierarchical methods, for both medium-length and long stories. The framework embodies a self-improvement loop by using summarization to train the model, which reduces reliance on hand-crafted prompts and enables controllable, genre-aligned writing. Multi-language and interactive generation are left as future work.

Abstract

Generating long-form texts such as novels using artificial intelligence has always been a challenge. A common approach is to use large language models (LLMs) to construct a hierarchical framework that first plans and then writes. Although the generated novels reach a sufficient length, their plots lack logical coherence and appeal, and their characters and events are thinly depicted, which ultimately compromises the overall narrative quality. In this paper, we propose a method named Extracting, Excelsior and Expanding (Ex3). Ex3 initially extracts structure information from raw novel data. By combining this structure information with the novel data, an instruction-following dataset is meticulously crafted. This dataset is then used to fine-tune the LLM, aiming for excelsior generation performance. In the final stage, a tree-like expansion method is deployed to facilitate the generation of arbitrarily long novels. Evaluation against previous methods showcases Ex3's ability to produce higher-quality long-form novels.

Paper Structure (66 sections, 11 figures, 8 tables)

Figures (11)

  • Figure 1: The Ex3 framework for novel writing. Extract for structure information extraction; Excelsior for corpus construction and LLM fine-tuning; Expand for automatic novel generation.
  • Figure 2: Group the Text by Similarity. We calculate the semantic similarity scores and segment a window. Then we choose the paragraph corresponding to the minimum score in the window as the partition.
  • Figure 3: A brief example of Chapter Summarizing. A chapter consisting of 38 paragraphs is divided into 8 groups, and each group is summarized by LLMs individually. These group summaries are integrated to produce the comprehensive summary of the chapter.
  • Figure 4: An illustration of Structure-Info Extraction. After obtaining chapter summaries for the entire novel using the Chapter Summarizing method, we group all the chapter summaries by semantic similarity and then generate a summary for each group. We repeat this process until we eventually condense it into a single summary.
  • Figure 5: A simple example of the Entity Information Extraction method. For the current plot, the entities captured by LLMs, along with recent visits, are retrieved from the database to obtain the historical information of relevant entities prior to the current plot events. The current plot and the historical information are then both input to LLMs to obtain the latest entity information, which is used to add new entries or update the related existing entries in the database.
  • ...and 6 more figures
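
The grouping step described in the Figure 2 caption can be illustrated with a minimal sketch. It is not the authors' implementation: the bag-of-words `embed` function is a toy stand-in for a real semantic encoder, the use of non-overlapping windows is an assumption, and all function names and the `window` parameter are hypothetical. It scores each pair of adjacent paragraphs, splits the scores into windows, and cuts at the lowest score in each window.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words vector (assumption: stands in for a real sentence encoder)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def adjacent_similarities(paragraphs):
    """Score i measures how similar paragraph i is to paragraph i + 1."""
    vecs = [embed(p) for p in paragraphs]
    return [cosine(vecs[i], vecs[i + 1]) for i in range(len(vecs) - 1)]


def partition_points(scores, window):
    """In each non-overlapping window of scores, cut at the lowest score.

    A returned cut k means a new group starts at paragraph k.
    """
    cuts = []
    for start in range(0, len(scores), window):
        chunk = scores[start:start + window]
        offset = min(range(len(chunk)), key=chunk.__getitem__)
        cuts.append(start + offset + 1)
    return cuts


def group_by_similarity(paragraphs, window=3):
    """Split paragraphs into consecutive groups at the chosen partition points."""
    scores = adjacent_similarities(paragraphs)
    groups, begin = [], 0
    for cut in partition_points(scores, window):
        groups.append(paragraphs[begin:cut])
        begin = cut
    if begin < len(paragraphs):
        groups.append(paragraphs[begin:])
    return groups
```

Cutting at the minimum within a window, rather than at every score below a fixed threshold, roughly bounds group size by the window length while still placing each boundary where adjacent paragraphs are least alike. In a pipeline like the one in Figures 3 and 4, each resulting group would then be passed to an LLM for summarization.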