ChatCite: LLM Agent with Human Workflow Guidance for Comparative Literature Summary

Yutong Li, Lu Chen, Aiwei Liu, Kai Yu, Lijie Wen

TL;DR

This work introduces ChatCite, an LLM agent with human workflow guidance for comparative literature summary. The agent first extracts key elements from relevant literature and then generates summaries using a Reflective Incremental Mechanism. An LLM-based automatic evaluation metric, G-Score, is also devised with reference to the human evaluation criteria.

Abstract

The literature review is an indispensable step in the research process. It helps researchers comprehend the research problem and understand the current state of research while conducting a comparative analysis of prior work. However, literature summary is challenging and time-consuming. Previous LLM-based studies on literature review mainly focused on the complete process, including literature retrieval, screening, and summarization. For the summarization step, however, a simple CoT method often lacks the ability to provide an extensive comparative summary. In this work, we first focus on the independent literature summarization step and introduce ChatCite, an LLM agent with human workflow guidance for comparative literature summary. This agent, by mimicking the human workflow, first extracts key elements from relevant literature and then generates summaries using a Reflective Incremental Mechanism. To better evaluate the quality of the generated summaries, we devised an LLM-based automatic evaluation metric, G-Score, with reference to the human evaluation criteria. The ChatCite agent outperformed other models in various dimensions in the experiments. The literature summaries generated by ChatCite can also be directly used for drafting literature reviews.

Paper Structure

This paper contains 18 sections, 5 figures, 4 tables, 1 algorithm.

Figures (5)

  • Figure 1: Literature Summary Task Description
  • Figure 2: ChatCite consists of two modules, the Key Element Extractor and the Reflective Incremental Generator. Mimicking the human workflow, the agent first uses the Key Element Extractor to process the proposed work description and each reference paper in the Reference Papers Set. It then generates literature summaries iteratively with the Reflective Incremental Generator, using each paper in the Reference Papers Set, the key elements of the proposed work, and the previously generated summary. This process repeats until a complete related work summary is generated, and the optimal one is selected as the final result.
  • Figure 3: Ablation Study on the Reflective Mechanism. The upper and lower whiskers show the overall range of the data, the box shows the middle 50% of the dataset, and the line inside the box marks the median. Points outside the whiskers are outliers that deviate significantly from the rest of the data. ChatCite performs more stably across all dimensions.
  • Figure 4: Human Evaluation vs. G-Score on six dimensions of generic summary quality. The G-Score results align with the distribution of human evaluations.
  • Figure 5: Human Preference: Average annotator vote distribution for better generated summaries.
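
The workflow described in the Figure 2 caption can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function name `reflective_incremental_summary`, the `keep` parameter (how many candidates survive each round), and the three callables standing in for LLM calls are all assumptions made for the example. The toy stand-ins at the bottom only manipulate strings so the loop can run without a model.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Candidate:
    summary: str
    score: float


def reflective_incremental_summary(
    proposed_work: str,
    reference_papers: List[str],
    extract: Callable[[str], str],
    generate: Callable[[str, str, str], List[str]],
    evaluate: Callable[[str], float],
    keep: int = 2,  # hypothetical: number of candidates kept per round
) -> str:
    """Build a summary one reference paper at a time, keeping the best drafts."""
    # Step 1: extract key elements from the proposed work description.
    proposed_elements = extract(proposed_work)

    # Start from a single empty draft.
    candidates = [Candidate(summary="", score=0.0)]

    for paper in reference_papers:
        # Step 1 (continued): extract key elements from each reference paper.
        reference_elements = extract(paper)

        # Step 2: extend every surviving draft with the new paper.
        expanded: List[Candidate] = []
        for cand in candidates:
            for text in generate(proposed_elements, reference_elements, cand.summary):
                # Reflection step: score each new draft.
                expanded.append(Candidate(text, evaluate(text)))

        if not expanded:
            continue  # nothing generated for this paper; keep the old drafts

        # Keep only the highest-scoring drafts for the next round.
        candidates = sorted(expanded, key=lambda c: c.score, reverse=True)[:keep]

    # Select the best draft as the final result.
    return max(candidates, key=lambda c: c.score).summary


# Toy stand-ins for the LLM calls (assumptions, for illustration only).
def toy_extract(text: str) -> str:
    return text.split(".")[0].strip()


def toy_generate(proposed: str, reference: str, previous: str) -> List[str]:
    plain = f"{previous} {reference}".strip()
    compared = f"{previous} {reference} (compared with {proposed})".strip()
    return [plain, compared]


def toy_evaluate(summary: str) -> float:
    return float(len(summary))  # placeholder score: longer is "better"


if __name__ == "__main__":
    print(
        reflective_incremental_summary(
            "Our agent. Details follow.",
            ["Paper A. Body text.", "Paper B. Body text."],
            toy_extract,
            toy_generate,
            toy_evaluate,
        )
    )
```

In a real setting, `extract`, `generate`, and `evaluate` would each wrap a prompted model call, and the scoring function would reflect whatever criteria the reflective step applies; the length-based score here is only a placeholder.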