Assisting humans in complex comparisons: automated information comparison at scale
Truman Yuen, Graham A. Watt, Yuri Lawryshyn
TL;DR
This work tackles the token-length and retrieval constraints that large language models face in large-scale information comparison. It introduces ASC$^2$End, a pre-retrieval pipeline that combines abstractive summarization, criteria embedding, and retrieval-augmented generation to perform cross-document comparisons with minimal domain-specific training. The pipeline partitions work into machine-level tasks (summarization) and human-level tasks (comparison reasoning) and selects a suitable model for each, e.g., Mistral 7B for the summarization stage (DS) and GPT-4 for the comparison stage (CA), to balance efficiency and reasoning quality. Evaluated on a 1253-document financial news corpus and a 20-page sustainability criteria document, the system demonstrates strong ROUGE performance for summarization and superior CA accuracy with GPT-4, while ablations confirm the necessity of DS, RAG, and CA. The framework achieves time- and cost-efficient scaling for automated information analysis across domains, with practical implications for rapid, evidence-based decision support in finance and other knowledge areas.
Abstract
Generative Large Language Models (LLMs) enable efficient analytics across knowledge domains, rivalling human experts in information comparisons. However, applying LLMs to information comparison at scale is difficult because models struggle to maintain information across large contexts and are bound by token limits. To address these challenges, we developed the novel Abstractive Summarization & Criteria-driven Comparison Endpoint (ASC$^2$End) system to automate information comparison at scale. Our system employs Semantic Text Similarity comparisons to generate evidence-supported analyses. We utilize proven data-handling strategies, such as abstractive summarization and retrieval-augmented generation, to overcome token limitations and retain relevant information during model inference. Prompts were designed using zero-shot strategies to contextualize information for improved model reasoning. We evaluated abstractive summarization using ROUGE scoring and assessed the quality of the generated comparisons using survey responses. Models evaluated on the ASC$^2$End system show desirable results, providing insight into the system's expected performance. ASC$^2$End is a novel system and tool that enables accurate, automated information comparison at scale across knowledge domains, overcoming limitations in context length and retrieval.
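To make the retrieval step concrete, the following is a minimal sketch of how a document summary might be matched against criteria passages by semantic similarity and then wrapped in a zero-shot prompt. It is an illustration under stated assumptions, not the ASC$^2$End implementation: the function names (`embed`, `retrieve`, `build_prompt`), the prompt wording, and the sample criteria are hypothetical, and a toy bag-of-words vector stands in for a real embedding model. The summarization and LLM calls are omitted.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a string and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def embed(text):
    """Toy bag-of-words vector (assumption: stands in for a real embedding model)."""
    return Counter(tokenize(text))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(summary, criteria_chunks, k=2):
    """Return the k criteria chunks most similar to the summary, best first."""
    query = embed(summary)
    ranked = sorted(
        criteria_chunks,
        key=lambda chunk: cosine(query, embed(chunk)),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(summary, retrieved):
    """Assemble a zero-shot comparison prompt (hypothetical wording)."""
    criteria = "\n".join(f"- {chunk}" for chunk in retrieved)
    return (
        "Compare the document summary against the criteria below "
        "and cite the criteria that support your assessment.\n\n"
        f"Criteria:\n{criteria}\n\nSummary:\n{summary}\n"
    )


if __name__ == "__main__":
    # Hypothetical criteria passages and summary, for illustration only.
    criteria_chunks = [
        "Companies should reduce greenhouse gas emissions every year.",
        "Boards should include independent directors.",
        "Water usage must be reported for all manufacturing sites.",
    ]
    summary = "The company pledged to cut greenhouse gas emissions by 2030."
    top = retrieve(summary, criteria_chunks, k=1)
    print(build_prompt(summary, top))
```

Because only the retrieved passages and a short summary enter the prompt, its length stays bounded regardless of how long the source document or the criteria are, which is the motivation for combining summarization with retrieval.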
