Leveraging Large Language Models for Automated Definition Extraction with TaxoMatic: A Case Study on Media Bias

Timo Spinde; Luyang Lin; Smi Hinterreiter; Isao Echizen

TL;DR

TaxoMatic presents an LLM-driven framework for automated definition extraction from scholarly literature, evaluated in the media bias domain. The approach uses a three-stage workflow—relevance classification, definition extraction, and evaluation—and builds a ground-truth dataset from 2,398 relevancy-rated articles and 123 definitions sourced from 113 papers. Claude-3-sonnet leads in relevance classification, while Chain-of-Thought and Role prompting yield the strongest extraction performance, revealing both the promise and limitations of current LLMs for formalizing definitions in contested domains. The work contributes a scalable methodology, a sizeable public dataset, and insights to guide future expansion to additional domains and more robust taxonomy-building efforts.

Abstract

This paper introduces TaxoMatic, a framework that leverages large language models to automate definition extraction from academic literature. Focusing on the media bias domain, the framework encompasses data collection, LLM-based relevance classification, and extraction of conceptual definitions. Evaluated on a dataset of 2,398 manually rated articles, the study demonstrates the framework's effectiveness, with Claude-3-sonnet achieving the best results in both relevance classification and definition extraction. Future directions include expanding datasets and applying TaxoMatic to additional domains.

Paper Structure

Table of Contents