Creating a Fine Grained Entity Type Taxonomy Using LLMs
Michael Gunn, Dohyun Park, Nidhish Kamath
TL;DR
The paper examines whether GPT-4 can autonomously construct a large, fine-grained entity type taxonomy by starting from broad categories and refining them through iterative prompting. It documents an initial attempt using UltraFine-type inputs, then pivots to an unrestricted, iterative approach that uses file-backed context and subtree-focused expansion, yielding over 5,000 types at depths of up to 10 levels. By subjective evaluation, the resulting taxonomy is of high quality. It has practical implications for information extraction tasks such as relation extraction and event argument extraction, and pattern-based combinations of its branches can generate new types to improve coverage. The work illustrates GPT-4's potential to automate the construction of structured knowledge representations, including domain-specific ontologies, through a scalable methodology with broad applicability in computational linguistics and AI research.
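The iterative, subtree-focused expansion summarized above can be pictured as a breadth-first loop over the taxonomy. The following is a minimal sketch under stated assumptions, not the authors' implementation. `propose_subtypes` is a hypothetical stand-in for a GPT-4 prompt that returns subtypes for a single node. Storing the tree as a nested dict and writing it to a JSON file each round is only a guess at what "file-backed context" might involve. The depth cap of 10 mirrors the maximum depth reported above.

```python
import json
from pathlib import Path
from typing import Callable, Dict, List, Optional

# Nested dict: type name -> dict of its subtypes (a leaf maps to {}).
Taxonomy = Dict[str, dict]

# Broad top-level categories listed in the abstract.
ROOTS = ["object", "time", "location", "organization", "event", "action", "subject"]


def expand(
    taxonomy: Taxonomy,
    propose_subtypes: Callable[[List[str], List[str]], List[str]],
    max_depth: int = 10,
    store: Optional[Path] = None,
) -> Taxonomy:
    """Breadth-first refinement: one subtree-focused proposal call per node."""
    frontier = [([name], children) for name, children in taxonomy.items()]
    while frontier:
        next_frontier = []
        for path, children in frontier:
            if len(path) >= max_depth:
                continue  # depth cap: nodes at max_depth are not expanded
            # Only this node's path and current children are passed to the
            # (hypothetical) model call, not the whole taxonomy.
            for sub in propose_subtypes(path, list(children)):
                children.setdefault(sub, {})
            next_frontier.extend((path + [n], c) for n, c in children.items())
        frontier = next_frontier
        if store is not None:
            # Assumed "file-backed context": persist the tree after each round.
            store.write_text(json.dumps(taxonomy, indent=2))
    return taxonomy


def toy_proposer(path: List[str], existing: List[str]) -> List[str]:
    """Hypothetical stand-in for a GPT-4 prompt; a real one would query the model."""
    table = {
        ("location",): ["city", "country"],
        ("location", "city"): ["capital city"],
    }
    return table.get(tuple(path), [])


tax = expand({root: {} for root in ROOTS}, toy_proposer)
print(json.dumps(tax["location"]))
# {"city": {"capital city": {}}, "country": {}}
```

Expanding one node at a time keeps each prompt small, because the model only sees one path and its current children. Persisting the whole tree between rounds is one way to let later prompts or a restarted run reload earlier work.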
Abstract
In this study, we investigate the potential of GPT-4 and its more advanced iteration, GPT-4 Turbo, to autonomously develop a detailed entity type taxonomy. Our objective is to construct a comprehensive taxonomy starting from a broad classification of entity types (objects, time, locations, organizations, events, actions, and subjects), similar to the top levels of existing manually curated taxonomies. This classification is then progressively refined through iterative prompting that draws on GPT-4's internal knowledge. The result is an extensive taxonomy of over 5,000 nuanced entity types whose quality, on subjective evaluation, is remarkably high. Our prompting strategy is straightforward yet effective, and it allows the taxonomy to be expanded dynamically. The detailed taxonomy has a range of practical applications: it facilitates the creation of new, more intricate branches through pattern-based combinations, and it notably enhances information extraction tasks such as relation extraction and event argument extraction. Beyond introducing a new approach to taxonomy creation, our methodology opens avenues for applying such taxonomies across computational linguistics and other AI-related fields.
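The abstract does not specify how pattern-based combinations are formed. One plausible reading, offered here purely as an assumption, is to fill a textual pattern with leaf types drawn from two existing branches. The `combine` and `leaves` helpers, the `{location} {organization}` pattern, and the toy branch contents below are all hypothetical.

```python
from itertools import product
from typing import Dict, List

# Nested dict: type name -> dict of its subtypes (a leaf maps to {}).
Taxonomy = Dict[str, dict]


def leaves(tree: Taxonomy) -> List[str]:
    """Collect the leaf type names of a nested-dict subtree, in insertion order."""
    out: List[str] = []
    for name, children in tree.items():
        out.extend(leaves(children) if children else [name])
    return out


def combine(pattern: str, **branches: Taxonomy) -> List[str]:
    """Fill a str.format pattern with every combination of leaves from the named branches."""
    keys = list(branches)
    pools = [leaves(branches[k]) for k in keys]
    return [pattern.format(**dict(zip(keys, combo))) for combo in product(*pools)]


# Toy branches (hypothetical content, not taken from the paper's taxonomy).
location = {"municipality": {"city": {}, "county": {}}}
organization = {"public body": {"council": {}, "library": {}}}

new_types = combine("{location} {organization}", location=location, organization=organization)
print(new_types)
# ['city council', 'city library', 'county council', 'county library']
```

The number of candidates grows multiplicatively with branch size, and some combinations will not name meaningful entity types. A practical pipeline would therefore presumably filter candidates, for example with a further model prompt, before adding them as a new branch. That filtering step is an assumption and is not described in the abstract.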
