Zero-Shot Open-Schema Entity Structure Discovery
Xueqiang Xu, Jinfeng Xiao, James Barry, Mohab Elkaref, Jiaru Zou, Pengcheng Jiang, Yunyi Zhang, Max Giammona, Geeth de Mel, Jiawei Han
TL;DR
This work tackles Open-schema Entity Structure Discovery (OpenESD) without relying on predefined schemas or annotated data. It introduces ZOES, a zero-shot framework that enriches, refines, and unifies initial entity–attribute–value triplets into coherent entity structures, using root-attribute induction and value-anchored enrichment guided by mutual-dependency refinement. Across Battery Science, Economics, and Politics, ZOES improves F1 across domains and remains robust with multiple backbones, albeit with some precision trade-offs due to enrichment noise. The approach enables scalable, schema-free extraction of detailed, context-sensitive entity representations, with potential impact on knowledge graphs and downstream QA and retrieval tasks. However, its computational cost and the reliance of its evaluation on human annotations point to future work on efficiency and automated assessment.
Abstract
Entity structure extraction, which aims to extract entities and their associated attribute-value structures from text, is an essential task for text understanding and knowledge graph construction. Existing methods based on large language models (LLMs) typically rely heavily on predefined entity attribute schemas or annotated datasets, often leading to incomplete extraction results. To address these challenges, we introduce Zero-Shot Open-schema Entity Structure Discovery (ZOES), a novel approach to entity structure extraction that does not require any schema or annotated samples. ZOES operates via a principled mechanism of enrichment, refinement, and unification, based on the insight that an entity and its associated structure are mutually reinforcing. Experiments demonstrate that ZOES consistently enhances LLMs' ability to extract more complete entity structures across three different domains, showcasing both the effectiveness and generalizability of the method. These findings suggest that such an enrichment, refinement, and unification mechanism may serve as a principled approach to improving the quality of LLM-based entity structure discovery in various scenarios.
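To make the target output concrete, the following is a minimal sketch of what merging entity–attribute–value triplets into per-entity structures could look like. It is not the ZOES implementation, which relies on LLM-driven enrichment and refinement; the function name `unify_triplets`, the case-insensitive matching rule, and the example triplets are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical representation: (entity, attribute, value)
Triplet = Tuple[str, str, str]


def unify_triplets(triplets: Iterable[Triplet]) -> Dict[str, Dict[str, List[str]]]:
    """Group triplets by entity and attribute, dropping duplicate values.

    Assumption: entity and attribute mentions that differ only in case or
    surrounding whitespace refer to the same thing. A real system would
    need a far more careful notion of equivalence.
    """
    grouped = defaultdict(lambda: defaultdict(list))
    for entity, attribute, value in triplets:
        entity_key = entity.strip().lower()
        attribute_key = attribute.strip().lower()
        cleaned_value = value.strip()
        if cleaned_value not in grouped[entity_key][attribute_key]:
            grouped[entity_key][attribute_key].append(cleaned_value)
    # Convert nested defaultdicts to plain dicts for a stable return type.
    return {entity: dict(attrs) for entity, attrs in grouped.items()}


# Made-up triplets, not taken from the paper's datasets.
triplets = [
    ("LiFePO4 cathode", "capacity", "160 mAh/g"),
    ("lifepo4 cathode", "Capacity", "160 mAh/g"),
    ("LiFePO4 cathode", "coating", "carbon"),
]
structures = unify_triplets(triplets)
```

Here the two differently cased mentions collapse into one entity with two attributes, which is the kind of consolidated structure the unification step is described as producing.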
