OmniStruct: Universal Text-to-Structure Generation across Diverse Schemas
James Y. Huang, Wenxuan Zhou, Nan Xu, Fei Wang, Qin Liu, Sheng Zhang, Hoifung Poon, Muhao Chen
TL;DR
OmniStruct is a benchmark for universal text-to-structure generation that unifies diverse tasks (named entity recognition, relation extraction, text-to-table, function calling) under a JSON-schema framework. It is assembled from multiple existing datasets converted into a schema-guided format. Experiments show that large models such as GPT-4o lead in overall performance, while smaller models can close the gap through synthetic instruction tuning. A three-step data-synthesis pipeline (task filtering, task synthesis, instance generation and validation) distills GPT-4o capabilities into a compact model, OmniStruct-8B, which achieves strong results on several tasks and points to the potential of cost-effective universal text-to-structure models. The study also finds that constrained decoding yields only minimal gains and that schema adherence alone does not guarantee high content quality. It acknowledges the limitation of focusing solely on JSON and leaves extension to other structured formats to future work.
Abstract
The ability of Large Language Models (LLMs) to generate structured outputs that follow arbitrary schemas is crucial to a wide range of downstream tasks that require diverse structured representations of results, such as information extraction, table generation, and function calling. While modern LLMs excel at generating unstructured responses in natural language, it remains unclear whether this advancement translates to strong performance on text-to-structure tasks. To bridge this gap, we first introduce OmniStruct, a comprehensive benchmark for assessing LLMs' capabilities on these diverse text-to-structure tasks. We build OmniStruct by identifying existing datasets across a wide range of tasks that are suitable for a structured answer format, and adapting them to a unified text-to-structure problem setting. To facilitate the development of efficient text-to-structure models, we collect high-quality training data via synthetic task generation. Without using any supervised data for OmniStruct tasks, our experiments demonstrate that much smaller models fine-tuned on synthetic data can become universal structured generation models that rival the performance of GPT-4o.
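As an illustration of the schema-guided setting, and of the TL;DR's point that schema adherence alone does not guarantee content quality, here is a minimal Python sketch. It is not the benchmark's actual format or evaluation code. The schema `NER_SCHEMA`, its field names, the helpers `conforms` and `parse_and_check`, and the example outputs are all hypothetical, and the checker covers only a small subset of JSON Schema (`type`, `required`, `properties`, `items`).

```python
import json

# Hypothetical schema for an NER-style task; field names are illustrative only.
NER_SCHEMA = {
    "type": "object",
    "required": ["entities"],
    "properties": {
        "entities": {
            "type": "array",
            "items": {
                "type": "object",
                "required": ["text", "label"],
                "properties": {
                    "text": {"type": "string"},
                    "label": {"type": "string"},
                },
            },
        }
    },
}

# Only the three JSON types used in this sketch are mapped.
_TYPES = {"object": dict, "array": list, "string": str}


def conforms(value, schema):
    """Check a value against a tiny subset of JSON Schema (not a full validator)."""
    expected = _TYPES.get(schema.get("type"))
    if expected is not None and not isinstance(value, expected):
        return False
    if isinstance(value, dict):
        # Every required key must be present.
        if any(key not in value for key in schema.get("required", [])):
            return False
        # Recurse into the properties that are present.
        for key, sub_schema in schema.get("properties", {}).items():
            if key in value and not conforms(value[key], sub_schema):
                return False
    if isinstance(value, list) and "items" in schema:
        return all(conforms(item, schema["items"]) for item in value)
    return True


def parse_and_check(raw, schema):
    """Return the parsed output if it is valid JSON that fits the schema, else None."""
    try:
        value = json.loads(raw)
    except json.JSONDecodeError:
        return None
    return value if conforms(value, schema) else None


# Made-up reference answer and model outputs for the text "Marie Curie lived in Paris."
gold = {"entities": [{"text": "Marie Curie", "label": "PERSON"}]}
good_output = '{"entities": [{"text": "Marie Curie", "label": "PERSON"}]}'
wrong_content = '{"entities": [{"text": "Paris", "label": "PERSON"}]}'
malformed = '{"entities": "Marie Curie"}'

for raw in (good_output, wrong_content, malformed):
    parsed = parse_and_check(raw, NER_SCHEMA)
    print("schema-valid:", parsed is not None, "| matches gold:", parsed == gold)
```

The second output passes the schema check but mislabels the entity. A structural check therefore has to be paired with a content-level comparison against the reference.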
