The Effectiveness of Large Language Models in Transforming Unstructured Text to Standardized Formats

William Brach, Kristián Košťál, Michal Ries

TL;DR

The study evaluates how well four LLMs (GPT-4o, GPT-4o-mini, Llama3.1:70b, Llama3.1:8b) convert unstructured recipe text into Cooklang under a range of prompting strategies. It introduces a hybrid evaluation framework that combines traditional metrics (WER, ROUGE-L, TER) with domain-specific scores for semantic element identification, and finds that GPT-4o with few-shot prompting delivers the strongest performance and high semantic fidelity. Larger models generally perform better, but smaller models such as Llama3.1:8b show potential for improvement through targeted fine-tuning, especially when prompts include the Cooklang specification and ingredient context. The results suggest the approach could extend to automated structured-data generation in other domains (healthcare HL7, legal, technical documentation), producing scalable, machine-readable representations from unstructured text.

Abstract

The exponential growth of unstructured text data presents a fundamental challenge in modern data management and information retrieval. While Large Language Models (LLMs) have shown remarkable capabilities in natural language processing, their potential to transform unstructured text into standardized, structured formats remains largely unexplored, even though this capability could reshape data processing workflows across industries. This study systematically evaluates LLMs' ability to convert unstructured recipe text into the structured Cooklang format. We test four models (GPT-4o, GPT-4o-mini, Llama3.1:70b, and Llama3.1:8b) and introduce an evaluation approach that combines traditional metrics (WER, ROUGE-L, TER) with specialized metrics for semantic element identification. Our experiments show that GPT-4o with few-shot prompting achieves the best performance (ROUGE-L: 0.9722, WER: 0.0730), demonstrating that LLMs can reliably transform domain-specific unstructured text into structured formats without extensive training. Although model performance generally scales with size, smaller models such as Llama3.1:8b show potential for optimization through targeted fine-tuning. These findings open new possibilities for automated structured data generation across various domains, from medical records to technical documentation, and could change the way organizations process and use unstructured information.
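
To make the two headline metrics concrete, the following is a minimal Python sketch of word-level WER and ROUGE-L, not the paper's evaluation code. It assumes whitespace tokenization and a balanced F1 form of ROUGE-L; the function names and the sample Cooklang strings are hypothetical and chosen only for illustration.

```python
from typing import List


def edit_distance(ref: List[str], hyp: List[str]) -> int:
    """Token-level Levenshtein distance (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: edit distance divided by reference length (lower is better)."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one token")
    return edit_distance(ref, hyp) / len(ref)


def lcs_length(a: List[str], b: List[str]) -> int:
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(reference: str, hypothesis: str) -> float:
    """ROUGE-L as the F1 of LCS-based precision and recall (higher is better)."""
    ref, hyp = reference.split(), hypothesis.split()
    lcs = lcs_length(ref, hyp)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(hyp), lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


# Hypothetical reference and model output in Cooklang syntax:
# the model missed the cookware marker on "pot".
reference = "Add @salt{1%tsp} to the #pot{} and boil for ~{10%minutes}."
hypothesis = "Add @salt{1%tsp} to the pot and boil for ~{10%minutes}."

print(f"WER: {wer(reference, hypothesis):.4f}")
print(f"ROUGE-L: {rouge_l_f1(reference, hypothesis):.4f}")
```

Because both metrics operate on surface tokens, a single missed marker such as `#pot{}` costs only one token out of nine here, which is one reason surface metrics are usefully complemented by scores that check semantic elements directly.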

Paper Structure

This paper contains 21 sections, 1 equation, 7 figures, 4 tables.

Figures (7)

  • Figure 1: Proposed methodology for evaluating the ability of Large Language Models in converting a recipe to Cooklang
  • Figure 2: Comparison of Language Model Performance Across WER, ROUGE-L, and TER Metrics
  • Figure 3: Prompting Technique Performance Across WER, ROUGE-L, and TER Metrics
  • Figure 4: Impact of Cooklang Specification Integration on WER, ROUGE-L, and TER Performance Metrics
  • Figure 5: Impact of Ingredients Integration on WER, ROUGE-L, and TER Performance Metrics
  • ...and 2 more figures