Automated Archival Descriptions with Federated Intelligence of LLMs

Jinghua Groppe, Andreas Marquet, Annabel Walz, Sven Groppe

TL;DR

The paper tackles the challenge of producing standardized archival metadata by introducing an agentic AI framework that federates multiple LLMs to generate ISAD(G)-compliant descriptions. It details an architecture with specialized agents (Instructor, Context, Validator, Federator) and a candidate selection and consistency mechanism, followed by an optimization step that synthesizes the best metadata from the diverse outputs. In an experimental evaluation on 22 real-world ADSD archival units, the federated approach achieves an average quality score of $0.90$, surpassing each individual LLM and improving completeness and alignment with the standard. The work offers a scalable, context-aware solution for automated archival description workflows and a foundation for broader adoption of federated LLM intelligence in archival practice.

Abstract

Enforcing archival standards requires specialized expertise, and manually creating metadata descriptions for archival materials is a tedious and error-prone task. This work aims at exploring the potential of agentic AI and large language models (LLMs) in addressing the challenges of implementing a standardized archival description process. To this end, we introduce an agentic AI-driven system for automated generation of high-quality metadata descriptions of archival materials. We develop a federated optimization approach that unites the intelligence of multiple LLMs to construct optimal archival metadata. We also suggest methods to overcome the challenges associated with using LLMs for consistent metadata generation. To evaluate the feasibility and effectiveness of our techniques, we conducted extensive experiments using a real-world dataset of archival materials, which covers a variety of document types and data formats. The evaluation results demonstrate the feasibility of our techniques and highlight the superior performance of the federated optimization approach compared to single-model solutions in metadata quality and reliability.
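
To make the idea of uniting the outputs of multiple LLMs more concrete, the following is a minimal sketch of one possible per-element selection step. It is not the paper's implementation: the data shape, the function name `federate`, the model names, and the per-element scores are all assumptions made for illustration, and the summary above does not specify how candidates are scored.

```python
from typing import Dict, Tuple

# Hypothetical data shape (an assumption, not taken from the paper):
# model name -> ISAD(G) element name -> (candidate value, score in [0, 1]).
Candidates = Dict[str, Dict[str, Tuple[str, float]]]


def federate(candidates: Candidates) -> Dict[str, dict]:
    """For each metadata element, keep the highest-scoring candidate.

    Ties are resolved in favor of the model seen first.
    """
    best: Dict[str, Tuple[str, float, str]] = {}
    for model, elements in candidates.items():
        for element, (value, score) in elements.items():
            current = best.get(element)
            # Replace only on a strictly higher score.
            if current is None or score > current[1]:
                best[element] = (value, score, model)
    return {
        element: {"value": value, "score": score, "source": model}
        for element, (value, score, model) in best.items()
    }


if __name__ == "__main__":
    # Illustrative values only; they are not results from the paper.
    example: Candidates = {
        "model_a": {"title": ("Meeting minutes", 0.8), "date": ("1952", 0.6)},
        "model_b": {"title": ("Minutes", 0.5), "date": ("1952-03-14", 0.9)},
    }
    for element, choice in federate(example).items():
        print(element, choice)
```

The merged record can draw each element from a different model, which is one way a federated result could outperform every single-model output taken on its own.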

Paper Structure

This paper contains 12 sections, 5 figures, 2 tables.

Figures (5)

  • Figure 1: Agentic AI-driven system architecture for automated generation of optimal archival metadata
  • Figure 2: Box plots of number of words of each document in the evaluation dataset
  • Figure 3: LLM scores for information extraction (average over all scores for all documents)
  • Figure 4: Box plots of LLM scores for information extraction
  • Figure 5: Number of words versus scores for single elements of ISAD(G) metadata and overall score