From Natural Language to Control Signals: A Conceptual Framework for Semantic Channel Finding in Complex Experimental Infrastructure

Thorsten Hellert, Nikolay Agladze, Alex Giovannone, Jan Jug, Frank Mayet, Mark Sherwin, Antonin Sulc, Chris Tennant

TL;DR

This paper addresses the bottleneck of locating relevant control signals in large, evolving facilities by formalizing semantic channel finding and presenting a four-paradigm framework. It validates each paradigm with proof-of-concept deployments across facilities of different scales and architectures, achieving high accuracy (roughly 90–97%) on expert queries. The work demonstrates practical, open-source implementations (direct lookup, hierarchical navigation, and middle-layer exploration) and outlines ontology-based approaches for cross-site interoperability. The proposed directions point to hybrid, memory-enabled, and ontology-driven systems that can scale with facility complexity and support AI-assisted control tasks.

Abstract

Modern experimental platforms such as particle accelerators, fusion devices, telescopes, and industrial process control systems expose tens to hundreds of thousands of control and diagnostic channels accumulated over decades of evolution. Operators and AI systems rely on informal expert knowledge, inconsistent naming conventions, and fragmented documentation to locate signals for monitoring, troubleshooting, and automated control. This creates a persistent bottleneck for reliability, scalability, and language-model-driven interfaces. We formalize semantic channel finding (mapping natural-language intent to concrete control-system signals) as a general problem in complex experimental infrastructure, and introduce a four-paradigm framework to guide architecture selection across facility-specific data regimes. The paradigms span (i) direct in-context lookup over curated channel dictionaries, (ii) constrained hierarchical navigation through structured trees, (iii) interactive agent exploration using iterative reasoning and tool-based database queries, and (iv) ontology-grounded semantic search that decouples channel meaning from facility-specific naming conventions. We demonstrate each paradigm through proof-of-concept implementations at four operational facilities. These facilities span two orders of magnitude in scale, from compact free-electron lasers to large synchrotron light sources, and cover diverse control-system architectures, from clean hierarchies to legacy environments. These implementations achieve 90–97% accuracy on expert-curated operational queries.
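
Paradigm (ii) can be made concrete with a minimal Python sketch of constrained hierarchical navigation. It rests on several assumptions. The three-level tree (system, device, signal), the channel names, and the helpers `navigate` and `keyword_selector` are all hypothetical. A simple word-overlap scorer stands in for the language model that would choose a branch at each level. The property the sketch illustrates is that the selector only ever sees the valid children of the current node, so any completed path ends at a channel that actually exists.

```python
from typing import Callable, Dict, List, Union

# Hypothetical facility tree: system -> device -> signal -> channel name.
Tree = Dict[str, Union["Tree", str]]

TREE: Tree = {
    "magnets": {
        "QF1": {"current setpoint": "MAG:QF1:I_SET", "current readback": "MAG:QF1:I_RB"},
        "QD1": {"current setpoint": "MAG:QD1:I_SET", "current readback": "MAG:QD1:I_RB"},
    },
    "vacuum": {
        "IG1": {"pressure": "VAC:IG1:P"},
    },
}

# A selector picks one option given the user query (an LLM call in practice).
Selector = Callable[[str, List[str]], str]


def keyword_selector(query: str, options: List[str]) -> str:
    """Stand-in for a model call: choose the option sharing the most words with the query."""
    words = set(query.lower().split())
    return max(options, key=lambda opt: len(words & set(opt.lower().split())))


def navigate(query: str, tree: Tree, select: Selector = keyword_selector) -> str:
    """Walk the tree one level at a time, offering only the valid children."""
    node: Union[Tree, str] = tree
    while isinstance(node, dict):
        options = sorted(node)  # constrained choice set at this level
        choice = select(query, options)
        if choice not in node:  # reject anything outside the offered branches
            raise ValueError(f"selector returned invalid branch {choice!r}")
        node = node[choice]
    return node  # a leaf, i.e. an existing channel name
```

For example, `navigate("current readback of QF1 in the magnets", TREE)` descends through `magnets`, `QF1`, and `current readback`, and returns `MAG:QF1:I_RB`. Replacing `keyword_selector` with a model call changes how branches are ranked. It does not change the guarantee that the result is a real leaf of the tree.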

Paper Structure

This paper contains 12 sections and 4 figures.

Figures (4)

  • Figure 1: Four-stage pipeline for direct in-context semantic channel finding. Stage 1 decomposes multi-target queries into atomic sub-queries using structured outputs. Stage 2 performs semantic matching by providing the complete channel database directly in the LLM context, with precision-oriented tuning to minimize false positives. Stage 3 validates candidate channels against the ground-truth database and applies iterative correction to eliminate invalid or non-existent channel suggestions.
  • Figure 2:
  • Figure 3: Schematic of the PV Finder implementation. User queries undergo decomposition and semantic analysis to extract accelerator systems and keywords. Domain-specialized agents then navigate the normalized MML Accelerator Object representation through bounded tool interfaces to identify target EPICS process variables.
  • Figure 4: Example of an LLM transforming a natural-language question into a formal SPARQL query. Despite minor syntactic differences, the two queries are functionally equivalent. Applied to each materialized graph, the queries return the correct PVs for each facility.
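
The decompose, match, and validate stages described in the Figure 1 caption can be sketched in a few lines of Python. This is an illustrative sketch and not the authors' implementation. The channel dictionary is invented. Stage 1 decomposition is a naive split on commas and "and". A word-overlap scorer with a minimum-overlap threshold stands in for the LLM matcher, and the threshold loosely plays the role of precision-oriented tuning: the matcher returns nothing rather than a weak guess. The validation step checks every candidate against the ground-truth channel set. When a proposed name does not exist, the sketch records it as rejected and asks again, up to a fixed number of attempts. The names `decompose`, `make_overlap_proposer`, and `find_channels` are hypothetical.

```python
import re
from typing import Callable, Dict, List, Optional, Set

# Hypothetical curated channel dictionary: channel name -> description.
CHANNELS: Dict[str, str] = {
    "SR:DCCT:CURRENT": "stored beam current in the storage ring",
    "SR:RF:FREQ": "master rf frequency",
    "SR:VAC:SECT3:P": "vacuum pressure in sector 3",
}

# A proposer maps (sub-query, rejected names) to a candidate channel or None.
Proposer = Callable[[str, Set[str]], Optional[str]]


def decompose(query: str) -> List[str]:
    """Decomposition stand-in: split a multi-target query into atomic sub-queries."""
    parts = re.split(r"\s*(?:,|\band\b)\s*", query.lower())
    return [p for p in parts if p]


def make_overlap_proposer(channels: Dict[str, str], min_overlap: int = 2) -> Proposer:
    """Matching stand-in for the LLM: best word overlap with a description.

    Returns None when no description shares at least `min_overlap` words,
    favoring precision (no answer) over a low-confidence guess.
    """
    def propose(sub_query: str, rejected: Set[str]) -> Optional[str]:
        words = set(sub_query.split())
        best, best_score = None, 0
        for name, desc in channels.items():
            if name in rejected:
                continue
            score = len(words & set(desc.split()))
            if score > best_score:
                best, best_score = name, score
        return best if best_score >= min_overlap else None

    return propose


def find_channels(query: str, channels: Dict[str, str] = CHANNELS,
                  propose: Optional[Proposer] = None,
                  max_attempts: int = 3) -> Dict[str, str]:
    """Decompose, match, and validate; returns sub-query -> verified channel."""
    propose = propose or make_overlap_proposer(channels)
    results: Dict[str, str] = {}
    for sub in decompose(query):
        rejected: Set[str] = set()
        for _ in range(max_attempts):
            candidate = propose(sub, rejected)
            if candidate is None:
                break  # matcher abstains: leave this sub-query unresolved
            if candidate in channels:  # validation against the ground-truth set
                results[sub] = candidate
                break
            rejected.add(candidate)  # correction: exclude the invalid name, retry
    return results
```

With the sample dictionary, `find_channels("stored beam current and vacuum pressure in sector 3")` resolves both sub-queries. An unrelated query such as `"magnet temperature"` returns an empty result instead of a forced match. Suppose a proposer first returns a non-existent name such as `SR:DCCT:CUR` and honors the rejected set. It is then corrected on its next attempt, because the invalid name is excluded from the following proposal.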