STILTS-NLI: A Natural Language Interface for STILTS

R. A. Shaw, S. Fotopoulou, M. Taylor, M. Bremer

TL;DR

STILTS-NLI is a natural language interface that generates STILTS commands from user prompts, with agentic support for a user-friendly experience. The specialised model behind it matches, and in some cases outperforms, larger proprietary models at generating valid commands.

Abstract

The Starlink Tables Infrastructure Library Tool Set (STILTS) is a powerful suite for astronomical data analysis, particularly useful when dealing with large datasets. However, like other software suites in astronomy, its comprehensive syntax creates a steep learning curve for new users. To address this, we present STILTS-NLI, a natural language interface that generates STILTS commands from user prompts, with agentic support for a user-friendly experience. We developed STILTS-NLI by fine-tuning a compact, open-source Large Language Model (LLM) on a synthetically generated dataset. This dataset was curated and validated to ensure both comprehensive coverage of key STILTS functionalities and the syntactic correctness of the resulting commands. Our results demonstrate that this specialised model matches, and in some cases outperforms, larger proprietary models at generating valid commands. By leveraging small, open-source models, STILTS-NLI provides an accessible, low-resource solution that lowers the barrier to entry for using STILTS.

Paper Structure

This paper contains 21 sections, 2 equations, 7 figures, 3 tables.

Figures (7)

  • Figure 1: Workflow for creating synthetic training data.
  • Figure 2: Cosine similarity distribution of the evaluation data for both the 2B model (shaded grey) and the larger 27B model (dashed outline).
  • Figure 3: Training and evaluation loss curves from fine-tuning the 2B (top) and 27B (bottom) models.
  • Figure 4: STILTS-NLI implementation.
  • Figure 5: Cosine similarity histogram separated by STILTS task.
  • ...and 2 more figures