Are Large Language Models the New Interface for Data Pipelines?
Sylvio Barbon Junior, Paolo Ceravolo, Sven Groppe, Mustafa Jarrar, Samira Maghool, Florence Sèdes, Soror Sahri, Maurice Van Keulen
TL;DR
The paper examines how Large Language Models (LLMs) can serve as intuitive interfaces for data pipelines and how their capabilities can be integrated with XAI, AutoML, Knowledge Graphs, and Big Data analytics. It provides a conceptual analysis of the roles and interactions among these technologies, with examples such as natural-language querying of graphs, explainable outputs, and AutoML-enabled workflow automation. The main contributions are a structured overview of integration opportunities, a discussion of practical challenges (cost, energy consumption, bias, reproducibility), and proposed directions for responsible, scalable deployment. The work offers a roadmap for using LLMs to modernize data pipelines and highlights the governance and sustainability considerations that are crucial for real-world impact across domains.
Abstract
The term Language Model encompasses various types of models designed to understand and generate human communication. Large Language Models (LLMs) have gained significant attention for their ability to process text with human-like fluency and coherence, which makes them valuable for a wide range of data-related tasks organized as pipelines. Their capabilities in natural language understanding and generation, combined with their scalability, versatility, and state-of-the-art performance, enable innovative applications across various AI-related fields, including eXplainable Artificial Intelligence (XAI), Automated Machine Learning (AutoML), and Knowledge Graphs (KG). Furthermore, we believe these models can extract valuable insights and make data-driven decisions at scale, a practice commonly referred to as Big Data Analytics (BDA). In this position paper, we discuss how to unlock synergies among these technologies, which can lead to more powerful and intelligent AI solutions and drive improvements in data pipelines across a wide range of applications and domains that integrate humans, computers, and knowledge.
