Capturing Semantic Flow of ML-based Systems
Shin Yoo, Robert Feldt, Somin Kim, Naryeong Kim
TL;DR
This paper proposes semantic flow, a representation that captures the internal semantic trajectories of ML-based systems by mapping internal states, such as layer activations and reasoning steps, into latent spaces and summarizing them as Semantic Flow Graphs (SFGs). It formalizes the construction with a unit of analysis $e_i$, a latent mapping $embed(i,e_i)$, and an aggregation $aggregate(i,s_i,S_i)$, and demonstrates two concrete instances: a CNN for CIFAR-10 and the AutoFL LLM agent. It also connects SFGs to traditional control flow via SaCFGs. The work highlights applications in measuring out-of-distribution-ness, debugging, predicting execution results with LIG+GCN, and improving interpretability through domain-tailored latent spaces. By enabling dynamic analysis techniques to be adapted to ML-based software, semantic flow aims to improve the reliability, testing, and explainability of complex AI-enabled systems.
Abstract
ML-based systems are software systems that incorporate machine learning components such as Deep Neural Networks (DNNs) or Large Language Models (LLMs). While such systems enable advanced features such as high-performance computer vision, natural language processing, and code generation, their internal behaviour remains largely opaque to traditional dynamic analysis such as testing: existing analyses typically concern only what is observable from the outside, such as input similarity or class label changes. We propose semantic flow, a concept designed to capture the internal behaviour of ML-based systems and to provide a platform to which traditional dynamic analysis techniques can be adapted. Semantic flow combines the idea of control flow with internal states taken from executions of ML-based systems, such as activation values of a specific layer in a DNN, or embeddings of LLM responses at a specific inference step of an LLM agent. The resulting representation, summarised as semantic flow graphs, can capture internal decisions that are not explicitly represented in the traditional control flow of ML-based systems. We propose the idea of semantic flow, introduce two examples using a DNN and an LLM agent, and finally sketch its properties and how it can be used to adapt existing dynamic analysis techniques for use in ML-based software systems.
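To make the construction more concrete, the following is a minimal Python sketch of how per-step internal states might be summarised into a semantic flow graph. It is an illustration under several assumptions, not the construction used in the paper: the unit of analysis at each step is taken to be a small numeric vector, `embed` is the identity mapping, `aggregate` assigns a latent point to the nearest already-seen node at the same step within a hypothetical `radius` (creating a new node otherwise), and `S_i` is read as the set of nodes seen so far at step `i`. The helper `build_sfg` and the `radius` parameter are names introduced only for this example.

```python
import math
from collections import Counter


def embed(i, e_i):
    """Assumed latent mapping: the unit of analysis is already a vector."""
    return tuple(float(x) for x in e_i)


def aggregate(i, s_i, S_i, radius=1.0):
    """Assumed aggregation: reuse the nearest node at step i if it lies
    within `radius` of s_i, otherwise add s_i to S_i as a new node."""
    best, best_dist = None, radius
    for idx, centre in enumerate(S_i):
        d = math.dist(s_i, centre)
        if d <= best_dist:
            best, best_dist = idx, d
    if best is None:
        S_i.append(s_i)
        best = len(S_i) - 1
    return (i, best)  # node identifier: (step, index within the step)


def build_sfg(executions, radius=1.0):
    """Summarise execution traces as nodes per step plus weighted edges."""
    nodes = {}         # step -> list of node centres (S_i)
    edges = Counter()  # (source node, target node) -> number of traversals
    for trace in executions:
        prev = None
        for i, e_i in enumerate(trace):
            node = aggregate(i, embed(i, e_i), nodes.setdefault(i, []), radius)
            if prev is not None:
                edges[(prev, node)] += 1
            prev = node
    return nodes, edges


# Three toy executions, each with two steps of 2-D internal state.
executions = [
    [(0.0, 0.0), (1.0, 1.0)],
    [(0.1, 0.0), (5.0, 5.0)],
    [(0.0, 0.1), (1.0, 1.1)],
]
nodes, edges = build_sfg(executions)
```

In this toy run all three executions share a single node at the first step, but they split into two nodes at the second step. This is the kind of internal divergence that the surrounding control flow, which is identical for every input, would not reveal.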
