In-Context-Learning-Assisted Quality Assessment Vision-Language Models for Metal Additive Manufacturing

Qiaojie Zheng, Jiucai Zhang, Xiaoli Zhang

TL;DR

This work applies in-context learning (ICL) with vision-language models (VLMs) to perform quality assessment (QA) on metal additive manufacturing prints without large application-specific datasets. By evaluating six context-sampling strategies across two VLMs (Gemini 2.5 Flash and Gemma 3:27b) and introducing knowledge relevance, rationale validity, and conclusion correctness as evaluation metrics, the study demonstrates that ICL can achieve accuracy comparable to traditional machine learning (ML) models while producing human-interpretable rationales. The results reveal model-dependent preferences for context: larger models benefit from many-shot, diverse prompts, whereas smaller models perform best with balanced, diverse few-shot prompts. This approach reduces the data collection burden, improves decision transparency, and provides a framework for evaluating rationale quality in manufacturing QA.

Abstract

Vision-based quality assessment in additive manufacturing often requires dedicated machine learning models and application-specific datasets. However, data collection and model training can be expensive and time-consuming. In this paper, we leverage vision-language models' (VLMs') reasoning capabilities to assess the quality of printed parts and introduce in-context learning (ICL) to provide VLMs with necessary application-specific knowledge and demonstration samples. This method eliminates the requirement for large application-specific datasets for training models. We explored different sampling strategies for ICL to search for the optimal configuration that makes use of limited samples. We evaluated these strategies on two VLMs, Gemini-2.5-flash and Gemma3:27b, with quality assessment tasks in wire-laser direct energy deposition processes. The results show that ICL-assisted VLMs can reach quality classification accuracies similar to those of traditional machine learning models while requiring only a minimal number of samples. In addition, unlike traditional classification models that lack transparency, VLMs can generate human-interpretable rationales to enhance trust. Since there are no metrics to evaluate their interpretability in manufacturing applications, we propose two metrics, knowledge relevance and rationale validity, to evaluate the quality of VLMs' supporting rationales. Our results show that ICL-assisted VLMs can address application-specific tasks with limited data, achieving relatively high accuracy while also providing valid supporting rationales for improved decision transparency.
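
As a rough illustration of how a limited sample pool might be turned into an ICL context, the sketch below draws an equal number of demonstrations per quality label and interleaves them with the query image. It is not the authors' implementation: the six sampling strategies are not specified in this summary, and the function names (`sample_balanced`, `build_context`), the dictionary fields (`image`, `label`, `comment`), the label values, and the prompt wording are all hypothetical. The resulting list would still need to be converted into whatever message format a given VLM client expects.

```python
import random
from collections import defaultdict


def sample_balanced(pool, shots_per_class, seed=0):
    """Pick the same number of demonstrations from every quality label (assumed strategy)."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for sample in pool:
        by_label[sample["label"]].append(sample)

    chosen = []
    for label in sorted(by_label):
        group = by_label[label]
        if len(group) < shots_per_class:
            raise ValueError(f"not enough '{label}' samples in the pool")
        chosen.extend(rng.sample(group, shots_per_class))

    rng.shuffle(chosen)  # avoid presenting all samples of one class first
    return chosen


def build_context(demos, query_image):
    """Interleave demonstration images with their labels and comments, then append the query."""
    parts = ["You assess bead quality from cross-sectional images."]  # hypothetical instruction
    for demo in demos:
        parts.append({"image": demo["image"]})
        parts.append(f"Quality: {demo['label']}. Expert comment: {demo['comment']}")
    parts.append({"image": query_image})
    parts.append("Assess the quality of this bead and explain your rationale.")
    return parts


# Placeholder pool: file names, labels, and comments are made up for illustration.
pool = [
    {
        "image": f"sample_{i}.png",
        "label": "high" if i % 2 == 0 else "low",
        "comment": "placeholder expert comment",
    }
    for i in range(10)
]

demos = sample_balanced(pool, shots_per_class=2)
context = build_context(demos, "query.png")
```

Fixing the seed keeps the selected demonstrations reproducible across runs, which makes it easier to compare one sampling configuration against another on the same test images.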

Paper Structure

This paper contains 30 sections, 1 equation, 8 figures, 1 table, 1 algorithm.

Figures (8)

  • Figure 1: Comparison of the existing ML-based QA framework with our VLM-based one. Our VLM-based framework does not require large, labeled datasets to construct an application-specific model; instead, it leverages VLMs' reasoning capabilities to learn the necessary QA knowledge from a few in-context samples. Compared with the ML-based approach, the VLM-based framework can produce a human-interpretable classification rationale to make QA decisions more trustworthy. We also present a set of metrics to evaluate the quality of the supporting rationale.
  • Figure 2: Different sampling strategies to prepare context for ICL. Six sampling strategies are explored in this study to examine the influence of sampling strategy on quality assessment performance.
  • Figure 3: Laser-wire direct energy deposition setup used to manufacture all learning and testing samples. The laser head is mounted on an industrial robot. Image from Liu2022.
  • Figure 4: Print quality visualization and expert comments for selected samples. The bold, underlined text highlights the knowledge points that should be used when evaluating quality. Low-quality beads exhibit non-smooth surface finishes. Note that only cross-sectional images are used in QA queries and ICL. The surface finish images are presented as a visual reference for how bead quality is defined.
  • Figure 5: Sample outputs from VLMs for QA on the image of interest. The selected image is a borderline low-quality case to provide a challenging evaluation of ICL effectiveness. The top row contains responses from Gemma, and the bottom row contains responses from Gemini. The left column contains baseline responses from both models, and the right column contains responses from the best ICL configurations for both models. Red bold italic text indicates incorrect conclusions, while red bold underlined text indicates incorrect knowledge points; green marks their correct counterparts. Purple text shows comments on the QA output from human experts. Underlined purple text indicates comments on knowledge points.
  • ...and 3 more figures