COKE: Causal Discovery with Chronological Order and Expert Knowledge in High Proportion of Missing Manufacturing Data

Ting-Yun Ou, Ching Chang, Wen-Chih Peng

TL;DR

COKE addresses causal discovery in manufacturing, where data are high-dimensional and highly incomplete. It avoids imputation by leveraging recipe-driven embeddings, expert knowledge, and chronological order. The framework combines an initial graph built from expert knowledge and chronological order with recipe-aware embeddings derived from both complete and incomplete samples. It then uses an actor-critic reinforcement learning setup to generate variable orderings and refine edges by maximizing a $-\text{BIC}$ reward. Across settings with varying sensor counts and missing rates up to 90%, it improves the F1-score over baselines by 39.9% on average, by up to 62.6% in configurations resembling real-world data, and by 85.0% on real-world semiconductor datasets, while scaling to hundreds of sensors. This work enables more reliable causal graph construction in manufacturing and suggests broader applicability to domains where domain knowledge and data missingness are intertwined.
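The $-\text{BIC}$ reward can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes a linear-Gaussian model in which each sensor is regressed on its parents by least squares. It also scores only fully observed rows, which is one reading of the "complete dataset" that the reward function uses in Figure 2. The function name `neg_bic_reward` is hypothetical.

```python
import numpy as np


def neg_bic_reward(data: np.ndarray, adj: np.ndarray) -> float:
    """Return -BIC of a DAG under a linear-Gaussian model (illustrative sketch).

    data: (n_samples, d) array; NaN marks a missing sensor reading.
    adj:  (d, d) 0/1 matrix where adj[i, j] == 1 means an edge i -> j.
    """
    # Assumption: keep only fully observed rows, so no imputation is needed.
    complete = data[~np.isnan(data).any(axis=1)]
    n, d = complete.shape
    if n == 0:
        raise ValueError("no complete rows to score")
    bic = 0.0
    for j in range(d):
        parents = np.flatnonzero(adj[:, j])
        y = complete[:, j]
        # Design matrix: an intercept column plus the parent columns of node j.
        X = np.column_stack([np.ones(n), complete[:, parents]])
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        rss = float(np.sum((y - X @ coef) ** 2))
        # Gaussian fit term (up to constants) plus a log(n) penalty per parameter.
        bic += n * np.log(rss / n + 1e-12) + X.shape[1] * np.log(n)
    return float(-bic)


# Usage: x1 depends linearly on x0, and every 10th reading of x1 is missing.
rng = np.random.default_rng(0)
x0 = rng.normal(size=500)
x1 = 2.0 * x0 + 0.1 * rng.normal(size=500)
data = np.column_stack([x0, x1])
data[::10, 1] = np.nan
true_adj = np.array([[0, 1], [0, 0]])
empty_adj = np.zeros((2, 2), dtype=int)
```

Higher values are better. An actor that maximizes this reward therefore trades goodness of fit against the $\log n$ penalty for each extra parent, so the true edge `x0 -> x1` scores above the empty graph.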

Abstract

Understanding causal relationships between machines is crucial for fault diagnosis and optimization in manufacturing processes. Real-world datasets frequently exhibit up to 90% missing data and high dimensionality from hundreds of sensors. These datasets also come with domain-specific expert knowledge and chronological order information (the recording order across machines), both of which are pivotal for discerning causal relationships in manufacturing data. However, previous methods that handle missing data under near-real-world conditions cannot effectively exploit expert knowledge. Conversely, prior methods that can incorporate expert knowledge struggle with datasets that contain missing values. We therefore propose COKE, which constructs causal graphs from manufacturing datasets by leveraging expert knowledge and the chronological order among sensors without imputing missing data. Exploiting the characteristics of the recipe, we make maximal use of samples with missing values, derive embeddings from intersections with an initial graph that encodes expert knowledge and chronological order, and construct a sensor ordering graph. The graph-generation process is optimized with an actor-critic architecture to obtain the final graph with the maximum reward. Experiments across diverse sensor counts and missing proportions show that our approach improves the F1-score by 39.9% on average over benchmark methods. The improvement reaches 62.6% in configurations similar to real-world datasets and 85.0% on real-world semiconductor datasets. The source code is available at https://github.com/OuTingYun/COKE.
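One way to picture the initial graph and the recipe-based use of incomplete samples is sketched below. It assumes the following:

- Each sensor's recording position is given as a hypothetical `(machine_index, step_within_machine)` pair.
- Expert knowledge arrives as a list of forbidden edges.
- A recipe can be approximated by the set of sensors that a sample actually observes.

`initial_graph`, `recipe_groups`, and `observed_subgraph` are illustrative names, and the paper's exact encoding of both the graph and the recipe may differ.

```python
import numpy as np


def initial_graph(positions, forbidden=()):
    """Candidate-edge matrix from recording order plus expert-forbidden edges.

    positions: per-sensor (machine_index, step_within_machine) tuples
               (hypothetical encoding of the chronological order).
    forbidden: iterable of (i, j) pairs an expert rules out.
    Entry [i, j] == 1 means the edge i -> j is still allowed.
    """
    d = len(positions)
    g = np.zeros((d, d), dtype=int)
    for i in range(d):
        for j in range(d):
            # Chronology: a sensor may only influence sensors recorded after it.
            if tuple(positions[i]) < tuple(positions[j]):
                g[i, j] = 1
    for i, j in forbidden:
        g[i, j] = 0  # expert knowledge removes implausible edges
    return g


def recipe_groups(data):
    """Group row indices by observed-sensor pattern (a stand-in for recipe)."""
    groups = {}
    for r, row in enumerate(data):
        key = tuple(int(k) for k in np.flatnonzero(~np.isnan(row)))
        groups.setdefault(key, []).append(r)
    return groups


def observed_subgraph(g, observed):
    """Restrict the initial graph to the sensors one recipe group observes."""
    idx = list(observed)
    return g[np.ix_(idx, idx)]


# Usage: sensors 0 and 1 sit on machine 0, and sensor 2 sits on machine 1.
positions = [(0, 0), (0, 1), (1, 0)]
g = initial_graph(positions, forbidden=[(0, 2)])
data = np.array([[1.0, 2.0, np.nan], [3.0, 4.0, np.nan], [5.0, 6.0, 7.0]])
groups = recipe_groups(data)
```

In this toy case, rows 0 and 1 share the pattern `(0, 1)`, so the candidate edges they can inform are exactly those of the initial graph restricted to sensors 0 and 1.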

Paper Structure (32 sections, 13 equations, 5 figures, 3 tables)

Figures (5)

  • Figure 1: Manufacturing process workflow. Products sequentially interact with sensors across various machines, with an established order that ensures each sensor influences only subsequent sensors on the same or next machines.
  • Figure 2: The overview framework of COKE. In this framework: 1) the actor obtains a causal graph from the ordering graph, which is generated from the dataset and the initial graph containing expert knowledge and chronological order information; 2) the reward function evaluates the graph produced by the actor using the complete dataset; 3) the critic network assesses the current state's value, guiding the actor network's update through gradients derived from the reward and value.
  • Figure 3: Generation of the ordering variable $\Pi$.
  • Figure 4: The transformation from $\mathcal{G}^{\Pi}$ to $\mathcal{G}^{\Pi}_{g_{k}}$.
  • Figure 5: Training information in COKE and other baselines.
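The step from an ordering $\Pi$ to a candidate graph can be illustrated with the standard ordering-based construction. Every variable may point to every later variable in $\Pi$, and the result is intersected with the initial graph. This is a hedged sketch that assumes 0/1 adjacency matrices. It does not reproduce the transformation to $\mathcal{G}^{\Pi}_{g_{k}}$ in Figure 4, whose details are not given here, and the function names are hypothetical.

```python
import numpy as np


def ordering_to_dag(order, init_graph):
    """Turn a variable ordering into a DAG restricted to the initial graph.

    order:      permutation of node indices, earliest first (the Pi of Figure 3).
    init_graph: (d, d) 0/1 candidate-edge matrix (expert knowledge + chronology).
    """
    d = len(order)
    if sorted(order) != list(range(d)):
        raise ValueError("order must be a permutation of 0..d-1")
    full = np.zeros((d, d), dtype=int)
    for a in range(d):
        for b in range(a + 1, d):
            # Every earlier node may point to every later node in the ordering.
            full[order[a], order[b]] = 1
    # Element-wise product keeps only edges the initial graph also allows.
    return full * np.asarray(init_graph)


def is_acyclic(adj):
    """Kahn's algorithm: True if the directed graph has no cycle."""
    adj = np.asarray(adj)
    indeg = adj.sum(axis=0).astype(int)
    queue = [v for v in range(len(adj)) if indeg[v] == 0]
    seen = 0
    while queue:
        v = queue.pop()
        seen += 1
        for w in np.flatnonzero(adj[v]):
            indeg[w] -= 1
            if indeg[w] == 0:
                queue.append(w)
    return seen == len(adj)


# Usage: the ordering puts sensor 2 first, but chronology only allows 0->1, 0->2, 1->2.
init = np.array([[0, 1, 1], [0, 0, 1], [0, 0, 0]])
dag = ordering_to_dag([2, 0, 1], init)
```

Every kept edge points forward in $\Pi$, so the output is acyclic by construction and `is_acyclic` serves only as a sanity check. An ordering that conflicts with chronology, as in this example, simply leaves fewer edges for the reward to evaluate.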