
Unsupervised Generative Feature Transformation via Graph Contrastive Pre-training and Multi-objective Fine-tuning

Wangyang Ying, Dongjie Wang, Xuanming Hu, Yuanchun Zhou, Charu C. Aggarwal, Yanjie Fu

TL;DR

This work introduces NEAT, a label-free framework for unsupervised generative feature transformation by uniting graph-based representation, contrastive pretraining, and sequential generation. It defines a measurement-pretrain-finetune paradigm where feature-set utility is gauged via Mean Discounted Cumulative Gain (MDCG), feature-set embeddings are learned through graph contrastive learning on feature-feature graphs, and optimal transformed feature sequences are generated via an encoder-decoder-evaluator model guided by gradient-based optimization. The approach demonstrates strong empirical performance across 23 datasets, improves transformation quality over a range of baselines, and exhibits robustness and efficiency in both memory and convergence. NEAT thus offers a scalable, interpretable, and task-agnostic pathway to discover informative, non-linear feature transformations without labeled data, with potential applications across science and industry.

Abstract

Feature transformation derives a new feature set from the original features to augment the AI power of data. In many scientific domains, such as material performance screening, feature transformation can model material formula interactions and compositions and discover performance drivers, but supervised labels must be collected from expensive and lengthy experiments. This issue motivates the Unsupervised Feature Transformation Learning (UFTL) problem. Prior approaches, such as manual transformation, supervised-feedback-guided search, and PCA, either rely on domain knowledge or expensive supervised feedback, suffer from a large search space, or overlook non-linear feature-feature interactions. UFTL poses a major challenge for existing methods: how can we design a new unsupervised paradigm that captures complex feature interactions and avoids a large search space? To fill this gap, we connect graph, contrastive, and generative learning to develop a measurement-pretrain-finetune paradigm for UFTL. For unsupervised feature set utility measurement, we propose a feature value consistency preservation perspective and develop an unsupervised metric, modeled on mean discounted cumulative gain, to evaluate feature set utility. For unsupervised feature set representation pretraining, we regard a feature set as a feature-feature interaction graph and develop an unsupervised graph contrastive learning encoder to embed feature sets into vectors. For generative transformation finetuning, we regard a feature set as a feature cross sequence and feature transformation as sequential generation. We develop a deep generative feature transformation model that coordinates the pretrained feature set encoder and the gradient information extracted from a feature set utility evaluator to optimize a transformed feature generator.
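
To make the "feature cross sequence" view concrete, the sketch below evaluates one transformed feature written as a postfix token sequence, the representation named in the caption of Figure 2. It is only an illustration under stated assumptions: the operator set (`log`, `sqrt`, `+`, `*`), the feature names `f1` and `f2`, and the helper `eval_postfix` are hypothetical and are not taken from the paper.

```python
import math

# Hypothetical operator tables, chosen only for illustration.
# Each operator maps whole feature columns (lists of floats) to a new column.
UNARY = {
    "log": lambda col: [math.log(abs(v) + 1.0) for v in col],
    "sqrt": lambda col: [math.sqrt(abs(v)) for v in col],
}
BINARY = {
    "+": lambda a, b: [x + y for x, y in zip(a, b)],
    "*": lambda a, b: [x * y for x, y in zip(a, b)],
}


def eval_postfix(tokens, columns):
    """Evaluate a postfix token sequence over named feature columns.

    Operands are pushed on a stack; an operator pops its arguments
    and pushes the resulting column. A well-formed sequence leaves
    exactly one column on the stack.
    """
    stack = []
    for tok in tokens:
        if tok in UNARY:
            stack.append(UNARY[tok](stack.pop()))
        elif tok in BINARY:
            right = stack.pop()
            left = stack.pop()
            stack.append(BINARY[tok](left, right))
        else:
            # Any other token is treated as the name of an original feature.
            stack.append(columns[tok])
    if len(stack) != 1:
        raise ValueError("malformed postfix expression")
    return stack[0]


# Toy data: two original features with three samples each (assumed values).
columns = {"f1": [1.0, 2.0, 3.0], "f2": [4.0, 5.0, 6.0]}

# The sequence "f1 f2 * sqrt" encodes the crossed feature sqrt(|f1 * f2|).
crossed = eval_postfix(["f1", "f2", "*", "sqrt"], columns)
```

Because a postfix sequence needs no parentheses, a transformed feature becomes a flat list of tokens, which is the kind of output a sequential decoder can emit one token at a time.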

Paper Structure (19 sections, 5 equations, 8 figures, 5 tables, 1 algorithm)

This paper contains 19 sections, 5 equations, 8 figures, 5 tables, 1 algorithm.

Figures (8)

  • Figure 1: Unsupervised Feature Transformation Learning.
  • Figure 2: Postfix Expression.
  • Figure 3: An overview of NEAT. First, we explore the original feature set to collect training data under the guidance of our proposed unsupervised feature set utility. Second, we pre-train shared GNNs to capture the knowledge of the training data via graph contrastive learning and preserve it in an embedding space. Finally, we conduct multi-objective fine-tuning to readjust the created embedding space and identify the optimal transformed feature set.
  • Figure 4: The impact of graph contrastive pre-training.
  • Figure 5: The impact of unsupervised feature set utility measurement, RL-based data collector, and initial seeds.
  • ...and 3 more figures