Tracing Hyperparameter Dependencies for Model Parsing via Learnable Graph Pooling Network

Xiao Guo, Vishal Asnani, Sijia Liu, Xiaoming Liu

TL;DR

This work recasts model parsing as a graph node classification task, in which graph nodes and edges represent hyperparameters and their dependencies, respectively. It also introduces a learnable pooling-unpooling mechanism tailored to model parsing, which adaptively learns the hyperparameter dependencies of the generative models (GMs) used to generate the input image.

Abstract

Model parsing is the research task of predicting the hyperparameters of a generative model (GM), given a generated image as input. Since a diverse set of hyperparameters is jointly employed by the generative model, and dependencies often exist among them, learning these hyperparameter dependencies is crucial for improving model parsing performance. To explore these dependencies, we propose a novel model parsing method called Learnable Graph Pooling Network (LGPN). Specifically, we transform model parsing into a graph node classification task, using graph nodes and edges to represent hyperparameters and their dependencies, respectively. Furthermore, LGPN incorporates a learnable pooling-unpooling mechanism tailored to model parsing, which adaptively learns the hyperparameter dependencies of the GMs used to generate the input image. We also extend our proposed method to CNN-generated image detection and coordinated attack detection. Empirically, we achieve state-of-the-art results in model parsing and its extended applications, demonstrating the effectiveness of our method. Our source code is available.
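The graph node classification view described in the abstract can be illustrated with a minimal sketch. It is not the authors' LGPN: it uses a single plain graph convolution with hand-picked toy weights and omits the learnable pooling-unpooling mechanism entirely. The node names, adjacency matrix, features, and weights (`NODES`, `ADJ`, `H`, `W`, `READOUT`) are all hypothetical values chosen for illustration.

```python
import math

def normalize_adjacency(adj):
    """Add self-loops, then row-normalize a dense adjacency matrix."""
    n = len(adj)
    with_loops = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
                  for i in range(n)]
    return [[v / sum(row) for v in row] for row in with_loops]

def matmul(a, b):
    """Dense matrix product of two list-of-lists matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def gcn_layer(adj_norm, feats, weight):
    """One graph convolution step: ReLU(A_norm @ H @ W)."""
    out = matmul(matmul(adj_norm, feats), weight)
    return [[max(0.0, v) for v in row] for row in out]

def classify_nodes(feats, readout):
    """Per-node sigmoid score, read as 'this hyperparameter is used'."""
    return [1.0 / (1.0 + math.exp(-sum(f * w for f, w in zip(row, readout))))
            for row in feats]

# Hypothetical toy graph: one node per hyperparameter, directed edges
# standing in for dependencies (all values below are assumptions).
NODES = ["L1", "Batch Norm", "ReLU"]
ADJ = [[0, 1, 0],
       [0, 0, 1],
       [1, 0, 0]]
H = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # node features
W = [[0.5, -0.5], [0.25, 0.75]]            # layer weights
READOUT = [1.0, -1.0]                      # per-node classifier weights

adj_norm = normalize_adjacency(ADJ)
refined = gcn_layer(adj_norm, H, W)
scores = classify_nodes(refined, READOUT)
predictions = {name: score > 0.5 for name, score in zip(NODES, scores)}
```

Each node receives its own binary decision, so predicting the full hyperparameter set amounts to classifying every node in the graph. In a trained model the weights would be learned rather than fixed by hand.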

Paper Structure

This paper contains 18 sections, 9 equations, 12 figures, 18 tables.

Figures (12)

  • Figure 1: (a) Hyperparameters define a GM that generates images. Model parsing [asnani2021reverse] refers to the task of predicting hyperparameters given the generated image. (b) We study the co-occurrence patterns among different hyperparameters in various GMs from the RED140 dataset, whose composition is shown in the pie chart, and subsequently construct a directed graph to capture the dependencies among these hyperparameters. (c) We define a discrete-value graph node (e.g., L1 and Batch Norm) for each discrete hyperparameter. For each continuous hyperparameter, we partition its range into $n$ distinct intervals and represent each interval by a graph node; for example, Parameter Number has three corresponding continuous-value graph nodes. The representations on these graph nodes are used to predict hyperparameters.
  • Figure 2: Learnable Graph Pooling Network. Given an input image $\mathbf{I}$, the proposed LGPN first uses the Generation Trace Capturing Network (Fig. 3) to extract the representation $\mathbf{f}$. Then, $\mathbf{f}$ is transformed into $\mathbf{H}$, which represents a set of graph node features and is fed into the GCN refinement block. The GCN refinement block stacks GCN layers with paired pooling-unpooling layers (Sec. \ref{sec:GCN_refine}) and produces the refined feature $\mathbf{V}$ for model parsing. Our method is jointly trained with three different objective functions (Sec. \ref{sec:method_train}).
  • Figure 3: Generation Trace Capturing Network. First, convolution layers with different kernel sizes extract feature maps of the input image $\mathbf{I}$. A fusion layer concatenates these feature maps and then passes the concatenated feature to the ResNet branch and the high-res branch.
  • Figure 4: (a) A toy example of the hyperparameter hierarchy assignment $\mathbf{M}_{l}^{s}$: both L1 and L2 belong to the category of pixel-level loss functions, so they are merged into supernode A. Nonlinearity functions (e.g., $\texttt{ReLU}$ and $\texttt{Tanh}$) and normalization methods (e.g., $\texttt{Layer Norm.}$ and $\texttt{Batch Norm.}$) are merged into supernodes B and C, respectively. (b) During inference, discrete-value graph node features are used to classify whether each discrete hyperparameter is used in the given GM. We concatenate the corresponding continuous-value graph node features and regress the continuous hyperparameter value.
  • Figure 5: (a) Cosine similarity between generated correlation graphs (i.e., ${\mathbf{A}^{\prime}}_{0}$) for unseen GMs in one of four test sets. Each element of this matrix is the average cosine similarity of 2,000 pairs of generated correlation graphs ${\mathbf{A}^{\prime}}_{0}$ from the corresponding GMs. (b) The ablation on the three objective functions defined in Sec. \ref{sec:method_train}. (c) The model parsing performance on the RED116 dataset. [Key: Best; S-GCN: stacked GCN]
  • ...and 7 more figures
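
The graph construction outlined in the Figure 1 caption can be sketched in a few lines. The captions do not state how co-occurrence is turned into directed edges or how interval boundaries are chosen, so this sketch assumes two simple rules: an edge i -> j is added when the conditional frequency P(j used | i used) reaches a threshold, and continuous ranges are split into equal-width intervals. The function names, the threshold, and the toy configurations are all hypothetical.

```python
def cooccurrence_graph(configs, names, threshold=0.5):
    """Directed adjacency: edge i -> j when P(j used | i used) >= threshold.

    `configs` is a list of sets, one per generative model, holding the
    names of the discrete hyperparameters that model uses.
    """
    n = len(names)
    counts = [0] * n
    joint = [[0] * n for _ in range(n)]
    for cfg in configs:
        used = [i for i, name in enumerate(names) if name in cfg]
        for i in used:
            counts[i] += 1
            for j in used:
                if i != j:
                    joint[i][j] += 1
    return [[1 if counts[i] and joint[i][j] / counts[i] >= threshold else 0
             for j in range(n)] for i in range(n)]

def interval_node(value, low, high, n):
    """Index of the equal-width interval (graph node) containing `value`."""
    if not low <= value <= high:
        raise ValueError("value lies outside the hyperparameter range")
    width = (high - low) / n
    # The upper bound belongs to the last interval.
    return min(int((value - low) / width), n - 1)

# Hypothetical toy data: four GMs described by the options they use.
NAMES = ["L1", "L2", "Batch Norm"]
CONFIGS = [{"L1", "Batch Norm"}, {"L1", "Batch Norm"}, {"L2"}, {"L1"}]

graph = cooccurrence_graph(CONFIGS, NAMES, threshold=0.7)
# A continuous hyperparameter with an assumed range [0, 9] and n = 3 nodes.
node_index = interval_node(4.5, 0.0, 9.0, 3)
```

With these toy configurations, Batch Norm always appears together with L1, while L1 appears with Batch Norm in only two of its three uses. At a threshold of 0.7 the graph therefore keeps the edge Batch Norm -> L1 but not the reverse, which is why a directed graph is needed to express such dependencies.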