Deep Variable-Block Chain with Adaptive Variable Selection

Lixiang Zhang, Lin Lin, Jia Li

TL;DR

This work introduces Deep Variable-Block Chain (DVC), a chain-structured, LSTM-inspired neural network that operates on blocks of features $X^{(v)}$ partitioned from a high-dimensional vector $X \in \mathbb{R}^p$, enabling effective modeling of non-grid variables. A forward greedy search constructs the block chain, and a global selection length $S$ is chosen by cross-validated error. To capture heterogeneity across data regions, the authors add Adaptive Variable Selection (AVS): a decision tree $\mathcal{T}_{VS}$ assigns each region its own number of blocks $\tilde{\nu}$ to use, and combining this with the DVC predictions yields DVC-AVS. Experiments on simulated and real biomedical datasets show that DVC (and especially DVC-AVS) achieves higher accuracy at reduced dimensionality, reveals region-specific sets of important variables, and remains robust to highly correlated features. The approach offers interpretable, block-wise variable interactions and a principled framework for adaptive feature selection in high-dimensional non-grid data, with potential impact in biomarker discovery and precision medicine.
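The forward greedy search and the choice of $S$ can be illustrated with a minimal sketch. It assumes a caller-supplied scoring function, here named `cv_error` (a hypothetical name), that returns a cross-validated error for a candidate chain of block indices. In the paper that score would come from training DVC; the `toy_error` function below is only a stand-in so the example runs, and the rule "take the shortest prefix with the lowest error" is an assumption about how $S$ is read off.

```python
from typing import Callable, List, Sequence, Tuple


def greedy_block_chain(
    n_blocks: int,
    cv_error: Callable[[Sequence[int]], float],
) -> Tuple[List[int], List[float], int]:
    """Order blocks by forward greedy search and pick a chain length S."""
    chain: List[int] = []
    errors: List[float] = []
    remaining = set(range(n_blocks))
    while remaining:
        # Try appending each unused block; keep the one with the lowest error.
        # Ties are broken in favour of the smallest block index.
        best = min(sorted(remaining), key=lambda b: cv_error(chain + [b]))
        chain.append(best)
        errors.append(cv_error(chain))
        remaining.remove(best)
    # Assumed rule: S is the shortest prefix that attains the minimum error.
    S = errors.index(min(errors)) + 1
    return chain, errors, S


def toy_error(chain: Sequence[int]) -> float:
    # Stand-in score, NOT a trained model: blocks 2 and 0 lower the error,
    # every other block raises it slightly.
    gain = {2: 0.3, 0: 0.15}
    return 0.5 - sum(gain.get(b, -0.01) for b in chain)


chain, errors, S = greedy_block_chain(4, toy_error)
selected = chain[:S]  # blocks kept by the global selection length
```

With this toy score the search places blocks 2 and 0 first, and the error starts rising once uninformative blocks are appended, so only the first two blocks are kept.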

Abstract

The architectures of deep neural networks (DNN) rely heavily on the underlying grid structure of variables, for instance, the lattice of pixels in an image. For general high dimensional data with variables not associated with a grid, the multi-layer perceptron and deep belief network are often used. However, it is frequently observed that those networks do not perform competitively and they are not helpful for identifying important variables. In this paper, we propose a framework that imposes on blocks of variables a chain structure obtained by step-wise greedy search so that the DNN architecture can leverage the constructed grid. We call this new neural network Deep Variable-Block Chain (DVC). Because the variable blocks are used for classification in a sequential manner, we further develop the capacity of selecting variables adaptively according to a number of regions trained by a decision tree. Our experiments show that DVC outperforms other generic DNNs and other strong classifiers. Moreover, DVC can achieve high accuracy at much reduced dimensionality and sometimes reveals drastically different sets of relevant variables for different regions.
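Because blocks are consumed sequentially, a prediction is available after every prefix of the chain, and the adaptive step only has to decide which prefix to trust for each point. The sketch below shows that combination under loud assumptions: `prefix_preds` is a hypothetical array of DVC class predictions per chain length, and `toy_selection_tree` is a hand-written stump standing in for the trained decision tree. How the tree is actually trained is not covered here.

```python
import numpy as np


def dvc_avs_predict(prefix_preds, nu_tilde):
    """Pick, for each point, the prediction made with its first nu_tilde blocks.

    prefix_preds[n, s - 1] is the (hypothetical) DVC prediction for point n
    when only the first s blocks of the chain are used.
    """
    prefix_preds = np.asarray(prefix_preds)
    nu_tilde = np.asarray(nu_tilde)
    rows = np.arange(prefix_preds.shape[0])
    return prefix_preds[rows, nu_tilde - 1]


def toy_selection_tree(x):
    # Hand-written stump standing in for a trained tree; NOT the paper's tree.
    # It maps a point to the number of blocks to use in its region.
    return 1 if x[0] < 0.0 else 3


X = np.array([[-1.0, 0.2], [0.5, 0.1], [2.0, -0.3]])
prefix_preds = np.array([[0, 1, 1], [1, 1, 0], [0, 0, 1]])  # made-up labels
nu_tilde = np.array([toy_selection_tree(x) for x in X])
final = dvc_avs_predict(prefix_preds, nu_tilde)
```

Points routed to a region with a small block count are classified from a short prefix of the chain, which is what lets different regions rely on different sets of variables.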

Paper Structure

This paper contains 12 sections, 4 equations, 5 figures, 5 tables.

Figures (5)

  • Figure 1: (a) A typical RNN and the chain-like architecture obtained by unfolding it. $W, U, H$ are weight matrices shared across the entire chain. The $X$'s are the sequential input data, the $h$'s are hidden layers, and the $o$'s are outputs. (b) LSTM cell structure. The red unit at time step $t$ indicates the forget gate $f_t$, which controls the proportion of information that is removed from the previous memory cell $c_{t-1}$. The green units refer to the input gate $i_t$ and the input proposal $a_t$, where $i_t$ controls the proportion of new information that is added to the current memory $c_t$ and $a_t$ generates a proposal of new information. The blue units represent the output gate $o_t$, which controls how much information is delivered from $c_t$ to $h_t$ and thereby influences the next cell.
  • Figure 2: The architecture of DVC is a cascaded sequence of cells, each taking input data from one variable block. The red unit indicates the forget gate $f^{(t)}$. The green units are the input gate $i^{(t)}$ and input proposal $a^{(t)}$. The blue square unit indicates the output gate $o^{(t)}$. The weight matrices, $W_i^{(t)}$, $W_f^{(t)}$, and $W_o^{(t)}$ are not shared across the cells.
  • Figure 3: The variable selection tree for the simulated dataset. In each node, the first number is the average $\nu$-number over points in the node, and the number above the node is the node ID. The proportions of data points with different $\nu$-numbers are shown in the white box below each leaf node. The variable used to split the node is noted beneath the node.
  • Figure 4: Variable block structure of the breast cancer data. There are 6 variable blocks in total. The numbers on the first line are the variable block labels. For example, the variable block labeled as 1 contains the respective means of 5 size-related features.
  • Figure 5: The variable selection tree for the breast cancer data, trained with the selected features and $\nu$-numbers. In each node, the first number is the average of the $\nu$-numbers and the number above the node is the node ID. The proportions of data points with different $\nu$-numbers are shown in the white boxes below leaf nodes $2, 6, 7$ respectively. The text below each node is the variable used for the split decision.
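
The cell chain described in the captions for Figures 1 and 2 can be sketched as a forward pass in NumPy. This is an illustration under assumptions, not the authors' implementation: it uses the standard LSTM update ($c = f \odot c_{\text{prev}} + i \odot a$, $h = o \odot \tanh(c)$), gives every cell its own parameters since the weights are not shared, and feeds each gate the concatenation of the current block and the previous hidden state. The names `init_cell` and `dvc_forward`, the block layout, and the hidden size are all hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def init_cell(block_dim, hidden_dim, rng):
    """One cell's private parameters: a (weight matrix, bias) pair per gate."""
    in_dim = block_dim + hidden_dim  # each gate sees [x_block, h_prev]
    return {
        gate: (rng.normal(scale=0.1, size=(hidden_dim, in_dim)),
               np.zeros(hidden_dim))
        for gate in ("f", "i", "a", "o")
    }


def dvc_forward(x, blocks, cells, hidden_dim):
    """Run one sample through the chain; return the hidden state after each cell."""
    h = np.zeros(hidden_dim)
    c = np.zeros(hidden_dim)
    states = []
    for idx, cell in zip(blocks, cells):
        z = np.concatenate([x[idx], h])                 # block input + previous state
        f = sigmoid(cell["f"][0] @ z + cell["f"][1])    # forget gate
        i = sigmoid(cell["i"][0] @ z + cell["i"][1])    # input gate
        a = np.tanh(cell["a"][0] @ z + cell["a"][1])    # input proposal
        o = sigmoid(cell["o"][0] @ z + cell["o"][1])    # output gate
        c = f * c + i * a                               # update memory
        h = o * np.tanh(c)                              # expose part of memory
        states.append(h)
    return states


rng = np.random.default_rng(0)
blocks = [[0, 1, 2], [3, 4], [5, 6, 7, 8]]  # hypothetical chain of variable blocks
hidden_dim = 4
cells = [init_cell(len(b), hidden_dim, rng) for b in blocks]
x = rng.normal(size=9)
states = dvc_forward(x, blocks, cells, hidden_dim)
```

Keeping the hidden state after every cell is a deliberate choice in this sketch: a classifier head attached to `states[s - 1]` would give the prediction that uses only the first `s` blocks, which is the quantity the $\nu$-numbers in the variable selection trees refer to.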