
Conditional Gumbel-Softmax for constrained feature selection with application to node selection in wireless sensor networks

Thomas Strypsteen, Alexander Bertrand

TL;DR

This work tackles constrained feature/node selection by introducing Conditional Gumbel-Softmax (CGS), which factors the joint selection distribution $p(Z)$ by conditioning each selected feature on a predecessor. This makes it possible to enforce pairwise distance constraints between communicating nodes in wireless sensor networks. The method supports end-to-end learning by replacing discrete selections with differentiable relaxed samples, and it uses a polytree-based sampling scheme that masks infeasible transitions so that only constraint-satisfying configurations are sampled. Applied to a constrained Wireless EEG Sensor Network (WESN) for motor execution, CGS is evaluated against a mutual-information heuristic and a vanilla unconstrained Gumbel-Softmax. It performs competitively under realistic distance constraints tied to communication energy, while the analysis also reveals high gradient variance and occasional convergence to suboptimal minima. The approach is general and extends to other constrained feature selection and network design problems beyond wearable BCIs; possible refinements include more detailed energy modeling and global metrics such as latency.
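The "differentiable relaxed samples" mentioned above are Gumbel-Softmax (concrete) samples. The following is a minimal NumPy sketch of that building block together with an unconstrained selection layer in the spirit of Figure 1. It is an illustration, not the authors' code. It assumes that `X` stacks one feature vector per candidate channel as a row. The names `gumbel_softmax_sample`, `select_features` and the per-neuron log-probability matrix `log_pis` are hypothetical.

```python
import numpy as np


def gumbel_softmax_sample(log_pi, tau, rng):
    """Relaxed (concrete) sample(s) from categoricals with log-probabilities log_pi.

    Standard Gumbel-Softmax trick: perturb the log-probabilities with i.i.d.
    Gumbel(0, 1) noise and apply a temperature-scaled softmax over the last
    axis. As tau -> 0 the samples approach one-hot vectors.
    """
    u = rng.uniform(low=np.finfo(float).tiny, high=1.0, size=log_pi.shape)
    g = -np.log(-np.log(u))                   # Gumbel(0, 1) noise
    y = (log_pi + g) / tau
    y = y - y.max(axis=-1, keepdims=True)     # numerical stability
    e = np.exp(y)
    return e / e.sum(axis=-1, keepdims=True)


def select_features(X, log_pis, tau, rng, training=True):
    """Hypothetical M-neuron selection layer computing s_m = z^(m)^T X.

    X: N x F (one row per candidate channel, an assumption of this sketch).
    log_pis: M x N log-probabilities, one independent distribution per neuron.
    Training: each neuron outputs a relaxed mixture of channels.
    Inference: each neuron passes only its most probable channel.
    """
    if training:
        Z = gumbel_softmax_sample(log_pis, tau, rng)          # M x N, rows sum to 1
    else:
        Z = np.eye(X.shape[0])[np.argmax(log_pis, axis=-1)]   # one-hot rows
    return Z @ X                                              # M x F selected features
```

Because each neuron's distribution here is independent, nothing prevents two neurons from picking channels that violate a pairwise constraint. That limitation is what the conditional variant addresses.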

Abstract

In this paper, we introduce Conditional Gumbel-Softmax as a method to perform end-to-end learning of the optimal feature subset for a given task and deep neural network (DNN) model, while adhering to certain pairwise constraints between the features. We do this by conditioning the selection of each feature in the subset on another feature. We demonstrate how this approach can be used to select the task-optimal nodes composing a wireless sensor network (WSN) while ensuring that no two nodes that need to communicate with each other are too far apart, which limits the power required for this communication. We validate this approach on an emulated Wireless Electroencephalography (EEG) Sensor Network (WESN) solving a motor execution task. We analyze how the performance of the WESN varies as the constraints are made more stringent and how the Conditional Gumbel-Softmax compares with a heuristic, greedy selection method. While the application focus of this paper is on wearable brain-computer interfaces, the proposed methodology is generic and can readily be applied to node deployment in wireless sensor networks and to constrained feature selection in other applications as well.
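The following minimal NumPy sketch makes the pairwise distance constraint concrete. It builds a feasibility mask from node coordinates and a maximal allowable distance $T$. It then applies that mask to a matrix of learnable logits so that infeasible node pairs receive exactly zero conditional probability. The Euclidean distance, the softmax-over-logits parameterization and the function names `feasibility_mask` and `masked_conditional` are assumptions made for illustration. The zeroed diagonal follows the description of Figure 4.

```python
import numpy as np


def feasibility_mask(coords, T):
    """Return the pairwise distance matrix D and a boolean N x N mask.

    mask[i, j] is True when nodes i and j are at most T apart (Euclidean
    distance, an assumption of this sketch). The diagonal is set to False so
    that a child vertex cannot select the same node as its parent.
    """
    diff = coords[:, None, :] - coords[None, :, :]
    D = np.linalg.norm(diff, axis=-1)       # pairwise distances
    mask = D <= T
    np.fill_diagonal(mask, False)
    return D, mask


def masked_conditional(logits, mask):
    """Row-stochastic conditional matrix with infeasible entries set to zero.

    Row i plays the role of p(child node | parent node i). Assumes every row
    of the mask has at least one feasible entry.
    """
    masked = np.where(mask, logits, -np.inf)
    masked = masked - masked.max(axis=1, keepdims=True)
    P = np.exp(masked)                      # exp(-inf) == 0 for masked pairs
    return P / P.sum(axis=1, keepdims=True)
```

With uniform logits, each row spreads its probability evenly over the nodes within reach of the parent. Learning the logits then redistributes probability only among feasible pairs.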

Paper Structure

This paper contains 12 sections, 12 equations, 8 figures and 1 table.

Figures (8)

  • Figure 1: Schematic representation of embedded feature selection with Gumbel-Softmax. $\mathbf{x}_n$ denotes the feature vector derived from channel $n$. During training, the output of each selection neuron $m$ is given by $\mathbf{s}_m = {\mathbf{z}^{(m)}}^\top X$, with $\mathbf{z}^{(m)}$ sampled from a distribution $p_{\boldsymbol{\pi}^{(m)}}(\mathbf{z}^{(m)})$. The parameters $\boldsymbol{\pi}^{(m)}$ of each distribution are learned jointly with the network weights $\boldsymbol{\theta}$. During inference, each neuron passes only the input feature that had the highest associated probability at the end of the learning phase (illustrated by the green arrows). In traditional Gumbel-Softmax, each $\mathbf{z}^{(m)}$ is sampled from an independent concrete distribution as described in Eq. (\ref{eq:lossM}). In our conditional Gumbel-Softmax, $\mathbf{z}^{(m)}$ is sampled from a conditional concrete distribution as described in Eq. (\ref{eq:ancestral}). Figure adapted from strypsteen2021end.
  • Figure 2: Illustration of sampling in the conditional Gumbel-Softmax framework. In this example, the optimal 3 out of 4 features need to be selected. (a) A Bayesian network and its corresponding factorization are chosen. (b) A concrete sample $\mathbf{z}^{(1)}$ for the root vertex of the factorization is drawn through the standard Gumbel-Softmax trick of Eq. (\ref{eq:samplingA}). (c) The second vertex of the factorization is conditioned on the first one. Concrete samples $Z^{(2)}$ are drawn for every row of the conditional distribution matrix $\Pi^{(2)}$. These are then weighted by the previously drawn concrete sample $\mathbf{z}^{(1)}$ of the conditioning vertex, resulting in a concrete sample $\mathbf{z}^{(2)}$. This procedure is repeated for every entry of the desired feature subset.
  • Figure 3: Constrained node selection for a WSN. Given are (a) a grid of $N$ node coordinates, their pairwise distance matrix $D$ and their data $X = [\mathbf{x}_1, \ldots, \mathbf{x}_N]$, (b) the graph $\mathcal{G}$ representing the desired communication topology of the WSN and (c) constraints indicating a maximal allowable distance $T$ between nodes that have to communicate with each other. The goal is then to find a selection of nodes $X_S = \bar{Z}^\top X$ that can be mapped onto the communication topology and adheres to the constraints, while optimizing performance on the given classification/regression task.
  • Figure 4: Applying Conditional Gumbel-Softmax (d-f) to the constrained sensor node selection determined by the node layout (a), the desired communication graph (b) and the distance constraint (c). First, the Bayesian network for the factorization (d) is obtained by taking the transpose of the communication graph (b). Second, entries of the conditional probability distributions that represent a combination of sensor nodes not allowed by the constraints are zeroed out (f). This guarantees that only node configurations allowed by the constraints are sampled. The diagonal elements of the conditional matrices are also zeroed to avoid selecting the same node multiple times. Sampling then proceeds as described in Eq. (\ref{eq:ancestral}) and Fig. 2. Note that while this example uses a star topology as the communication graph, our method is equally applicable to line or tree topologies.
  • Figure : (a) Star and line topology
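The ancestral sampling illustrated in Figures 2 and 4 can be sketched as below. This is a NumPy illustration under stated assumptions, not the authors' implementation. The factorization is assumed to be tree-structured: every vertex $m > 0$ is conditioned on a single earlier vertex `parent[m]`, as in the star and line topologies. Each conditional matrix $\Pi^{(m)}$ is assumed to be already masked (constraint-violating and diagonal entries zeroed) and row-normalized, with at least one feasible entry per row. The names `concrete`, `cgs_sample`, `pi_root`, `cond` and `parent` are hypothetical.

```python
import numpy as np


def concrete(log_p, tau, rng):
    """Row-wise Gumbel-Softmax (concrete) samples; -inf entries stay exactly 0."""
    u = rng.uniform(np.finfo(float).tiny, 1.0, size=log_p.shape)
    y = (log_p - np.log(-np.log(u))) / tau   # add Gumbel(0, 1) noise, scale by 1/tau
    y = y - y.max(axis=-1, keepdims=True)
    e = np.exp(y)
    return e / e.sum(axis=-1, keepdims=True)


def cgs_sample(pi_root, cond, parent, tau, rng):
    """Ancestral relaxed sampling over a tree of selection vertices.

    pi_root:   length-N probabilities for the root vertex (vertex 0).
    cond[m]:   N x N row-stochastic matrix Pi^(m); row i = p(z^(m) | parent picks i),
               with infeasible entries already set to zero.
    parent[m]: index of the conditioning vertex of vertex m, with parent[m] < m
               (parent[0] is ignored).
    Returns an M x N matrix whose rows are the relaxed samples z^(m).
    """
    M, N = len(parent), pi_root.shape[0]
    Z = np.zeros((M, N))
    with np.errstate(divide="ignore"):       # log(0) -> -inf for masked entries
        Z[0] = concrete(np.log(pi_root), tau, rng)
        for m in range(1, M):
            Zm = concrete(np.log(cond[m]), tau, rng)   # one sample per row of Pi^(m)
            Z[m] = Zm.T @ Z[parent[m]]                  # weight rows by z^(parent)
    return Z
```

If the parent sample is exactly one-hot on node $i$, the child sample reduces to the concrete sample drawn from row $i$ of its conditional matrix. It is therefore exactly zero on every node that is infeasible for $i$. For relaxed parent samples, the child is a mixture of such rows, so in this sketch the constraint guarantee applies to hard (one-hot) selections rather than to every intermediate relaxed sample.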