Density Estimation via Measure Transport: Outlook for Applications in the Biological Sciences
Vanessa Lopez-Marrero, Patrick R. Johnstone, Gilchan Park, Xihaier Luo
TL;DR
This work investigates density estimation from limited data via measure transport, focusing on triangular transport maps to unify processing of Gaussian and non-Gaussian distributions. By learning adaptive transport maps on randomized data subsets, the authors reveal dominant dependence structures among genes and demonstrate potential for scientific discovery in radiation biology. The approach enables explicit density evaluation, efficient sampling, and integration of prior biological knowledge (e.g., KEGG pathways) to improve classification and to extract biologically meaningful dependencies. Overall, the framework offers a principled, data-efficient tool for probabilistic modeling and hypothesis generation in complex biological systems with scarce training data.
Abstract
One of several advantages of measure transport methods is that they provide a unified framework for processing and analyzing data distributed according to a wide class of probability measures. Within this context, we present results from computational studies aimed at assessing the potential of measure transport techniques, specifically triangular transport maps, as part of a workflow intended to support research in the biological sciences. Scenarios characterized by the availability of only a limited amount of sample data, which are common in domains such as radiation biology, are of particular interest. We find that when estimating a density function from a limited amount of sample data, adaptive transport maps are advantageous. In particular, statistics gathered from computing a series of adaptive transport maps, each trained on a randomly chosen subset of the available data samples, uncover information hidden in the data. As a result, in the radiation biology application considered here, this approach provides a tool for generating hypotheses about gene relationships and their dynamics under radiation exposure.
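To make the workflow described in the abstract concrete, the following is a minimal sketch of its simplest special case: a linear lower-triangular map fitted to samples, used for density evaluation, sampling, and subset-based dependence statistics. It is not the adaptive (nonlinear, sparsity-seeking) map construction studied in the paper. All function names, the thresholding statistic, and the parameters `n_maps`, `subset_size`, and `threshold` are assumptions introduced here for illustration only.

```python
import numpy as np

def fit_linear_triangular_map(x):
    """Fit S(x) = A (x - mu), A lower triangular, so that S(x) is roughly N(0, I).
    Assumption: linear (Gaussian) special case, not an adaptive nonlinear map."""
    mu = x.mean(axis=0)
    cov = np.cov(x, rowvar=False)
    chol = np.linalg.cholesky(cov)   # cov = chol @ chol.T, chol lower triangular
    a = np.linalg.inv(chol)          # inverse of a lower-triangular matrix is lower triangular
    return mu, a

def log_density(x, mu, a):
    """Change of variables: log p(x) = log N(S(x); 0, I) + log |det A|."""
    z = (x - mu) @ a.T
    d = x.shape[1]
    log_ref = -0.5 * np.sum(z**2, axis=1) - 0.5 * d * np.log(2.0 * np.pi)
    log_det = np.sum(np.log(np.diag(a)))   # diagonal of A is positive
    return log_ref + log_det

def sample(n, mu, a, rng):
    """Draw reference samples z ~ N(0, I) and push them through the inverse map."""
    z = rng.standard_normal((n, mu.size))
    return mu + np.linalg.solve(a, z.T).T

def dependence_frequency(x, n_maps, subset_size, threshold, rng):
    """Fit one map per random subset and record how often component i of the map
    depends on an earlier variable j with relative strength above `threshold`.
    Assumption: this thresholding statistic is illustrative only."""
    d = x.shape[1]
    counts = np.zeros((d, d))
    for _ in range(n_maps):
        idx = rng.choice(x.shape[0], size=subset_size, replace=False)
        _, a = fit_linear_triangular_map(x[idx])
        rel = np.abs(a) / np.diag(a)[:, None]   # scale each row by its diagonal entry
        counts += np.tril(rel > threshold, k=-1)
    return counts / n_maps
```

In this sketch, entry `(i, j)` of the matrix returned by `dependence_frequency` is the fraction of subsets in which variable `i` was found to depend on the earlier variable `j`. Entries close to 1 indicate dependencies that persist across subsets, which is the kind of signal the abstract describes as a source of hypotheses. The subset size must exceed the number of variables for the sample covariance to be positive definite.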
