Machine Learning Techniques for Data Reduction of CFD Applications
Jaemoon Lee, Ki Sung Jung, Qian Gong, Xiao Li, Scott Klasky, Jacqueline Chen, Anand Rangarajan, Sanjay Ranka
TL;DR
The paper addresses the volume of data produced by exascale CFD simulations with a trustworthy, error-bounded data-reduction framework. It introduces a guaranteed block autoencoder that leverages tensor correlations (GBATC). GBATC operates on multidimensional tensor blocks and uses a 3D convolutional autoencoder to capture spatiotemporal and interspecies correlations. A tensor correction network and a PCA-based projection of the residual guarantee that the reconstruction error stays within a user-defined bound, $\|x - x^G\|_2 \le \tau$, and quantization and entropy coding are applied for further compression. The approach reduces data by two to three orders of magnitude while preserving quality for both primary data and downstream quantities of interest (QoIs), and it outperforms the SZ baseline on the S3D DNS dataset. These results highlight the method's potential to enable scalable, QoI-preserving data management for CFD and multiphysics simulations. The work also discusses related literature on error-bounded compressors and points toward future enhancements, including extending the guarantees to broader QoIs and end-to-end training.
Abstract
We present an approach called guaranteed block autoencoder that leverages Tensor Correlations (GBATC) for reducing the spatiotemporal data generated by computational fluid dynamics (CFD) and other scientific applications. It uses a multidimensional block of tensors (spanning space and time) for both input and output, capturing the spatiotemporal and interspecies relationships within a tensor. The tensor consists of species that represent different elements in a CFD simulation. To guarantee the error bound of the reconstructed data, principal component analysis (PCA) is applied to the residual between the original and reconstructed data. This yields a basis matrix, which is then used to project the residual of each instance. The resulting coefficients are retained to enable accurate reconstruction. Experimental results demonstrate that our approach can deliver two orders of magnitude in reduction while still keeping the errors of primary data under scientifically acceptable bounds. Compared to reduction approaches based on SZ, our method achieves a substantially higher compression ratio for a given error bound, or a lower error for a given compression ratio.
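The PCA-based guarantee described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes each instance is a flattened block, that the basis comes from an uncentered SVD of the residuals, and that coefficients are kept in order of decreasing magnitude until the bound $\|x - x^G\|_2 \le \tau$ holds. The function names `pca_basis` and `correct_instance` are hypothetical, and the quantization and entropy coding of the retained coefficients are omitted.

```python
import numpy as np


def pca_basis(residuals):
    """Orthonormal basis (as columns) computed from a residual matrix.

    residuals has shape (n_instances, dim). An uncentered SVD is used
    (an assumption of this sketch) so the basis spans the residuals directly.
    """
    _, _, vt = np.linalg.svd(residuals, full_matrices=True)
    return vt.T  # shape (dim, dim), columns ordered by singular value


def correct_instance(x, x_rec, basis, tau):
    """Keep the fewest residual coefficients needed to meet the bound tau.

    Returns the kept indices, the kept coefficients, and the corrected
    reconstruction x_g = x_rec + basis @ kept_coefficients.
    """
    coeffs = basis.T @ (x - x_rec)      # project the residual onto the basis
    order = np.argsort(-np.abs(coeffs))  # largest coefficients first
    kept = np.zeros_like(coeffs)
    x_g = x_rec.copy()
    k = 0
    # Add one coefficient at a time until the L2 error is within tau.
    while np.linalg.norm(x - x_g) > tau and k < coeffs.size:
        kept[order[k]] = coeffs[order[k]]
        k += 1
        x_g = x_rec + basis @ kept
    return order[:k], coeffs[order[:k]], x_g


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(100, 16))                  # stand-in for original blocks
    x_rec = x + 0.2 * rng.normal(size=x.shape)      # stand-in for autoencoder output
    basis = pca_basis(x - x_rec)
    idx, coef, x_g = correct_instance(x[0], x_rec[0], basis, tau=0.1)
    print(len(idx), np.linalg.norm(x[0] - x_g))
```

Because the basis in this sketch is complete and orthonormal, keeping every coefficient reproduces the original instance up to floating-point error, so the loop can meet any bound larger than that error. Only the kept indices and coefficients would need to be stored alongside the autoencoder's latent representation.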
