Joint Mean and Correlation Regression Models for Multivariate Data
Zhi Yang Tho, Francis K. C. Hui, Tao Zou
TL;DR
This paper introduces a joint mean and correlation regression model for multivariate data that allows mean effects to differ across responses and lets the correlation structure be driven by predictor-based similarity matrices. It develops joint estimating equations and a constrained iterative estimation procedure (including a step based on the alternating direction method of multipliers, ADMM) that keeps the estimated correlation matrix positive definite. Its asymptotic theory establishes consistency and asymptotic normality when the number of responses diverges. Simulations demonstrate strong finite-sample performance for inference on both the mean and the correlation parameters, and an ecological application to Carabidae abundances in Scotland highlights heterogeneous environmental effects and trait-driven residual correlations. The framework extends generalized estimating equations (GEEs) by integrating correlation regression with mean modeling. This enables simultaneous inference on mean effects and between-response dependencies in complex, potentially high-dimensional multivariate settings.
Abstract
We propose a joint mean and correlation regression model for multivariate discrete and (semi-)continuous response data. The model simultaneously regresses the mean of each response against a set of covariates, and the correlations between responses against a set of similarity/distance measures. A set of joint estimating equations is formulated to construct an estimator of both the mean regression coefficients and the correlation regression parameters. Under a general setting in which the number of responses can tend to infinity, the joint estimator is shown to be consistent and asymptotically normally distributed. Its rates of convergence differ because the mean regression coefficients are heterogeneous across responses. An iterative estimation procedure is developed to obtain parameter estimates in the required (constrained) parameter space. Simulations demonstrate the strong finite-sample performance of the proposed estimator in terms of both point estimation and inference. We apply the proposed model to a count dataset of 38 Carabidae ground beetle species sampled throughout Scotland, along with information about the environmental conditions of each site and the traits of each species. Results show that the relationship between mean abundance and environmental covariates differs across beetle species, and that beetle total length is important in driving the correlations between species.
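As a rough illustration of regressing correlations on similarity measures, and of why the parameter space must be constrained, the sketch below builds a correlation matrix from a trait-based similarity matrix and checks whether it is positive definite. The abstract does not state the functional form of the correlation regression. The linear form R_jk = α_0 + Σ_l α_l W_l[j][k] (for j ≠ k, with R_jj = 1), the exp(-|t_j - t_k|) similarity, and the helper names `similarity_matrix`, `correlation_matrix` and `is_positive_definite` are therefore hypothetical choices made for illustration, not the authors' specification. The sketch does not implement the estimating equations or the ADMM step.

```python
import math


def similarity_matrix(traits):
    """Hypothetical trait-based similarity between responses: exp(-|t_j - t_k|)."""
    m = len(traits)
    return [[math.exp(-abs(traits[j] - traits[k])) for k in range(m)] for j in range(m)]


def correlation_matrix(alpha, sims):
    """Hypothetical linear correlation regression (requires at least one similarity matrix).

    Off-diagonal entries: R[j][k] = alpha[0] + sum_l alpha[l] * sims[l-1][j][k].
    Diagonal entries are fixed at 1.
    """
    m = len(sims[0])
    R = [[1.0] * m for _ in range(m)]
    for j in range(m):
        for k in range(m):
            if j != k:
                R[j][k] = alpha[0] + sum(a * W[j][k] for a, W in zip(alpha[1:], sims))
    return R


def is_positive_definite(A):
    """Attempt a Cholesky factorisation A = L L^T; return False if a pivot is non-positive."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    return False  # non-positive pivot: matrix is not positive definite
                L[i][i] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return True


if __name__ == "__main__":
    W = similarity_matrix([1.0, 1.5, 3.0, 4.2])  # hypothetical trait values, e.g. body lengths
    print(is_positive_definite(correlation_matrix([0.05, 0.4], [W])))  # True
    print(is_positive_definite(correlation_matrix([0.0, 2.0], [W])))   # False
```

With `alpha = [0.05, 0.4]` every off-diagonal entry stays small and the matrix is positive definite. With `alpha = [0.0, 2.0]` the entry for the two most similar responses exceeds 1, so the Cholesky check fails. A constrained estimation procedure has to keep the correlation parameters out of such regions.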
