Optimal estimation of Gaussian (poly)trees
Yuhao Wang, Ming Gao, Wai Ming Tai, Bryon Aragam, Arnab Bhattacharyya
TL;DR
This work provides a unified finite-sample analysis of learning Gaussian trees and polytrees, covering both distribution learning (in KL distance) and structure learning (exact recovery). It introduces a Chow-Liu–type method for distribution learning and a PC-Tree algorithm for polytree structure learning that uses partial correlations as conditional independence tests. For both, it gives explicit upper bounds and matching lower bounds across non-realizable, realizable, and faithful settings. The results reveal phase transitions in sample complexity between distribution and structure learning and establish minimax optimality under strong-tree-faithfulness. Empirically, PC-Tree achieves better exact-recovery performance than classical baselines, supporting its practical use for learning tree-like Gaussian networks. The findings clarify when structure can be learned efficiently from data under realistic assumptions and point to extensions to broader graphical models and non-Gaussian settings.
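The Chow-Liu step mentioned above can be illustrated for Gaussian data: the mutual information between two jointly Gaussian variables with correlation ρ is −½ log(1 − ρ²), so the Chow-Liu tree is a maximum-weight spanning tree under these weights. The sketch below is a minimal illustration under the assumption that the sample correlation matrix is used as a plug-in estimate; the function name `gaussian_chow_liu` is hypothetical, and the paper's estimator may differ in details such as how the tree-structured distribution's parameters are then fitted.

```python
import numpy as np


def gaussian_chow_liu(X):
    """Return the edges of a maximum-weight spanning tree over the columns of X.

    Hypothetical helper (not the paper's code). Edge weights are the Gaussian
    mutual information -0.5 * log(1 - rho_ij^2), with rho_ij the sample
    correlation between variables i and j.
    """
    d = X.shape[1]
    rho = np.corrcoef(X, rowvar=False)
    # Clip rho^2 away from 1 so the log stays finite for near-perfect correlation.
    r2 = np.clip(rho ** 2, 0.0, 1.0 - 1e-12)
    mi = -0.5 * np.log(1.0 - r2)

    # Kruskal's algorithm: scan candidate edges in order of decreasing weight.
    edges = sorted(
        ((mi[i, j], i, j) for i in range(d) for j in range(i + 1, d)),
        reverse=True,
    )
    parent = list(range(d))  # union-find forest over the d variables

    def find(u):
        # Path halving keeps the union-find trees shallow.
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u

    tree = []
    for _, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:  # adding (i, j) does not create a cycle
            parent[ri] = rj
            tree.append((i, j))
            if len(tree) == d - 1:
                break
    return tree
```

Because −½ log(1 − ρ²) is increasing in |ρ|, ranking edges by absolute sample correlation would yield the same tree; the mutual-information form is kept here to match the Chow-Liu formulation. On samples from a Gaussian chain X0 → X1 → X2 → X3 with moderately strong coefficients and enough samples, the returned edge set would typically be the chain's skeleton {0–1, 1–2, 2–3}.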
Abstract
We develop optimal algorithms for learning undirected Gaussian trees and directed Gaussian polytrees from data. We consider two problems: distribution learning (i.e., in KL distance) and structure learning (i.e., exact recovery). The first approach is based on the Chow-Liu algorithm and efficiently learns an optimal tree-structured distribution. The second approach is a modification of the PC algorithm for polytrees that uses partial correlation as a conditional independence tester for constraint-based structure learning. We derive explicit finite-sample guarantees for both approaches and show that both are optimal by deriving matching lower bounds. Additionally, we conduct numerical experiments comparing the performance of various algorithms, providing further insights and empirical evidence.
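A PC-style algorithm repeatedly asks whether X_i and X_j are conditionally independent given a set X_S; for jointly Gaussian variables this holds exactly when the partial correlation of X_i and X_j given X_S is zero. The sketch below computes the sample partial correlation from the inverse of the relevant correlation submatrix and turns it into a decision with Fisher's z-transform. Fisher's z is a common choice in PC implementations and is an assumption here, not necessarily the tester or threshold analysed in the paper; the names `partial_corr`, `ci_test`, and the level `alpha` are hypothetical.

```python
from statistics import NormalDist

import numpy as np


def partial_corr(X, i, j, S=()):
    """Sample partial correlation of columns i and j of X given the columns in S."""
    idx = [i, j, *S]
    C = np.corrcoef(X[:, idx], rowvar=False)
    P = np.linalg.inv(C)  # precision matrix of the selected variables
    return -P[0, 1] / np.sqrt(P[0, 0] * P[1, 1])


def ci_test(X, i, j, S=(), alpha=0.01):
    """Return True if X_i and X_j are judged independent given X_S.

    Assumed tester: Fisher's z-transform of the partial correlation,
    compared against a two-sided standard normal quantile at level alpha.
    """
    n = X.shape[0]
    r = np.clip(partial_corr(X, i, j, S), -1 + 1e-12, 1 - 1e-12)
    z = np.arctanh(r)  # Fisher z-transform: 0.5 * log((1 + r) / (1 - r))
    stat = np.sqrt(n - len(S) - 3) * abs(z)
    return bool(stat <= NormalDist().inv_cdf(1 - alpha / 2))
```

For data from the polytree X0 → X2 ← X1 (a collider), the marginal test on (X0, X1) should typically accept independence, while conditioning on X2 should reject it, since conditioning on a common child induces dependence between its parents. This is the kind of asymmetry a constraint-based method uses to orient edges.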
