Deep Dependency Networks and Advanced Inference Schemes for Multi-Label Classification
Shivvrat Arya, Yu Xiang, Vibhav Gogate
TL;DR
This work introduces Deep Dependency Networks (DDNs), a neuro-symbolic framework that jointly learns neural features and a conditional dependency network to model inter-label dependencies for multi-label classification in images and videos. It extends inference beyond Gibbs sampling with three schemes for computing the Most Probable Explanation (MPE): random-walk local search, greedy local search, and a second-order MILP formulation that uses a piecewise linear approximation of the sigmoid/softplus functions and is solved with Gurobi. Empirical results across six datasets show that DDNs with advanced MPE inference consistently outperform both pure neural baselines and neural networks augmented with Markov random fields, particularly when MILP inference is used; in multi-label image classification, DDNs also surpass state-of-the-art label embedding methods on several metrics. The study demonstrates the practical value of combining optimization-based inference with deep feature learning to capture dense label interdependencies in complex vision tasks, and discusses future extensions to broader domains and to interpretability.
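To make the piecewise linear approximation mentioned above concrete, here is a minimal Python sketch. It is not the authors' construction: it assumes one simple scheme (the maximum of tangent lines to softplus at evenly spaced breakpoints), and the names `tangent_pieces` and `pwl_softplus` are hypothetical. Because softplus is convex, the maximum of its tangents never exceeds it, and each piece is a linear function, which is the kind of term a MILP solver can handle.

```python
import math

def softplus(z):
    # Numerically stable log(1 + exp(z)).
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))

def sigmoid(z):
    # Numerically stable logistic function; also the derivative of softplus.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def tangent_pieces(breakpoints):
    # Hypothetical helper: (slope, intercept) of the tangent to softplus
    # at each breakpoint.
    pieces = []
    for b in breakpoints:
        slope = sigmoid(b)
        intercept = softplus(b) - slope * b
        pieces.append((slope, intercept))
    return pieces

def pwl_softplus(z, pieces):
    # Maximum of the tangent lines: a convex piecewise linear
    # lower bound on softplus.
    return max(m * z + c for m, c in pieces)

# Assumed setup: 25 breakpoints spaced 0.5 apart on [-6, 6].
breakpoints = [-6.0 + 0.5 * k for k in range(25)]
pieces = tangent_pieces(breakpoints)

# Largest gap between softplus and its approximation on a fine grid.
grid = [i / 100 for i in range(-600, 601)]
max_err = max(softplus(z) - pwl_softplus(z, pieces) for z in grid)
print(f"max approximation error on [-6, 6]: {max_err:.5f}")
```

Adding breakpoints tightens the approximation at the cost of more linear pieces, so the breakpoint spacing is a trade-off between accuracy and solver effort.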
Abstract
We present a unified framework called deep dependency networks (DDNs) that combines dependency networks and deep learning architectures for multi-label classification, with a particular emphasis on image and video data. The primary advantage of dependency networks is their ease of training, in contrast to other probabilistic graphical models like Markov networks. In particular, when combined with deep learning architectures, they provide an intuitive, easy-to-use loss function for multi-label classification. A drawback of DDNs compared to Markov networks is their lack of advanced inference schemes, necessitating the use of Gibbs sampling. To address this challenge, we propose novel inference schemes based on local search and integer linear programming for computing the most likely assignment to the labels given observations. We evaluate our novel methods on three video datasets (Charades, TACoS, Wetlab) and three image datasets (MS-COCO, PASCAL VOC, NUS-WIDE), comparing their performance with (a) basic neural architectures and (b) neural architectures combined with Markov networks equipped with advanced inference and learning techniques. Our results demonstrate the superiority of our new DDN methods over the two competing approaches.
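The local-search idea can be illustrated with a small Python sketch. This is a toy under stated assumptions, not the paper's implementation: each label's conditional is taken to be logistic in the other labels, the bias `b[i]` stands in for whatever the neural network contributes for label `i`, and an assignment is scored by the sum of log conditional probabilities. The function names and the toy weights are hypothetical.

```python
import math

def log_sigmoid(z):
    # Numerically stable log(sigmoid(z)).
    return min(z, 0.0) - math.log1p(math.exp(-abs(z)))

def score(x, W, b):
    # Assumed objective: sum over labels of log P(x_i | all other labels),
    # with a logistic conditional for each label.
    n = len(x)
    total = 0.0
    for i in range(n):
        z = b[i] + sum(W[i][j] * x[j] for j in range(n) if j != i)
        # log P(x_i = 1) = log_sigmoid(z); log P(x_i = 0) = log_sigmoid(-z).
        total += log_sigmoid(z if x[i] == 1 else -z)
    return total

def greedy_local_search(W, b, x0, max_steps=100):
    # Repeatedly flip the single label that most improves the score;
    # stop when no single flip helps (a local optimum).
    x = list(x0)
    best = score(x, W, b)
    for _ in range(max_steps):
        candidates = []
        for i in range(len(x)):
            y = list(x)
            y[i] = 1 - y[i]
            candidates.append((score(y, W, b), i))
        s, i = max(candidates)
        if s <= best:
            break
        x[i] = 1 - x[i]
        best = s
    return x, best

# Toy 3-label network: labels 0 and 1 attract each other, both repel label 2.
W = [[0.0, 2.0, -1.5],
     [2.0, 0.0, -1.5],
     [-1.5, -1.5, 0.0]]
b = [2.0, -0.5, -1.0]   # stand-in for per-label neural network evidence
x0 = [0, 0, 1]

x_star, s_star = greedy_local_search(W, b, x0)
print(x_star, round(s_star, 4))
```

Greedy search only guarantees a local optimum. A random-walk variant would occasionally accept a random flip to escape such optima, and an integer programming formulation searches for the global optimum instead.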
