Deep Dependency Networks and Advanced Inference Schemes for Multi-Label Classification

Shivvrat Arya, Yu Xiang, Vibhav Gogate

TL;DR

This work introduces Deep Dependency Networks (DDNs), a neuro‑symbolic framework that jointly learns neural features and a conditional dependency network to model inter‑label dependencies for multi‑label classification in images and videos. It extends inference beyond Gibbs sampling with three approaches for Most Probable Explanation (MPE) inference: two local search methods (random walk and greedy) and a second‑order mixed‑integer linear programming (MILP) formulation that uses piecewise linear approximations of the sigmoid/softplus functions and is solved with Gurobi. Empirical results across six datasets show that DDNs with advanced MPE inference consistently outperform both pure neural baselines and neural networks augmented with Markov random fields, particularly when using MILP/ILP inference; in multi‑label image classification, DDNs also surpass state‑of‑the‑art label embedding methods on several metrics. The study demonstrates the practical value of integrating constraint‑based optimization for inference with deep feature learning to capture dense label interdependencies in complex vision tasks, and discusses future extensions to broader domains and to interpretability.
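The two local search schemes mentioned above can be illustrated with a small Python sketch. It assumes a dependency layer in which each label's conditional is logistic, $P(y_i = 1 \mid y_{-i}, x) = \sigma(z_i)$ with $z_i = b_i + \sum_{j \neq i} W_{ij} y_j$, where the bias $b_i$ stands in for the contribution of the neural features $e_i$, and it scores an assignment by the sum of log conditionals $\sum_i \log P(y_i \mid y_{-i}, x)$. This parametrization, this MPE objective, and every function name below are illustrative assumptions, not the authors' implementation.

```python
import math
import random


def logit(i, y, bias, W):
    """Assumed dependency-layer logit: z_i = bias[i] + sum_{j != i} W[i][j] * y[j]."""
    return bias[i] + sum(W[i][j] * y[j] for j in range(len(y)) if j != i)


def log_sigmoid(z):
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def score(y, bias, W):
    """Assumed MPE objective: sum_i log P(y_i | y_{-i}, x) with logistic conditionals."""
    total = 0.0
    for i in range(len(y)):
        z = logit(i, y, bias, W)
        # log P(y_i = 1) = log sigmoid(z); log P(y_i = 0) = log sigmoid(-z)
        total += log_sigmoid(z) if y[i] == 1 else log_sigmoid(-z)
    return total


def flip_gains(y, cur, bias, W):
    """Change in score from flipping each label on its own."""
    gains = []
    for j in range(len(y)):
        y[j] ^= 1
        gains.append(score(y, bias, W) - cur)
        y[j] ^= 1  # undo the flip
    return gains


def greedy_search(y, bias, W):
    """Hill climbing: make the best improving single flip until none remains."""
    y = list(y)
    cur = score(y, bias, W)
    while True:
        gains = flip_gains(y, cur, bias, W)
        i = max(range(len(y)), key=gains.__getitem__)
        if gains[i] <= 1e-12:  # local optimum: no single flip improves the score
            return y, cur
        y[i] ^= 1
        cur = score(y, bias, W)


def random_walk_search(y, bias, W, p=0.2, steps=200, seed=0):
    """With probability p flip a random label, otherwise flip the best single
    label (even if it lowers the score); return the best assignment seen."""
    rng = random.Random(seed)
    y = list(y)
    cur = score(y, bias, W)
    best_y, best = list(y), cur
    for _ in range(steps):
        if rng.random() < p:
            i = rng.randrange(len(y))
        else:
            gains = flip_gains(y, cur, bias, W)
            i = max(range(len(y)), key=gains.__getitem__)
        y[i] ^= 1
        cur = score(y, bias, W)
        if cur > best:
            best_y, best = list(y), cur
    return best_y, best


if __name__ == "__main__":
    bias = [1.5, -0.5, -2.0]
    W = [[0.0, 1.0, 0.0],
         [1.0, 0.0, 0.5],
         [0.0, 0.5, 0.0]]
    print(greedy_search([0, 0, 0], bias, W))
    print(random_walk_search([0, 0, 0], bias, W))
```

Greedy search stops at the first assignment that no single flip improves, so it can get stuck in a local optimum; the random-walk variant trades some greediness for exploration through the noise probability `p` and keeps the best assignment it has visited.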

Abstract

We present a unified framework called deep dependency networks (DDNs) that combines dependency networks and deep learning architectures for multi-label classification, with a particular emphasis on image and video data. The primary advantage of dependency networks is their ease of training, in contrast to other probabilistic graphical models like Markov networks. In particular, when combined with deep learning architectures, they provide an intuitive, easy-to-use loss function for multi-label classification. A drawback of DDNs compared to Markov networks is their lack of advanced inference schemes, necessitating the use of Gibbs sampling. To address this challenge, we propose novel inference schemes based on local search and integer linear programming for computing the most likely assignment to the labels given observations. We evaluate our novel methods on three video datasets (Charades, TACoS, Wetlab) and three image datasets (MS-COCO, PASCAL VOC, NUS-WIDE), comparing their performance with (a) basic neural architectures and (b) neural architectures combined with Markov networks equipped with advanced inference and learning techniques. Our results demonstrate the superiority of our new DDN methods over the two competing approaches.
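To make the contrast between easy training and sampling-based inference concrete, here is a minimal Python sketch under an assumed logistic dependency layer ($z_i = b_i + \sum_{j \neq i} W_{ij} y_j$, with $b_i$ standing in for the neural features). Training sums one binary cross-entropy per label, each conditioned on the ground-truth values of the other labels, which is what makes the loss simple to use; Gibbs sampling resamples each label from its local conditional in turn and averages the samples to estimate marginals. The function names and the `n_samples` / `burn_in` settings are hypothetical and not taken from the paper.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def conditional(i, y, bias, W):
    """P(y_i = 1 | y_{-i}, x) under an assumed logistic dependency layer."""
    z = bias[i] + sum(W[i][j] * y[j] for j in range(len(y)) if j != i)
    return sigmoid(z)


def dn_loss(y_true, bias, W, eps=1e-12):
    """Sum of per-label binary cross-entropies, each conditioned on the
    ground-truth values of the other labels (a pseudo-likelihood loss)."""
    loss = 0.0
    for i, t in enumerate(y_true):
        p = conditional(i, y_true, bias, W)
        loss -= t * math.log(p + eps) + (1 - t) * math.log(1.0 - p + eps)
    return loss


def gibbs_marginals(bias, W, n_samples=2000, burn_in=200, seed=0):
    """Estimate P(y_i = 1 | x) by repeatedly resampling each label
    from its local conditional and averaging the post-burn-in samples."""
    rng = random.Random(seed)
    n = len(bias)
    y = [rng.randint(0, 1) for _ in range(n)]
    counts = [0] * n
    for sweep in range(burn_in + n_samples):
        for i in range(n):
            y[i] = 1 if rng.random() < conditional(i, y, bias, W) else 0
        if sweep >= burn_in:
            for i in range(n):
                counts[i] += y[i]
    return [c / n_samples for c in counts]


if __name__ == "__main__":
    bias = [1.0, -0.5, -1.5]
    W = [[0.0, 1.2, 0.0],
         [1.2, 0.0, 0.8],
         [0.0, 0.8, 0.0]]
    print(dn_loss([1, 1, 0], bias, W))
    marginals = gibbs_marginals(bias, W)
    print([int(m > 0.5) for m in marginals])  # threshold marginals at 0.5
```

Thresholding the estimated marginals at 0.5 yields a label set; the MPE schemes described in the paper instead search directly for a single most likely joint assignment.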

Paper Structure

This paper contains 21 sections, 19 equations, 4 figures, 5 tables, and 1 algorithm.

Figures (4)

  • Figure 1: Illustration of the improvements from our new inference schemes for DDNs. The DDN learns relationships between labels, and the inference schemes reason over them to accurately identify concealed objects, such as a sports ball.
  • Figure 2: Illustration of a dependency network for multi-label video classification. The NN takes video clips (frames) as input and outputs the features $e_1, e_2, \ldots, e_n$ (denoted by red-colored nodes). The sigmoid outputs ($\sigma_1, \ldots, \sigma_n$) of the dependency layer then use these features to model the local conditional distributions.
  • Figure 3: Comparison of labels predicted by Q2L (Liu et al., 2021) and our DDN-ILP scheme on the MS-COCO dataset. Labels in bold mark the differences between the two methods' predictions, assuming a threshold of 0.5 (i.e., every label with probability $> 0.5$ counts as predicted). Because DDN-ILP performs MPE inference, it outputs only label configurations, not the corresponding probabilities. The first three columns show examples where DDN improves over Q2L, while the last column (outlined in red) shows an example where DDN is worse than Q2L.
  • Figure SF4: Piecewise linear approximation of $\log(1+e^{z_i})$
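Figure SF4 relates to why a softplus term appears in the MILP. Under a logistic conditional, $\log P(y_i \mid y_{-i}, x) = y_i z_i - \log(1 + e^{z_i})$, so an MPE objective built from log conditionals contains the nonlinear term $\log(1+e^{z_i})$, which a MILP cannot represent directly. The Python sketch below shows one generic way to replace it with a piecewise linear function: linear interpolation between evenly spaced breakpoints. The breakpoint placement, the range $[-6, 6]$, and the helper names are assumptions; the figure does not say which approximation the authors use.

```python
import bisect
import math


def softplus(z):
    """Numerically stable log(1 + e^z)."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def make_pwl(f, lo, hi, k):
    """k linear pieces: k + 1 evenly spaced breakpoints on [lo, hi] and f at each."""
    xs = [lo + (hi - lo) * t / k for t in range(k + 1)]
    return xs, [f(x) for x in xs]


def eval_pwl(xs, ys, z):
    """Interpolate linearly between breakpoints; extend the end pieces outside [lo, hi]."""
    # Index of the piece containing z, clamped to the first/last piece.
    i = min(max(bisect.bisect_right(xs, z) - 1, 0), len(xs) - 2)
    slope = (ys[i + 1] - ys[i]) / (xs[i + 1] - xs[i])
    return ys[i] + slope * (z - xs[i])


def max_error(f, xs, ys, lo, hi, n=2001):
    """Largest |f(z) - pwl(z)| on a uniform grid over [lo, hi]."""
    grid = [lo + (hi - lo) * t / (n - 1) for t in range(n)]
    return max(abs(f(z) - eval_pwl(xs, ys, z)) for z in grid)


if __name__ == "__main__":
    for k in (4, 8, 16):
        xs, ys = make_pwl(softplus, -6.0, 6.0, k)
        print(k, max_error(softplus, xs, ys, -6.0, 6.0))
```

Because softplus is convex, each chord lies on or above the curve inside the range, and since its second derivative $\sigma(z)(1 - \sigma(z))$ never exceeds $1/4$, the interpolation error is at most $h^2/32$ for breakpoint spacing $h$. Outside the range, the extended end pieces drift away from the asymptotes $0$ and $z$, so the range should cover the logits that actually occur. A MILP solver can then encode such a function with SOS2 constraints or additional binary variables.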