Deciding if a DAG is Interesting is Hard
Jean-Lou De Carufel, Anil Maheshwari, Saeed Odak, Bodhayan Roy, Michiel Smid, Marc Vicuna
TL;DR
This work analyzes optimization problems inspired by Mapper graphs on edge-weighted directed acyclic graphs, focusing on the interestingness score $\texttt{score}(\Pi) = \sum_{i=1}^{\ell} w(e_i) \cdot \log_2(i+1)$. It gives polynomial-time reductions from $3$-SAT to the IP problem and from $(3,2)$-set-cover to the $k$-IP problem. These prove that IP is NP-hard (even with only two distinct edge weights) and that $k$-IP is NP-hard for every fixed $k \ge 3$ (even with unit weights). The reductions rely on elaborate variable and clause gadgets and on set-system constructions. The work also discusses the difficulty of establishing NP-membership, which stems from the transcendental nature of sums of logarithms. The results motivate exploring approximation strategies, with a straightforward greedy $1/k$-approximation highlighted as a baseline, and point to several avenues for future research on hardness and algorithm design for Mapper-related optimization problems.
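As a worked instance of the score formula above (assuming base-2 logarithms, as written there): with unit weights, i.e. $w(e_i) = 1$ for every edge, any path $\Pi$ with exactly $k$ edges scores

$$\texttt{score}(\Pi) = \sum_{i=1}^{k} \log_2(i+1) = \log_2\bigl(2 \cdot 3 \cdots (k+1)\bigr) = \log_2\bigl((k+1)!\bigr).$$

Every path in a unit-weight $k$-IP instance therefore contributes the same amount, so maximizing the total score is equivalent to maximizing the number of edge-disjoint $k$-edge paths. For $k = 3$, each path contributes $\log_2 24 \approx 4.585$.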
Abstract
The \emph{interestingness score} of a directed path $\Pi = e_1, e_2, e_3, \dots, e_\ell$ in an edge-weighted directed graph $G$ is defined as $\texttt{score}(\Pi) := \sum_{i=1}^\ell w(e_i) \cdot \log{(i+1)}$, where $w(e_i)$ is the weight of the edge $e_i$. We consider two optimization problems that arise in the analysis of Mapper graphs, a powerful tool in topological data analysis. In the IP problem, the objective is to find a collection $\mathcal{P}$ of edge-disjoint paths in $G$ with maximum total interestingness score. For $k \in \mathbb{N}$, the $k$-IP problem is the variant of the IP problem with the extra constraint that each path in $\mathcal{P}$ must have exactly $k$ edges. Kalyanaraman, Kamruzzaman, and Krishnamoorthy (Journal of Computational Geometry, 2019) claim that both IP and $k$-IP (for $k \geq 3$) are NP-complete. We point out some inaccuracies in their proofs. Furthermore, we show that both problems are NP-hard in directed acyclic graphs.
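To make the definitions of the score and of $k$-IP concrete, here is a minimal Python sketch. It assumes base-2 logarithms, a simple DAG given as a dict mapping each edge `(u, v)` to its weight, and hypothetical helper names `score`, `k_edge_paths`, and `brute_force_k_ip`. The exhaustive search runs in exponential time and only illustrates the $k$-IP objective on tiny inputs; it is not the authors' method, and it does not cover IP, where path lengths are unrestricted.

```python
import math

# Assumption: logarithms are base 2, matching log_2(i+1) in the TL;DR.
def score(weights):
    """Interestingness score of a path, given its edge weights in path order.

    The i-th edge (1-indexed) is multiplied by log2(i + 1).
    """
    return sum(w * math.log2(i + 1) for i, w in enumerate(weights, start=1))


def k_edge_paths(edges, k):
    """Hypothetical helper: all directed paths with exactly k edges.

    Each path is a tuple of (u, v) edges. In a DAG a walk never revisits a
    vertex, so repeatedly extending by out-edges yields only simple paths.
    """
    out = {}
    for u, v in edges:
        out.setdefault(u, []).append((u, v))
    paths = [(e,) for e in edges]
    for _ in range(k - 1):
        paths = [p + (e,) for p in paths for e in out.get(p[-1][1], [])]
    return paths


def brute_force_k_ip(weights, k):
    """Hypothetical exhaustive k-IP solver for tiny DAGs (exponential time).

    `weights` maps each edge (u, v) to w(u, v). Returns the best total score
    and one optimal collection of pairwise edge-disjoint k-edge paths.
    """
    paths = k_edge_paths(list(weights), k)
    best_total, best_set = 0.0, []  # the empty collection scores 0

    def search(start, used, total, chosen):
        nonlocal best_total, best_set
        if total > best_total:
            best_total, best_set = total, list(chosen)
        for j in range(start, len(paths)):
            p = paths[j]
            if used.isdisjoint(p):  # keep the collection edge-disjoint
                chosen.append(p)
                search(j + 1, used | set(p),
                       total + score([weights[e] for e in p]), chosen)
                chosen.pop()

    search(0, frozenset(), 0.0, [])
    return best_total, best_set


# Usage: a chain a -> b -> c -> d -> e with unit weights and k = 3.
chain = {("a", "b"): 1, ("b", "c"): 1, ("c", "d"): 1, ("d", "e"): 1}
total, chosen = brute_force_k_ip(chain, 3)
print(round(total, 3), chosen)
```

On this chain the only two 3-edge paths share the edges `(b, c)` and `(c, d)`, so an optimal collection contains just one of them.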
