
Multiview Random Vector Functional Link Network for Predicting DNA-Binding Proteins

A. Quadir, M. Sajid, M. Tanveer

TL;DR

The paper tackles DNA-binding protein (DBP) prediction by introducing Multiview RVFL (MvRVFL), a fast, closed-form learning framework that fuses multiple feature views with a coupling term to capture cross-view correlations. It builds a two-view formulation from diverse protein descriptors and combines early and late fusion while preserving view-specific signals, enabling efficient learning and strong generalization. The authors provide theoretical generalization guarantees via Rademacher complexity and demonstrate substantial empirical gains on DBP benchmarks as well as multiple UCI/KEEL, AwA, and Corel5k datasets, with ablations confirming the coupling term’s importance and the value of PsePSSM and other features. The work offers a scalable, accurate DBP predictor and showcases effective multiview integration in randomized neural networks, with potential for extensions to more views and imbalanced data scenarios.
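The TL;DR credits PsePSSM among the most useful descriptors. Below is a minimal NumPy sketch of the widely used pseudo-PSSM (PsePSSM) construction, for orientation only. It takes the 20 column means of a normalized L x 20 PSSM and, for each lag g, the average squared difference between rows g positions apart. The row-wise standardization, the function name `psepssm`, and the `max_lag` parameter are assumptions. The paper's exact preprocessing and lag settings are not given in this summary.

```python
import numpy as np


def psepssm(pssm: np.ndarray, max_lag: int = 1) -> np.ndarray:
    """Sketch of PsePSSM features for an L x 20 PSSM (rows = residues).

    Assumption: each row is z-scored across the 20 amino-acid columns
    before the features are computed.
    """
    pssm = np.asarray(pssm, dtype=float)
    length, n_aa = pssm.shape
    if n_aa != 20:
        raise ValueError("expected a PSSM with 20 columns")
    if not 1 <= max_lag < length:
        raise ValueError("max_lag must be in [1, L - 1]")

    # Row-wise standardization (assumed normalization).
    mu = pssm.mean(axis=1, keepdims=True)
    sd = pssm.std(axis=1, keepdims=True)
    sd[sd == 0] = 1.0
    p = (pssm - mu) / sd

    feats = [p.mean(axis=0)]  # 20 column averages
    for g in range(1, max_lag + 1):
        # Mean squared difference between residues i and i + g, per column.
        feats.append(((p[:-g] - p[g:]) ** 2).mean(axis=0))
    return np.concatenate(feats)  # length 20 * (1 + max_lag)
```

For a 50-residue protein and `max_lag=2`, this returns a 60-dimensional vector: 20 column means followed by two blocks of 20 lagged terms.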

Abstract

Identifying DNA-binding proteins (DBPs) is essential because of their significant impact on many biological activities, and understanding the mechanisms underlying protein-DNA interactions is key to elucidating a wide range of life processes. In recent years, machine learning-based models have been widely used for DBP prediction. In this paper, we propose a novel framework for DBP prediction, the multiview random vector functional link (MvRVFL) network, which combines a neural network architecture with multiview learning. The MvRVFL model combines the advantages of both late and early fusion, allows a separate regularization parameter for each view, and uses a closed-form solution to determine the unknown parameters efficiently. The primal objective function includes a coupling term that minimizes a composite of the errors from all views. From each of the three protein views of the DBP datasets, we extract five features. These features are then fused by incorporating a hidden feature during model training. On the DBP dataset, the proposed MvRVFL model outperforms the baseline models. We further validate the practicality of the proposed model on diverse benchmark datasets, and both theoretical analysis and empirical results consistently show better generalization performance than the baseline models.
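The abstract describes per-view regularization, a coupling term across view errors, and a closed-form solve. The following is a minimal two-view NumPy sketch of how such a model can be solved in one linear system, under assumptions that are not taken from the paper. It assumes the objective $\tfrac12\|\beta_1\|^2+\tfrac12\|\beta_2\|^2+\tfrac{C_1}{2}\|\xi_1\|^2+\tfrac{C_2}{2}\|\xi_2\|^2+\rho\,\xi_1^\top\xi_2$ with $\xi_v = D_v\beta_v - Y$. Here $D_v$ is view $v$'s RVFL design matrix, meaning the direct input links concatenated with random hidden features. The tanh activation, the uniform(-1, 1) random weights, the averaged late-fusion prediction, and the names `rvfl_features`, `fit_two_view`, and `predict_two_view` are all hypothetical.

```python
import numpy as np


def rvfl_features(X, W, b):
    """RVFL design matrix: direct links (X) plus a random tanh hidden layer."""
    return np.hstack([X, np.tanh(X @ W + b)])


def fit_two_view(X1, X2, Y, n_hidden=64, C1=1.0, C2=1.0, rho=0.5, seed=0):
    """Closed-form fit of the assumed coupled two-view objective."""
    if rho ** 2 > C1 * C2:
        raise ValueError("rho**2 <= C1*C2 is needed to keep the objective convex")
    rng = np.random.default_rng(seed)
    params = []
    for X in (X1, X2):
        W = rng.uniform(-1.0, 1.0, size=(X.shape[1], n_hidden))
        b = rng.uniform(-1.0, 1.0, size=n_hidden)
        params.append((W, b))
    D1 = rvfl_features(X1, *params[0])
    D2 = rvfl_features(X2, *params[1])
    p1, p2 = D1.shape[1], D2.shape[1]

    # Setting both gradients to zero gives one block linear system:
    # (I + C1 D1'D1) b1 + rho D1'D2 b2 = (C1 + rho) D1'Y, and symmetrically.
    A = np.block([
        [np.eye(p1) + C1 * D1.T @ D1, rho * D1.T @ D2],
        [rho * D2.T @ D1, np.eye(p2) + C2 * D2.T @ D2],
    ])
    rhs = np.vstack([(C1 + rho) * D1.T @ Y, (C2 + rho) * D2.T @ Y])
    beta = np.linalg.solve(A, rhs)
    return params, beta[:p1], beta[p1:]


def predict_two_view(X1, X2, params, beta1, beta2):
    """Late fusion by averaging the two views' outputs (assumed rule)."""
    out1 = rvfl_features(X1, *params[0]) @ beta1
    out2 = rvfl_features(X2, *params[1]) @ beta2
    return (out1 + out2) / 2
```

With `rho = 0`, the block system decouples into two independent ridge solutions, $\beta_v = (I + C_v D_v^\top D_v)^{-1} C_v D_v^\top Y$. This is the standard single-view RVFL closed form. A nonzero `rho` is what ties the views together in this sketch.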

Paper Structure (36 sections, 5 theorems, 49 equations, 11 figures, 11 tables, 1 algorithm)

Key Result

Lemma 5.1

Choose $\theta$ in the interval $(0, 1)$, and let $\mathscr{G}$ denote a set of functions mapping the input space $S$ to the range $[0, 1]$. Suppose $\{x_i\}_{i=1}^n$ are independently sampled from a probability distribution $\mathcal{D}$. Then, with probability at least $1 - \theta$ over r…
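The generalization guarantees are stated in terms of Rademacher complexity, which measures how well a function class can correlate with random $\pm 1$ labels on a sample. The sketch below is a generic illustration of that quantity, not a reproduction of the paper's lemma or bound. It estimates the empirical Rademacher complexity of the norm-bounded linear class $\{x \mapsto \langle w, x\rangle : \|w\|_2 \le B\}$ by Monte Carlo. For this class the supremum has the closed form $\tfrac{B}{n}\|\sum_i \sigma_i x_i\|_2$, and Jensen's inequality gives the upper bound $\tfrac{B}{n}\sqrt{\sum_i \|x_i\|^2}$. The function names and the choice of function class are assumptions made for illustration.

```python
import numpy as np


def empirical_rademacher_linear(X, B=1.0, n_draws=2000, seed=0):
    """Monte Carlo estimate of E_sigma sup_{||w|| <= B} (1/n) sum_i sigma_i <w, x_i>.

    For this linear class, the supremum equals (B / n) * ||sum_i sigma_i x_i||_2.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    sigma = rng.choice([-1.0, 1.0], size=(n_draws, n))  # Rademacher signs
    sums = sigma @ X  # row k holds sum_i sigma_{k,i} x_i
    return B / n * np.linalg.norm(sums, axis=1).mean()


def rademacher_upper_bound(X, B=1.0):
    """Jensen bound: (B / n) * sqrt(sum_i ||x_i||^2)."""
    n = X.shape[0]
    return B / n * np.sqrt((X ** 2).sum())
```

Both quantities shrink roughly like $1/\sqrt{n}$ for bounded inputs. This is why bounds of this type tighten as the training set grows.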

Figures (11)

  • Figure 1: Flowchart of extracting features of Protein Sequences.
  • Figure 2: Four-level discrete wavelet transform for PSSM analysis.
  • Figure 3: An intuitive illustration of MvRVFL model in two-view setting.
  • Figure 4: ROC curves comparing the performance of the proposed MvRVFL model with various DNA-binding protein prediction models.
  • Figure 5: Ablation study of the coupling term for MvRVFL. The x-axis indexes the datasets as listed in the table of average ACC and average rank for the protein datasets, and the y-axis shows accuracy (Acc., %).
  • ...and 6 more figures
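Figure 2 refers to a four-level discrete wavelet transform (DWT) of the PSSM. The following is a minimal NumPy sketch of such a descriptor using the orthonormal Haar wavelet, applied to each PSSM column. The specific wavelet, the padding of odd-length signals by repeating the last sample, and the per-level summary statistics (mean, standard deviation, minimum, maximum of the detail coefficients, plus the same statistics of the final approximation) are assumptions. The names `haar_step` and `dwt_features` are hypothetical, and the paper's actual wavelet and feature choices may differ.

```python
import numpy as np


def haar_step(x):
    """One level of the orthonormal Haar DWT (odd length: repeat last sample)."""
    x = np.asarray(x, dtype=float)
    if len(x) % 2:
        x = np.append(x, x[-1])
    approx = (x[0::2] + x[1::2]) / np.sqrt(2)
    detail = (x[0::2] - x[1::2]) / np.sqrt(2)
    return approx, detail


def summary_stats(c):
    """Mean, standard deviation, min, and max of a coefficient vector."""
    return [c.mean(), c.std(), c.min(), c.max()]


def dwt_features(pssm, levels=4):
    """Hypothetical multi-level DWT descriptor of an L x 20 PSSM.

    Per column: summary_stats of the detail coefficients at every level,
    followed by summary_stats of the final approximation.
    """
    pssm = np.asarray(pssm, dtype=float)
    if pssm.shape[0] < 2 ** levels:
        raise ValueError("sequence too short for the requested number of levels")
    feats = []
    for col in pssm.T:
        approx = col
        for _ in range(levels):
            approx, detail = haar_step(approx)
            feats.extend(summary_stats(detail))
        feats.extend(summary_stats(approx))
    return np.array(feats)  # 20 columns * (4 * levels + 4) values
```

With four levels, each column contributes 20 values, so a 20-column PSSM yields a 400-dimensional vector regardless of sequence length. This fixed length is what makes wavelet summaries convenient for variable-length proteins.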

Theorems & Definitions (8)

  • Definition 1
  • Lemma 5.1
  • Lemma 5.2
  • Lemma 5.3
  • Theorem 5.4
  • Proof of Theorem 5.4
  • Theorem 5.5
  • Proof of Theorem 5.5