adabmDCA 2.0 -- a flexible but easy-to-use package for Direct Coupling Analysis
Lorenzo Rosset, Roberto Netti, Anna Paola Muntoni, Martin Weigt, Francesco Zamponi
TL;DR
adabmDCA 2.0 is a flexible, energy-based Direct Coupling Analysis framework implemented in C++, Julia, and Python behind a unified command-line interface. It combines dense bmDCA learning with two sparse variants, eaDCA and edDCA, and supports downstream tasks such as residue contact prediction, mutational-effect scoring, and sequence generation for proteins and RNAs. Training relies on Boltzmann learning: gradients are estimated by Monte Carlo sampling with persistent contrastive divergence so that the model reproduces the one- and two-site statistics of a multiple sequence alignment (MSA), with a pseudocount and sequence reweighting applied to correct sampling biases. The package emphasizes practical usability, convergence diagnostics, and modularity across hardware (single-core and multi-core CPU, GPU), providing tools for structure prediction and sequence design in biomolecular research.
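To make the bias-correction step concrete, the following is a minimal sketch of sequence reweighting and pseudocount-regularized one-site frequencies, written in plain Python. It is not the package's own code: the function names, the 0.8 identity threshold, the pseudocount value, and the toy alignment are all assumptions chosen for illustration, following the convention commonly used in DCA (each sequence is weighted by the inverse of the number of sequences similar to it, and frequencies are mixed with a uniform distribution).

```python
def sequence_weights(msa, identity_threshold=0.8):
    """Weight each sequence by 1 / (number of sequences at or above the
    identity threshold, itself included). Threshold value is an assumption."""
    weights = []
    for s in msa:
        n_similar = 0
        for t in msa:
            identity = sum(a == b for a, b in zip(s, t)) / len(s)
            if identity >= identity_threshold:
                n_similar += 1
        weights.append(1.0 / n_similar)
    return weights


def single_site_frequencies(msa, weights, alphabet, pseudocount):
    """Reweighted one-site frequencies, mixed with a uniform distribution:
    f_i(a) = (1 - alpha) * f_i_data(a) + alpha / q."""
    q = len(alphabet)
    m_eff = sum(weights)  # effective number of sequences
    freqs = []
    for i in range(len(msa[0])):
        counts = {a: 0.0 for a in alphabet}
        for seq, w in zip(msa, weights):
            counts[seq[i]] += w
        freqs.append({
            a: (1.0 - pseudocount) * counts[a] / m_eff + pseudocount / q
            for a in alphabet
        })
    return freqs


# Toy RNA alignment (hypothetical data): the first two sequences are identical,
# so each of them receives half the weight of the third.
msa = ["ACGU", "ACGU", "CAUG"]
weights = sequence_weights(msa)
freqs = single_site_frequencies(msa, weights, alphabet="ACGU-", pseudocount=0.1)
```

The two-site statistics would be built in the same way from weighted pair counts. The pseudocount keeps every frequency strictly positive, which avoids diverging parameters for symbols never observed at a site.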
Abstract
In this methods article, we provide a flexible but easy-to-use implementation of Direct Coupling Analysis (DCA) based on Boltzmann machine learning, together with a tutorial on how to use it. The package adabmDCA 2.0 is available in several programming languages (C++, Julia, Python) and runs on different architectures (single-core and multi-core CPU, GPU) through a common front-end interface. In addition to several learning protocols for dense and sparse generative DCA models, it directly addresses common downstream tasks such as residue-residue contact prediction, mutational-effect prediction, scoring of sequence libraries, and generation of artificial sequences for sequence design. It is readily applicable to protein and RNA sequence data.
