
Factorization reduces computational complexity and is therefore an important tool in statistical machine learning of high dimensional systems. Conventional molecular modeling, including molecular dynamics and Monte Carlo simulations of molecular systems, is a large research field based on approximate factorization of molecular interactions. Recently, the local distribution theory was proposed to factorize the joint distribution of a given molecular system into trainable local distributions. Belief propagation algorithms are a family of exact factorization algorithms for (junction) trees and are extended to approximate loopy belief propagation algorithms for graphs with loops. Although factorization of probability distributions is the common foundation, computational research on molecular systems and machine learning studies utilizing belief propagation algorithms have been carried out independently, each with its own track of algorithm development. The connections and differences among these factorization algorithms are briefly presented in this perspective, in the hope of inspiring further development of factorization algorithms for physical modeling of complex molecular systems.
Citation: Bochuan Du, Pu Tian. Factorization in molecular modeling and belief propagation algorithms[J]. Mathematical Biosciences and Engineering, 2023, 20(12): 21147-21162. doi: 10.3934/mbe.2023935
It is well acknowledged that high dimensional problems are difficult to analyze and predict. Multimedia data (e.g., pictures, videos) and molecular systems (e.g., biomolecules, materials, nano-systems) are typical examples. Since analytical solutions for accurate and detailed description of probability distributions in high dimensional spaces are either extremely challenging to obtain or non-existent, various numerical approximations are essential. The two most important and distinct paths are direct use of dimensionality reduction algorithms [1,2,3,4,5,6] and factorizations [7,8,9], and both are extensive fields. The former path transforms an $n$-dimensional probability distribution $P_n(x_1, x_2, \ldots, x_n)$, usually approximately, into an $m$-dimensional $P_m(x_1, x_2, \ldots, x_m)$ with $n \gg m$, while the latter path factorizes $P_n(x_1, x_2, \ldots, x_n)$, either exactly or approximately, into a product of $K$ low dimensional terms $\prod_{i=1}^{K} P_{m_i}(x_{i_1}, x_{i_2}, \ldots, x_{i_{m_i}})$, with the union of the variables in all $K$ terms being $(x_1, x_2, \ldots, x_n)$. In this perspective, we focus on the different factorizations utilized in molecular simulations and belief propagation algorithms, and hope to spark interest in the development of more accurate and efficient approximations in relevant applications.

In a probabilistic description of multivariate systems, there are three important types of problems. The first type is to find the most probable (set of) configuration(s) at a given resolution of representation, and more generally the global landscape of wells with significant probability weights, which is termed free energy landscapes in molecular systems [10,11,12] and multimodal learning in AI [13]. A closely related type of problem is to locate transition paths connecting various wells on the global landscape. Understanding such transition paths is of critical importance in accelerating the sampling of other wells when starting from an arbitrary well of a landscape.
Significant efforts have been invested in finding transition paths in molecular simulations [14,15,16]. The second type is to focus only on a subset of variables (or a single one in the extreme case); the process of computing distributions of a subset of variables from the global joint distribution is termed marginalization. For example, marginalizing a two dimensional distribution $P_2(x, y)$ with both $x, y \in \{0, 1\}$ down to the variable $x$ is accomplished by the following two summations: $P_1(x=0) = P_2(x=0, y=0) + P_2(x=0, y=1)$ and $P_1(x=1) = P_2(x=1, y=0) + P_2(x=1, y=1)$. There are $2^2$ terms to be calculated, and for a joint distribution with $n$ variables, each of which takes $|X|$ discrete values, $|X|^n$ terms need to be calculated, indicating the exponential computational complexity of marginalization. The third type is the parameterization of model parameters, including parameterizing force fields for molecular modeling and factor/node potentials for belief propagation. After introducing typical factorization schemes in molecular modeling and belief propagation, we discuss the strengths of different factorizations in various types of problems. The organization of this perspective is summarized below. Following this introduction, Section 2 gives a brief introduction to exact belief propagation algorithms. Section 3 is on junction trees and approximate loopy belief propagation. Sections 4 and 5 describe factorization in regular molecular simulations and the local distribution theory. Section 6 provides discussions on the potential synergy and distinctions among these factorizations and deep learning. Finally, conclusions are presented.
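The exponential cost of brute-force marginalization is easy to make concrete. Below is a minimal Python sketch (the joint distribution is a randomly generated table and all names are hypothetical) that marginalizes an $n$-variable joint by enumerating all $|X|^n$ configurations:

```python
import itertools
import random

random.seed(0)
n = 4                      # number of binary variables
states = (0, 1)            # |X| = 2

# A randomly generated joint table: one positive weight per configuration,
# i.e., |X|**n = 16 entries in total.
joint = {cfg: random.random() for cfg in itertools.product(states, repeat=n)}
Z = sum(joint.values())    # normalization (partition function)

def marginal(joint, var):
    """Brute-force marginalization: every configuration is visited once,
    so the cost grows as |X|**n."""
    p = {}
    for cfg, w in joint.items():
        p[cfg[var]] = p.get(cfg[var], 0.0) + w
    return {s: w / Z for s, w in p.items()}

p0 = marginal(joint, 0)    # marginal distribution of the first variable
```

At $n = 4$ the table already has $2^4$ entries; at $n = 50$ it would have $2^{50}$, which is precisely why the factorizations discussed in the following sections matter.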
As mentioned in the introduction, marginalization of high dimensional joint distributions is difficult, and in many cases intractable. Fortunately, factorization significantly reduces this cost whenever it can be performed exactly, or approximately with sufficient accuracy. An $n$-variable system $\mathbf{x} = (x_1, x_2, \ldots, x_n)$ forms an undirected graph, with each variable being a node and an edge between each pair of interacting variables. In graph theory [17], a clique is a set of nodes with an edge between each pair of nodes in the set, and a maximal clique is one that would be destroyed by the addition of any new node. For a graph $G$ with a set of maximal cliques $C$ [17]:
$$p(\mathbf{x}) = \frac{1}{Z} \prod_{c \in C} f(x_c) \tag{2.1}$$
with $f(x_c)$ being the potential over the variable set $c$ and $Z$ being the partition function $Z = \sum_{\mathbf{x}} \prod_{c \in C} f(x_c)$, which is the summation of the product of all clique potentials over all possible configurations. In a tree, the maximal clique set is the edge set $E$. For the specific example with six nodes shown in Figure 1, Eq (2.1) may be written as follows:
$$p(\mathbf{x}) \propto f_{1,2}(x_1,x_2)\, f_{1,3}(x_1,x_3)\, f_{2,4}(x_2,x_4)\, f_{2,5}(x_2,x_5)\, f_{3,6}(x_3,x_6). \tag{2.2}$$
Given this factorization, the upper bound on the computational cost of marginalization drops from $|X|^6$ to $5 \times |X|^2$. Specifically, the marginal probability $p_1(x_1)$ may be calculated by the marginalization process shown below (illustrated in Figure 1(a)):
$$
\begin{aligned}
p_1(x_1) &\propto \sum_{x_2,x_3,x_4,x_5,x_6} f_{1,2}(x_1,x_2)\, f_{1,3}(x_1,x_3)\, f_{2,4}(x_2,x_4)\, f_{2,5}(x_2,x_5)\, f_{3,6}(x_3,x_6)\\
&= \sum_{x_2,x_3,x_4,x_5} f_{1,2}(x_1,x_2)\, f_{1,3}(x_1,x_3)\, f_{2,4}(x_2,x_4)\, f_{2,5}(x_2,x_5) \underbrace{\sum_{x_6} f_{3,6}(x_3,x_6)}_{\triangleq m_{6\to3}(x_3)}\\
&= \sum_{x_2,x_3,x_4} f_{1,2}(x_1,x_2)\, f_{1,3}(x_1,x_3)\, f_{2,4}(x_2,x_4)\, m_{6\to3}(x_3) \underbrace{\sum_{x_5} f_{2,5}(x_2,x_5)}_{\triangleq m_{5\to2}(x_2)}\\
&= \sum_{x_2,x_3} f_{1,2}(x_1,x_2)\, f_{1,3}(x_1,x_3)\, m_{6\to3}(x_3)\, m_{5\to2}(x_2) \underbrace{\sum_{x_4} f_{2,4}(x_2,x_4)}_{\triangleq m_{4\to2}(x_2)}\\
&= \sum_{x_2} f_{1,2}(x_1,x_2)\, m_{4\to2}(x_2)\, m_{5\to2}(x_2) \underbrace{\sum_{x_3} f_{1,3}(x_1,x_3)\, m_{6\to3}(x_3)}_{\triangleq m_{3\to1}(x_1)}\\
&= m_{3\to1}(x_1) \underbrace{\sum_{x_2} f_{1,2}(x_1,x_2)\, m_{4\to2}(x_2)\, m_{5\to2}(x_2)}_{\triangleq m_{2\to1}(x_1)}\\
&= m_{3\to1}(x_1)\, m_{2\to1}(x_1),
\end{aligned} \tag{2.3}
$$
and $p_1(x_1)$ may subsequently be obtained with the normalization shown below:
$$p_1(x_1) = \frac{m_{2\to1}(x_1)\, m_{3\to1}(x_1)}{\sum_{x' \in X} m_{2\to1}(x')\, m_{3\to1}(x')} \tag{2.4}$$
Similarly, the marginal probability $p_3(x_3)$ may be obtained as shown below, as illustrated in Figure 1(b):
$$
\begin{aligned}
p_3(x_3) &\propto \sum_{x_1,x_2,x_4,x_5,x_6} f_{1,2}(x_1,x_2)\, f_{1,3}(x_1,x_3)\, f_{2,4}(x_2,x_4)\, f_{2,5}(x_2,x_5)\, f_{3,6}(x_3,x_6)\\
&= \sum_{x_1,x_2,x_4,x_5} f_{1,2}(x_1,x_2)\, f_{1,3}(x_1,x_3)\, f_{2,4}(x_2,x_4)\, f_{2,5}(x_2,x_5) \underbrace{\sum_{x_6} f_{3,6}(x_3,x_6)}_{\triangleq m_{6\to3}(x_3)}\\
&= \sum_{x_1,x_2,x_4} f_{1,2}(x_1,x_2)\, f_{1,3}(x_1,x_3)\, f_{2,4}(x_2,x_4)\, m_{6\to3}(x_3) \underbrace{\sum_{x_5} f_{2,5}(x_2,x_5)}_{\triangleq m_{5\to2}(x_2)}\\
&= \sum_{x_1,x_2} f_{1,2}(x_1,x_2)\, f_{1,3}(x_1,x_3)\, m_{6\to3}(x_3)\, m_{5\to2}(x_2) \underbrace{\sum_{x_4} f_{2,4}(x_2,x_4)}_{\triangleq m_{4\to2}(x_2)}\\
&= \sum_{x_1} f_{1,3}(x_1,x_3)\, m_{6\to3}(x_3) \underbrace{\sum_{x_2} f_{1,2}(x_1,x_2)\, m_{4\to2}(x_2)\, m_{5\to2}(x_2)}_{\triangleq m_{2\to1}(x_1)}\\
&= m_{6\to3}(x_3) \underbrace{\sum_{x_1} f_{1,3}(x_1,x_3)\, m_{2\to1}(x_1)}_{\triangleq m_{1\to3}(x_3)}\\
&= m_{1\to3}(x_3)\, m_{6\to3}(x_3),
\end{aligned} \tag{2.5}
$$
and $p_3(x_3)$ may subsequently be obtained with the normalization shown below:
$$p_3(x_3) = \frac{m_{1\to3}(x_3)\, m_{6\to3}(x_3)}{\sum_{x' \in X} m_{1\to3}(x')\, m_{6\to3}(x')} \tag{2.6}$$
In the calculation of both marginals $p_1(x_1)$ and $p_3(x_3)$, the following messages are computed:
$$
\begin{aligned}
m_{6\to3}(x_3) &= \sum_{x_6} f_{3,6}(x_3,x_6), & m_{5\to2}(x_2) &= \sum_{x_5} f_{2,5}(x_2,x_5),\\
m_{4\to2}(x_2) &= \sum_{x_4} f_{2,4}(x_2,x_4), & m_{2\to1}(x_1) &= \sum_{x_2} f_{1,2}(x_1,x_2)\, m_{4\to2}(x_2)\, m_{5\to2}(x_2).
\end{aligned} \tag{2.7}
$$
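The messages of Eq (2.7) can be checked numerically. The sketch below (Python; the state count and the random positive potentials are arbitrary choices for illustration) builds the six-node tree of Figure 1, computes the shared messages plus $m_{3\to1}$, and verifies the resulting $p_1(x_1)$ against brute-force enumeration of the joint of Eq (2.2):

```python
import itertools
import random

random.seed(1)
K = 3                                              # states per node (arbitrary)
edges = [(1, 2), (1, 3), (2, 4), (2, 5), (3, 6)]   # the tree of Figure 1
# random positive pair potentials f[(i, j)][x_i][x_j]
f = {e: [[random.random() + 0.1 for _ in range(K)] for _ in range(K)]
     for e in edges}

def leaf_msg(i, j):
    """Message from leaf i to its neighbor j, as in Eq (2.7)."""
    return [sum(f[(j, i)][xj][xi] for xi in range(K)) for xj in range(K)]

m6_3 = leaf_msg(6, 3)
m5_2 = leaf_msg(5, 2)
m4_2 = leaf_msg(4, 2)
m2_1 = [sum(f[(1, 2)][x1][x2] * m4_2[x2] * m5_2[x2] for x2 in range(K))
        for x1 in range(K)]
m3_1 = [sum(f[(1, 3)][x1][x3] * m6_3[x3] for x3 in range(K))
        for x1 in range(K)]

# Eq (2.4): normalized product of the messages arriving at node 1
unnorm = [m2_1[x] * m3_1[x] for x in range(K)]
p1 = [u / sum(unnorm) for u in unnorm]

# brute-force check against the full joint of Eq (2.2)
weights = [0.0] * K
for cfg in itertools.product(range(K), repeat=6):
    x = (None,) + cfg                   # 1-based node indexing
    w = 1.0
    for i, j in edges:
        w *= f[(i, j)][x[i]][x[j]]
    weights[x[1]] += w
p1_exact = [w / sum(weights) for w in weights]
assert all(abs(a - b) < 1e-9 for a, b in zip(p1, p1_exact))
```

The message-passing route touches each edge a constant number of times, while the brute-force check visits all $K^6$ configurations.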
Computing the marginal for each node in a tree takes $n-1$ messages, and consequently $n(n-1)$ messages are necessary for $n$ nodes when the marginals are computed separately (see Figure 1(a), (b)). However, instead of computing the marginal for each variable separately, $2(n-1)$ messages over the $n-1$ edges of an $n$-node tree may be computed, such that all marginals may be obtained without repetitive message calculation, as illustrated in Figure 1(c) and demonstrated in the equation below.
This message passing scheme is the sum-product, or belief propagation, algorithm for a general tree structure with pair potentials [7,8,9]:
$$m_{i\to j}(x_j) = \sum_{x_i} f_{i,j}(x_i,x_j) \prod_{k \in N(i) \setminus j} m_{k\to i}(x_i) \tag{2.8}$$
The message from $i$ to $j$ ($m_{i\to j}$) is sent once $i$ has received messages from all of its neighbors $N(i)$ except the node $j$. After two sweeps of message passing, the marginal for variable $i$ is obtained by:
$$p_i(x_i) \propto \prod_{j \in N(i)} m_{j\to i}(x_i). \tag{2.9}$$
When a unary potential is included for each node, we have:
$$m_{i\to j}(x_j) = \sum_{x_i} g_i(x_i)\, f_{i,j}(x_i,x_j) \prod_{k \in N(i) \setminus j} m_{k\to i}(x_i), \tag{2.10}$$
$$p_i(x_i) \propto g_i(x_i) \prod_{j \in N(i)} m_{j\to i}(x_i). \tag{2.11}$$
One may easily compute the maximum probability configuration $\mathbf{x}^*$ by replacing the sum operations in Eq (2.8) or (2.10) with max operations, and by bookkeeping the corresponding state of each variable in each maximization for backtracking. This results in the max-product algorithm for calculating the maximum probability configuration.
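A minimal sketch of this max-product idea on a chain (the unary and pair potentials below are random illustrative values, not from the text): forward max-messages with bookkeeping, then backtracking, verified against exhaustive search.

```python
import itertools
import random

random.seed(2)
K, n = 3, 5                                            # states, chain length
g = [[random.random() + 0.1 for _ in range(K)] for _ in range(n)]   # g_i(x_i)
f = [[[random.random() + 0.1 for _ in range(K)] for _ in range(K)]
     for _ in range(n - 1)]                            # f_{i,i+1}(x_i, x_{i+1})

def max_product_chain(g, f):
    """Eq (2.10) with sums replaced by max, plus bookkeeping for backtracking."""
    n, K = len(g), len(g[0])
    m = [g[0][:]]                       # forward max-messages
    back = [[0] * K]
    for i in range(1, n):
        mi, bi = [], []
        for xj in range(K):
            vals = [m[-1][xi] * f[i - 1][xi][xj] for xi in range(K)]
            best = max(range(K), key=vals.__getitem__)
            mi.append(g[i][xj] * vals[best])
            bi.append(best)
        m.append(mi)
        back.append(bi)
    xs = [max(range(K), key=m[-1].__getitem__)]
    for i in range(n - 1, 0, -1):       # backtrack the optimal states
        xs.append(back[i][xs[-1]])
    return tuple(reversed(xs))

def score(cfg):                         # unnormalized probability of cfg
    s = 1.0
    for i, x in enumerate(cfg):
        s *= g[i][x]
    for i in range(n - 1):
        s *= f[i][cfg[i]][cfg[i + 1]]
    return s

x_star = max_product_chain(g, f)
x_brute = max(itertools.product(range(K), repeat=n), key=score)
assert x_star == x_brute                # max-product is exact on a tree/chain
```

On a chain this is exactly the Viterbi algorithm: linear cost in $n$, versus $K^n$ for the exhaustive check.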
In graphs where one or more loops exist, the belief propagation algorithm guarantees neither convergence nor correct calculation of marginals, and the same is true of the max-product algorithm. Exact belief propagation may be achieved for a general graph by constructing a chordal graph through triangulation [9], which involves adding edges to the graph such that no chordless cycle of length greater than three exists (see Figure 2 for a schematic illustration).
Maximal cliques in a triangulated graph can be joined to form a junction tree. The most important property of such a tree is the junction tree property, which ensures that if a node $x_i$ is in two maximal cliques $c_j$ and $c_k$, then it must be a member of every maximal clique on the path from $c_j$ to $c_k$ [18,19]. The computational complexity of inference on a junction tree is governed by its tree width, which is one less than the size of its largest maximal clique. Many different chordal graphs with different tree widths may be generated from a given graph with loops, and a number of algorithms are available to perform triangulation and construct chordal graphs [20,21,22,23]. However, no existing algorithm guarantees finding the best triangulation scheme, i.e., one generating a chordal graph with the smallest possible tree width.
Even without assurance that exact marginals will be obtained by the belief propagation algorithm for graphs with loops, we may carry out such message passing computations anyway; the algorithm is accordingly termed loopy belief propagation (LBP). Turbo codes [24] are an outstanding example of utilizing LBP in telecommunication coding. The parallel version [25] of the belief propagation algorithm is shown below (Eq (3.1)):
$$m^{t+1}_{i\to j}(x_j) \propto \sum_{x_i} g_i(x_i)\, f_{i,j}(x_i,x_j) \prod_{k \in N(i) \setminus j} m^{t}_{k\to i}(x_i). \tag{3.1}$$
Messages are usually initialized to 1 and normalized at each time step. If loopy belief propagation converges, then Eq (3.1) implies that the converged messages are a fixed point of the message updating function. For a general graph with pair potentials, the fixed points of belief propagation message updates are local extrema of the Bethe variational problem [26]. The details of both the Bethe approximation and other treatments of loopy belief propagation are well beyond the scope of this perspective, and interested readers are encouraged to find relevant information elsewhere [27,28].
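As an illustration of Eq (3.1), the sketch below runs parallel, normalized message updates on the smallest loopy graph, a three-node cycle with random pair potentials (no unary terms; all values are illustrative). On loopy graphs the converged beliefs are Bethe approximations and may differ from the exact marginals, which is why an exact marginal is computed alongside for comparison:

```python
import itertools
import math
import random

random.seed(3)
K = 2
edges = [(0, 1), (1, 2), (0, 2)]                 # a single 3-node loop
f = {e: [[random.random() + 0.1 for _ in range(K)] for _ in range(K)]
     for e in edges}
nbrs = {0: [1, 2], 1: [0, 2], 2: [0, 1]}

def pot(i, j, xi, xj):
    """Pair potential, independent of storage orientation."""
    return f[(i, j)][xi][xj] if (i, j) in f else f[(j, i)][xj][xi]

# one message per directed edge, initialized uniformly
m = {(i, j): [1.0 / K] * K for i in nbrs for j in nbrs[i]}
for _ in range(200):                             # parallel updates, Eq (3.1)
    new = {}
    for (i, j) in m:
        vec = [sum(pot(i, j, xi, xj)
                   * math.prod(m[(k, i)][xi] for k in nbrs[i] if k != j)
                   for xi in range(K))
               for xj in range(K)]
        z = sum(vec)
        new[(i, j)] = [v / z for v in vec]
    m = new

# belief at node 0 (Eq (2.9)) vs. exact marginal by enumeration
b0 = [m[(1, 0)][x] * m[(2, 0)][x] for x in range(K)]
z0 = sum(b0)
b0 = [b / z0 for b in b0]

exact = [0.0] * K
for cfg in itertools.product(range(K), repeat=3):
    w = 1.0
    for (i, j) in edges:
        w *= f[(i, j)][cfg[i]][cfg[j]]
    exact[cfg[0]] += w
ze = sum(exact)
exact = [e / ze for e in exact]
```

With stronger potentials or larger loops, the gap between `b0` and `exact` widens, and convergence itself is no longer guaranteed; damping of the updates is a common practical remedy.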
Belief propagation was initially developed for discrete variables [7]. For continuous variables, the summation in message computation becomes integration, which is usually intractable for arbitrary distributions. Fortunately, Gaussian distributions may be integrated analytically, and the results remain Gaussian. This property may be utilized in belief propagation by treating continuous variables as Gaussians. In cases where such direct approximation with Gaussians is not sufficient, linearization with a first order Taylor expansion may be utilized to improve accuracy [29].
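The closure property can be checked on the smallest example: a pair potential $\exp(-\frac{1}{2}(J_{11}x_1^2 + 2J_{12}x_1x_2 + J_{22}x_2^2))$ with positive definite precision entries. Completing the square and integrating $x_2$ out analytically leaves a Gaussian in $x_1$ with precision $J_{11} - J_{12}^2/J_{22}$ (the Schur complement). The sketch below, with illustrative numbers, confirms this against naive numerical integration:

```python
import math

J11, J12, J22 = 2.0, 0.7, 1.5          # illustrative precision entries
J_marg = J11 - J12 ** 2 / J22          # Schur complement: marginal precision

def unnorm_marginal(x1, h=0.005, L=12.0):
    """Numerically integrate the pair potential over x2 on [-L, L]."""
    s, x2 = 0.0, -L
    while x2 <= L:
        s += math.exp(-0.5 * (J11 * x1 * x1 + 2 * J12 * x1 * x2
                              + J22 * x2 * x2)) * h
        x2 += h
    return s

# The marginal is proportional to exp(-0.5 * J_marg * x1**2), so the
# log-ratio between x1 = 0 and x1 = 1 must equal 0.5 * J_marg.
ratio = math.log(unnorm_marginal(0.0) / unnorm_marginal(1.0))
assert abs(ratio - 0.5 * J_marg) < 1e-4
```

It is this closed-form elimination of variables that lets Gaussian belief propagation pass messages by updating means and precisions instead of performing numerical integrations.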
Conventional molecular simulation is a mature methodology with a wide variety of tools. These methods are routinely utilized in many research fields, including but not limited to chemistry, biology and materials science, with more than 30,000 relevant publications each year [30]. An essential component of the foundation of molecular simulation is a parameter set termed a force field, which represents interaction energies between/among various molecular degrees of freedom as functions of the relevant coordinates. Conventional molecular simulation is deemed by many of its practitioners to be a research field independent of statistical machine learning, while it is in fact a well defined form of it: force field parameterization is the learning stage, and simulation and sampling constitute the inference process.
The total energy of a given molecular configuration $\mathbf{x} = (x_1, x_2, \ldots, x_n)$ of a simple Lennard-Jones (LJ) system is defined as [31]:
$$E_{total}(\mathbf{x}) = \sum_{i-j,\; r_{ij} \le r_{cut}} 4\epsilon \left[ \left( \frac{\sigma}{r_{ij}} \right)^{12} - \left( \frac{\sigma}{r_{ij}} \right)^{6} \right]. \tag{4.1}$$
Such LJ terms are usually utilized to describe various van der Waals interactions between neutral atoms/particles, with respective parameters $\epsilon$ and $\sigma$. Clearly, this equation is essentially a factorization:
$$p(\mathbf{x}) \propto \exp(-E_{total}) = \prod_{i-j,\; r_{ij} \le r_{cut}} \exp\left(-f_{i-j}\big(r_{ij}(x_i, x_j)\big)\right), \quad f_{i-j} = 4\epsilon \left[ \left( \frac{\sigma}{r_{ij}} \right)^{12} - \left( \frac{\sigma}{r_{ij}} \right)^{6} \right], \tag{4.2}$$
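Equations (4.1) and (4.2) can be made concrete in a few lines of Python. The positions, $\epsilon$, $\sigma$ and cutoff below are illustrative (reduced units, no periodic boundaries); the point is that because $E_{total}$ is a sum over within-cutoff pairs, the Boltzmann weight is exactly the product of per-pair factors:

```python
import math

eps, sigma, rcut = 1.0, 1.0, 3.0       # reduced LJ units, illustrative

def lj_pair(r):
    """Eq (4.1) for one pair: 4*eps*[(sigma/r)**12 - (sigma/r)**6]."""
    sr6 = (sigma / r) ** 6
    return 4.0 * eps * (sr6 * sr6 - sr6)

# a few particles placed at safe (non-overlapping) distances
pts = [(0.0, 0.0, 0.0), (1.2, 0.0, 0.0), (0.0, 1.3, 0.0), (2.0, 2.0, 0.5)]

pairs = [(i, j) for i in range(len(pts)) for j in range(i + 1, len(pts))
         if math.dist(pts[i], pts[j]) <= rcut]

E_total = sum(lj_pair(math.dist(pts[i], pts[j])) for i, j in pairs)

# Eq (4.2): the Boltzmann weight factorizes over within-cutoff pairs
weight = 1.0
for i, j in pairs:
    weight *= math.exp(-lj_pair(math.dist(pts[i], pts[j])))
assert math.isclose(weight, math.exp(-E_total), rel_tol=1e-9)
```

Note that the pair list itself changes as particles move, which is exactly the dynamic-edge property discussed below.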
with $\mathbf{x}$ representing the set of coordinates of all comprising atoms/particles of a specific configuration. Let us denote each particle as a node and the corresponding interaction between a pair of particles as an edge; each factor then represents an interacting pair of particles. This factorization has a form very similar to that of belief propagation in a tree (Eq (2.1)). However, there are three major differences. Firstly, in Eq (2.1), the graph is static with a fixed number of edges. For a graph formed by LJ particles, in contrast, edges are dynamic in two respects: as the system propagates in time, edges break as some pairs of particles drift from within to beyond the cutoff and new ones form as others drift from beyond to within; and for pairs within the cutoff, their interactions change as a function of distance. Secondly, nodes in Eq (2.1) have discrete states, while the properties of LJ particles (e.g., position and momentum) are continuous. Thirdly, in Eq (2.1), each factor is over a maximal clique comprising a pair of connected nodes. For LJ particles, when the cutoff for direct interaction is taken as $3\sigma$, all particles within a sphere with the cutoff as its diameter are certainly within a single clique (since they all interact), as shown in Figure 3(a); the number of such particles ranges from a few to more than a dozen. The LJ potential is factorized over pairs of particles instead of maximal cliques. Such factorization is clearly an approximation, and the resulting error is not easy to quantify for arbitrary LJ interactions. Moreover, the exact number of particles in each maximal clique varies as the local packing of LJ particles continuously changes over space and time. In a graph, two maximal cliques that share one or more nodes are effectively connected. For densely packed molecular systems that may be represented as LJ particles, each maximal clique usually overlaps with multiple neighboring maximal cliques.
Therefore, it is essentially certain that most maximal cliques participate in loops. Constructing junction trees for such systems inevitably results in huge tree widths. Exact inference is consequently prohibitive, and the use of LBP becomes necessary.
For more general molecular systems, energy contributions are more diverse [32,33,34], with the addition of various bonded terms (including bonds, bends and dihedrals; see Figure 4) and electrostatic interaction terms between ions or partially charged polar atoms, and are usually calculated as shown below:
$$
\begin{aligned}
E_{total} &= E_{bond} + E_{bend} + E_{dihedral} + E_{VdW} + E_{Electro} & (4.3)\\
&= \sum_{bond_i} E_{bond_i} + \sum_{bend_j} E_{bend_j} + \sum_{dihedral_k} E_{dihedral_k} + \sum_{VdW_l} E_{VdW_l} + \sum_{ES_{es_i}} E_{ES_{es_i}}. & (4.4)
\end{aligned}
$$
Here $i, j, k, l$ and $es_i$ are indices for bonds, bends, dihedrals, van der Waals and electrostatic interactions, respectively. Exponentiating this equation results in:
$$\exp(-E_{total}) = \prod_{bond_i} \exp(-E_{bond_i}) \prod_{bend_j} \exp(-E_{bend_j}) \prod_{dihedral_k} \exp(-E_{dihedral_k}) \prod_{VdW_l} \exp(-E_{VdW_l}) \prod_{ES_{es_i}} \exp(-E_{ES_{es_i}}). \tag{4.5}$$
Bonded terms are fundamentally local in space. Van der Waals interactions are usually modeled by LJ potentials. As in simple LJ systems, interactions in maximal cliques are approximately represented by assuming independent local interactions involving single (VdW and bonds), double (bends) or triple (dihedrals) edges. There is also the challenge of addressing continuous variables with non-Gaussian distributions and dynamic edges. More importantly, electrostatic interactions are long-range. Were such interactions to be treated exactly, factorization would not be possible and a molecular system would need to be treated as one single maximal clique. The exponential complexity with respect to the size of the maximal clique renders exact treatment intractable for many realistic molecular systems, so treating long range interactions as independent from both local ones and each other is an essential approximation in computational molecular science. In Eq (4.5), all interaction terms are treated as independent regardless of interaction range in space. Fortunately, such approximations have been widely tested in molecular simulations with great success [35]. In reality, naive pairwise calculation of electrostatic interactions diverges, and usually some form of Ewald summation is utilized to ensure convergence [36,37].
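Algebraically, the factorization in Eq (4.5) is exact because exponentiating a sum gives a product; the approximation lies in defining the individual terms as independent, not in the exponentiation itself. A toy check (the per-class energy values below are made up for illustration):

```python
import math

# hypothetical per-term energies for each interaction class of Eq (4.4)
terms = {
    "bond": [0.20, 0.15],
    "bend": [0.05],
    "dihedral": [0.40, 0.10],
    "VdW": [-0.30, -0.12, 0.02],
    "ES": [-0.80, 0.25],
}
E_total = sum(sum(v) for v in terms.values())           # Eqs (4.3)-(4.4)
product = math.prod(math.exp(-e) for cls in terms.values() for e in cls)
assert math.isclose(product, math.exp(-E_total))        # Eq (4.5)
```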
Just like the repetitive local computation in the calculation of marginals demonstrated in Eqs (2.3), (2.5) and (2.7), ubiquitous repetitive local sampling (RLS) exists in molecular simulations [30,38,39]. From a graph perspective, such RLS is fundamentally associated with the intrinsic dynamic property of the graphs describing molecular systems. The local distribution theory (LDT) was developed to address RLS and tremendously increase computational efficiency without reducing accuracy/resolution [30,39]. As shown in Eq (5.1), LDT is another approximate factorization scheme for facilitating statistical machine learning of dynamic graphs in high dimensional spaces.
$$P(\Phi, \mathbf{x}) = Q(\Phi, R) = \frac{Q(\Phi, R)}{\prod_{i=1}^{m} q(\Phi, r_i)} \prod_{i=1}^{m} q(\Phi, r_i) \approx \prod_{i=1}^{m} q(\Phi, r_i)\, \exp\left(-\sum F_{MED}(\Phi, R)\right) \exp\left(-\sum F_{LR}(\Phi, R)\right) \tag{5.1}$$
Here, $\Phi = (\phi_1, \phi_2, \ldots, \phi_k)$ denotes the relevant thermodynamic and environmental variables, and $\mathbf{x} = (x_1, x_2, \ldots, x_n)$ the molecular coordinates. $R = (r_1, r_2, \ldots, r_m)$ ($m \le n$; $m = n$ is preferred) are dynamic local regions, each of which represents a dynamic collection of coordinates ($r_i = (x_{i_1}, x_{i_2}, \ldots, x_{i_l})$) for molecular degrees of freedom. A translation, rotation and permutation invariant coordinate transformation is necessary to feed $r_i$ into neural networks or any other powerful nonlinear fitting machinery for local free energy computation. Effectively, taking a local region $r_i$ and calculating its local free energy (a high dimensional potential of mean force) is similar to taking two particles in conventional molecular simulation and calculating their two-body potential of mean force. $\prod_{i=1}^{m} q(\Phi, r_i)$ is the product of local distributions, and the global correlation factor $\frac{Q(\Phi, R)}{\prod_{i=1}^{m} q(\Phi, r_i)}$ is approximately factorized into mediated ($\sum F_{MED}(\Phi, R)$) and long range ($\sum F_{LR}(\Phi, R)$) contributions. The learning stage of LDT is to train local distributions, which cache the statistical significance of various molecular configurations within local spaces defined by a preset cutoff from the origin of a local coordinate system centered at a given molecular degree of freedom (DOF). Such local distributions, defined by trainable neural network parameters with selected neural network architectures, store both energetic and entropic contributions from extensive local sampling data of either experimental or computational origin. By caching and repetitively utilizing such local distributions, LDT eliminates RLS to a great extent, and therefore saves computational resources in sampling dynamic graphs. In the inference stage, local distributions are assembled under sampling constraints due to mediated interactions. Direct long range interactions are treated as independent from local ones (Eq (5.1)), and the selection of a specific method does not impact local computation.
Searching for the best methodology to match the LDT scheme is an important issue to be tackled in the future. A hierarchical definition of "local" with different spatial coverage might be an efficient path to explore for large systems with hierarchical dynamics spanning multiple time scales.
In conventional molecular simulations, pairwise interactions (or edges in a graph representing a molecular system) within the cutoff are included as independent factors (Figure 3(a)). As illustrated in Figure 3(b), both local interactions confined within the cutoff spherical space and mediated interactions are jointly considered in LDT. The independence assumptions are therefore significantly weaker; accordingly, improved accuracy is expected, together with tremendously increased efficiency due to the substitution of RLS by local distributions. LDT therefore has the potential to address the tradeoff dilemma between accuracy and efficiency in molecular simulations. However, this does come with a significant one-time price to pay for accurate fitting of local distributions, which is fundamentally a density estimation task in a high dimensional system, an open problem with many feasible approaches and respective limitations [40,41].
One fundamental difference between factorization in belief propagation algorithms and that in the molecular modeling of physical systems is that the former deals with a static graph $(V, E)$ while the latter addresses a dynamic one $(V, E_{dynamic})$. The variation of edges in molecular systems is continuous, but may be simplified as binary, as in the case of a protein contact map [42]. Specifically, as the distance between two atoms/particles changes over time, so does the corresponding interaction potential. In belief propagation, an isolated maximal clique has a fixed factor potential, and the computation is to figure out how interactions with neighboring cliques influence the distribution of the internal states of its nodes. However, if we assume that the forces exerted on each atom/particle do not change within a time step of a molecular dynamics simulation, we may similarly treat a molecular system over a sufficiently short time period as a static graph and apply belief propagation algorithms. When an exact belief propagation formulation is possible, it provides two great advantages. One is the rapid determination of the most probable configuration; the other is the explicit normalization of local probability distributions, a mechanism missing from the molecular modeling toolbox that is potentially helpful in keeping errors under control if successfully applied to molecular systems. To realize such a goal, the first step is to define nodes with internal states. For example, in a full quantum mechanical treatment, one may define each atom as a node with rich internal electronic states. For proteins, residues may be defined as distinct nodes with internal rotameric states, which may be approximately represented either discretely or continuously. Clearly, the definition of nodes is flexible for a given type of molecular system and requires significant effort to explore.
For a given molecular system with a specific definition of nodes, we may perform loopy belief propagation to search for the most probable global configuration (or a number of configurations with the largest statistical weights) of internal node states, followed by sampling of different edge configurations with LDT or some other properly selected sampling strategy. Three important factors for the application of belief propagation in molecular systems are the definition of nodes, the selection/development of approximate loopy belief propagation algorithms and the selection of a sampling strategy. Each is likely to impact and be influenced by the other two. Exploring different combinations of these algorithms is likely fertile ground for developing next generation molecular modeling methodologies. An additional advantage of formulating a molecular system with belief propagation is that marginalization, which is of critical importance in docking and in the design of key spots in biomolecular interactions, becomes straightforward.
While exact belief propagation needs only one full sweep (to and from each node) of message passing, LBP requires iteration, and therefore the speed of convergence becomes a practical challenge. For the first type of problems stated in the introduction (finding the maximum probability configuration in particular, and mapping out the full landscape in general), the process is essentially some form of walking over the probability landscape and settling down in the local well most easily reached from a starting configuration. To map out the global landscape, one has to design ways to escape from any given local well and reach other, potentially deeper or shallower, wells. Unfortunately, no systematic theory is available for belief propagation algorithms to achieve this goal. The rich history of enhanced sampling of free energy landscapes [11,43,44,45,46,47,48], coarse graining [49,50] (which is effectively smoothing of the free energy landscape) and transition path sampling [14,15,16] (which focuses on finding transition paths between/among different free energy wells) in molecular systems may provide hints for inventing new algorithms for the accelerated convergence of LBP.
Belief propagation algorithms reduce repetitive local computation in the redistribution of marginals over internal node states by surrounding connected nodes. LDT reduces the repetitive local sampling of dynamic edges by utilizing pre-trained neural networks, which implicitly hold the probability density of a given configuration. Deep learning has found extensive applications in molecular simulations [51,52], especially in the construction of deep potentials (parameterization of force fields) [53,54,55,56]. There are likely more ways of combining different factorization schemes in molecular simulations, belief propagation and deep learning waiting to be invented. For example, compared with conventional dimensionality reduction algorithms, deep learning has the advantage of being much more flexible, and such dimensionality reduction might be combined with molecular simulation and belief propagation to construct novel approximate multi-scale solutions.
Matrix factorization is widely utilized in transforming input for deep learning [57,58,59]. It is important to note that such factorization is a global operation in the probability distribution space. This is in strong contrast to the decomposition of a global distribution into local ones in belief propagation and molecular simulations.
Belief propagation is a family of algorithms in a large class of factorization methodologies, called probabilistic graphical models, with a focus on static graphs. These algorithms have mainly been developed by information scientists and are applied in coding, communication and machine learning. Variants of factorization in molecular simulations, including the local distribution theory, are fundamentally factorizations focusing on the dynamic graphs representing complex molecular systems, and have mainly been developed by computational chemists and physicists. Both are extensive research fields with very rich content and numerous important contributions from many scientists. The most outstanding difference is that factorization in belief propagation has exact solutions for simple and junction trees but lacks methodology development in enhanced sampling, while factorization in molecular simulations has a rich history in accelerating sampling of free energy landscapes but no exact solutions. A comprehensive survey of these two fields is well beyond the scope of the present work. This perspective is intended to emphasize this disciplinary gap and to spark interest in developing next generation algorithms that assimilate the best of both fields.
The authors declare they have not used Artificial Intelligence (AI) tools in the creation of this article.
We gratefully acknowledge support from the Interdisciplinary Integration and Innovation Project of JLU (grant number JLUXKJC2021ZZ05). We thank Professor Zhonghan Hu for comments on the local distribution theory while the manuscript was in preparation.
The authors declare there is no conflict of interest.
[1] | I. T. Jolliffe, Principal Component Analysis, Springer, New York, 2002. https://doi.org/10.1007/b98835 |
[2] | T. F. Cox, M. A. A. Cox, Multidimensional Scaling, 2nd edition, Chapman and Hall/CRC, New York, 2000. https://doi.org/10.1201/9781420036121 |
[3] | J. B. Tenenbaum, V. de Silva, J. C. Langford, A global geometric framework for nonlinear dimensionality reduction, Science, 290 (2000), 2319–2323. https://doi.org/10.1126/science.290.5500.2319 |
[4] | S. T. Roweis, L. K. Saul, Nonlinear dimensionality reduction by locally linear embedding, Science, 290 (2000), 2323–2326. https://doi.org/10.1126/science.290.5500.2323 |
[5] | R. R. Coifman, S. Lafon, A. B. Lee, M. Maggioni, B. Nadler, F. Warner, et al., Geometric diffusions as a tool for harmonic analysis and structure definition of data: Multiscale methods, Proc. Natl. Acad. Sci., 102 (2005), 7432–7437. https://doi.org/10.1073/pnas.0500896102 |
[6] | M. Ceriotti, G. A. Tribello, M. Parrinello, Simplifying the representation of complex free-energy landscapes using sketch-map, Proc. Natl. Acad. Sci., 108 (2011), 13023–13028. https://doi.org/10.1073/pnas.1108486108 |
[7] | J. Pearl, Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference, Morgan Kaufmann, San Mateo, CA, 1988. |
[8] | F. R. Kschischang, B. J. Frey, H. A. Loeliger, Factor graphs and the sum-product algorithm, IEEE Trans. Inf. Theory, 47 (2001), 498–519. https://doi.org/10.1109/18.910572 |
[9] | D. Koller, N. Friedman, Probabilistic Graphical Models: Principles and Techniques, MIT Press, Cambridge, MA, 2009. |
[10] | H. Fu, X. Shao, W. Cai, C. Chipot, Taming rugged free energy landscapes using an average force, Acc. Chem. Res., 52 (2019), 3254–3264. https://doi.org/10.1021/acs.accounts.9b00473 |
[11] | O. Valsson, P. Tiwary, M. Parrinello, Enhancing important fluctuations: Rare events and metadynamics from a conceptual viewpoint, Annu. Rev. Phys. Chem., 67 (2016), 159–184. https://doi.org/10.1146/annurev-physchem-040215-112229 |
[12] | G. Bussi, A. Laio, Using metadynamics to explore complex free-energy landscapes, Nat. Rev. Phys., 2 (2020), 200–212. https://doi.org/10.1038/s42254-020-0153-0 |
[13] | D. Ramachandram, G. W. Taylor, Deep multimodal learning: A survey on recent advances and trends, IEEE Signal Process Mag., 34 (2017), 96–108. https://doi.org/10.1109/MSP.2017.2738401 |
[14] | C. Dellago, P. G. Bolhuis, P. L. Geissler, Transition path sampling, in Advances in Chemical Physics, John Wiley & Sons, Ltd, (2002), 1–78. https://doi.org/10.1002/0471231509.ch1 |
[15] | J. Rogal, P. G. Bolhuis, Multiple state transition path sampling, J. Chem. Phys., 129 (2008), 224107. https://doi.org/10.1063/1.3029696 |
[16] | P. Buijsman, P. G. Bolhuis, Transition path sampling for non-equilibrium dynamics without predefined reaction coordinates, J. Chem. Phys., 152 (2020), 044108. https://doi.org/10.1063/1.5130760 |
[17] | R. J. Trudeau, Introduction to Graph Theory, Dover Publications, New York, 1993. |
[18] | S. L. Lauritzen, D. J. Spiegelhalter, Local computations with probabilities on graphical structures and their application to expert systems, J. Roy. Statist. Soc. Ser. B, 50 (1988), 157–194. https://doi.org/10.1111/j.2517-6161.1988.tb01721.x |
[19] | F. V. Jensen, S. L. Lauritzen, K. G. Olesen, Bayesian updating in causal probabilistic networks by local computations, Comput. Statist. Quart., 5 (1990), 269–282. |
[20] | V. Gogate, R. Dechter, A complete anytime algorithm for treewidth, in Proceedings of the 20th Conference on Uncertainty in Artificial Intelligence, Arlington, Virginia, (2004), 201–208. |
[21] | E. H. Bachoore, H. L. Bodlaender, A branch and bound algorithm for exact, upper, and lower bounds on treewidth, in Algorithmic Aspects in Information and Management, AAIM 2006, Lecture Notes in Computer Science, Springer, (2006), 255–266. https://doi.org/10.1007/11775096_24 |
[22] | T. J. Ottosen, J. Vomlel, All roads lead to Rome–new search methods for the optimal triangulation problem, Int. J. Approximate Reasoning, 53 (2012), 1350–1366. https://doi.org/10.1016/j.ijar.2012.06.006 |
[23] | C. Li, M. Ueno, An extended depth-first search algorithm for optimal triangulation of Bayesian networks, Int. J. Approximate Reasoning, 80 (2017), 294–312. https://doi.org/10.1016/j.ijar.2016.09.012 |
[24] | C. Berrou, A. Glavieux, Turbo Codes, John Wiley & Sons, Ltd, New York, 2003. https://doi.org/10.1002/0471219282.eot346 |
[25] | J. Gonzalez, Y. Low, C. Guestrin, Parallel splash belief propagation, J. Mach. Learn. Res., 1 (2009), 1–48. |
[26] | J. S. Yedidia, W. T. Freeman, Y. Weiss, Generalized belief propagation, in NIPS'00: Proceedings of the 13th International Conference on Neural Information Processing System, (2000), 668–674. |
[27] | M. P. Kumar, P. H. S. Torr, Fast memory-efficient generalized belief propagation, in Computer Vision–ECCV 2006, Lecture Notes in Computer Science, Springer, (2006), 451–463. https://doi.org/10.1007/11744085_35 |
[28] | S. Y. Chen, H. Tong, Z. Wang, S. Liu, M. Li, B. Zhang, Improved generalized belief propagation for vision processing, Math. Probl. Eng., 2011 (2011). https://doi.org/10.1155/2011/416963 |
[29] | J. Ortiz, T. Evans, A. J. Davison, A visual introduction to gaussian belief propagation, arXiv preprint, (2021), arXiv: 2107.02308. https://doi.org/10.48550/arXiv.2107.02308 |
[30] | P. Tian, The repetitive local sampling and the local distribution theory, WIREs Comput. Mol. Sci., 12 (2021), e1588. https://doi.org/10.1002/wcms.1588 |
[31] | X. Wang, S. Ramirez-Hinestrosa, J. Dobnikar, D. Frenkel, The Lennard-Jones potential: When (not) to use it, Phys. Chem. Chem. Phys., 22 (2020), 10624–10633. https://doi.org/10.1039/c9cp05445f |
[32] | B. R. Brooks, C. L. Brooks, A. D. Mackerell, L. Nilsson, R. J. Petrella, B. Roux, et al., CHARMM: The biomolecular simulation program, J. Comput. Chem., 30 (2009), 1545–1614. https://doi.org/10.1002/jcc.21287 |
[33] | D. A. Case, T. E. Cheatham, T. Darden, H. Gohlke, R. Luo, K. M. Merz, et al., The Amber biomolecular simulation programs, J. Comput. Chem., 26 (2005), 1668–1688. https://doi.org/10.1002/jcc.20290 |
[34] | D. Van Der Spoel, E. Lindahl, B. Hess, G. Groenhof, A. E. Mark, H. J. Berendsen, GROMACS: Fast, flexible, and free, J. Comput. Chem., 26 (2005), 1701–1718. https://doi.org/10.1002/jcc.20291 |
[35] | R. H. French, V. A. Parsegian, R. Podgornik, R. F. Rajter, A. Jagota, J. Luo, et al., Long range interactions in nanoscale science, Rev. Mod. Phys., 82 (2010), 1887–1944. https://doi.org/10.1103/RevModPhys.82.1887 |
[36] | A. Y. Toukmaji, J. A. Board, Ewald summation techniques in perspective: A survey, Comput. Phys. Commun., 95 (1996), 73–92. https://doi.org/10.1016/0010-4655(96)00016-1 |
[37] | C. Pan, Z. Hu, Rigorous error bounds for Ewald summation of electrostatics at planar interfaces, J. Chem. Theory Comput., 10 (2014), 534–542. https://doi.org/10.1021/ct400839x |
[38] | X. Cao, P. Tian, Molecular free energy optimization on a computational graph, RSC Adv., 11 (2021), 12929–12937. https://doi.org/10.1039/d1ra01455b |
[39] | X. Cao, P. Tian, "Dividing and conquering" and "caching" in molecular modeling, Int. J. Mol. Sci., 22 (2021), 5053. |
[40] | Z. Wang, D. W. Scott, Nonparametric density estimation for high-dimensional data–algorithms and applications, WIREs Comput. Stat., 11 (2019), e1461. https://doi.org/10.1002/wics.1461 |
[41] | Q. Liu, J. Xu, R. Jiang, W. H. Wong, Density estimation using deep generative neural networks, Proc. Natl. Acad. Sci., 118 (2021), e2101344118. https://doi.org/10.1073/pnas.2101344118 |
[42] | H. Zhang, Z. Bei, W. Xi, M. Hao, Z. Ju, K. M. Saravanan, et al., Evaluation of residue-residue contact prediction methods: From retrospective to prospective, PLoS Comput. Biol., 17 (2021), e1009027. https://doi.org/10.1371/journal.pcbi.1009027 |
[43] | Y. Q. Gao, An integrate-over-temperature approach for enhanced sampling, J. Chem. Phys., 128 (2008), 064105. https://doi.org/10.1063/1.2825614 |
[44] | L. Yang, C. W. Liu, Q. Shao, J. Zhang, Y. Q. Gao, From thermodynamics to kinetics: Enhanced sampling of rare events, Acc. Chem. Res., 48 (2015), 947–955. https://doi.org/10.1021/ar500267n |
[45] | R. C. Bernardi, M. C. R. Melo, K. Schulten, Enhanced sampling techniques in molecular dynamics simulations of biological systems, Biochim. Biophys. Acta, 1850 (2015), 872–877. https://doi.org/10.1016/j.bbagen.2014.10.019 |
[46] | J. Comer, J. C. Gumbart, J. Hénin, T. Lelièvre, A. Pohorille, C. Chipot, The adaptive biasing force method: Everything you always wanted to know but were afraid to ask, J. Phys. Chem. B, 119 (2015), 1129–1151. https://doi.org/10.1021/jp506633n |
[47] | V. Mlynsky, G. Bussi, Exploring RNA structure and dynamics through enhanced sampling simulations, Curr. Opin. Struct. Biol., 49 (2018), 63–71. https://doi.org/10.1016/j.sbi.2018.01.004 |
[48] | Y. I. Yang, Q. Shao, J. Zhang, L. Yang, Y. Q. Gao, Enhanced sampling in molecular dynamics, J. Chem. Phys., 151 (2019), 070902. https://doi.org/10.1063/1.5109531 |
[49] | W. Tschöp, K. Kremer, J. Batoulis, T. Bürger, O. Hahn, Simulation of polymer melts. I. Coarse-graining procedure for polycarbonates, Acta Polym., 49 (1998), 61–74. https://doi.org/10.1002/(sici)1521-4044(199802)49:2/3<61::Aid-apol61>3.0.Co;2-v |
[50] | H. Chan, M. J. Cherukara, B. Narayanan, T. D. Loeffler, C. Benmore, S. K. Gray, et al., Machine learning coarse grained models for water, Nat. Commun., 10 (2019), 379. https://doi.org/10.1038/s41467-018-08222-6 |
[51] | F. Noe, A. Tkatchenko, K. R. Muller, C. Clementi, Machine learning for molecular simulation, Annu. Rev. Phys. Chem., 71 (2020), 361–390. https://doi.org/10.1146/annurev-physchem-042018-052331 |
[52] | P. Gkeka, G. Stoltz, A. B. Farimani, Z. Belkacemi, M. Ceriotti, J. D. Chodera, et al., Machine learning force fields and coarse-grained variables in molecular dynamics: Application to materials and biological systems, J. Chem. Theory Comput., 16 (2020), 4757–4775. https://doi.org/10.1021/acs.jctc.0c00355 |
[53] | J. Behler, Perspective: Machine learning potentials for atomistic simulations, J. Chem. Phys., 145 (2016), 170901. https://doi.org/10.1063/1.4966192 |
[54] | M. Ceriotti, Unsupervised machine learning in atomistic simulations, between predictions and understanding, J. Chem. Phys., 150 (2019), 150901. https://doi.org/10.1063/1.5091842 |
[55] | A. Lunghi, S. Sanvito, A unified picture of the covalent bond within quantum-accurate force fields: From organic molecules to metallic complexes' reactivity, Sci. Adv., 5 (2019), eaaw2210. https://doi.org/10.1126/sciadv.aaw2210 |
[56] | T. Mueller, A. Hernandez, C. Wang, Machine learning for interatomic potential models, J. Chem. Phys., 152 (2020), 050902. https://doi.org/10.1063/1.5126336 |
[57] | Z. Huang, Y. Wang, X. Ma, Clustering of cancer attributed networks by dynamically and jointly factorizing multi-layer graphs, IEEE/ACM Trans. Comput. Biol. Bioinf., 19 (2022), 2737–2748. https://doi.org/10.1109/TCBB.2021.3090586 |
[58] | X. Gao, X. Ma, W. Zhang, J. Huang, H. Li, Y. Li, et al., Multi-view clustering with self-representation and structural constraint, IEEE Trans. Big Data, 8 (2022), 882–893. https://doi.org/10.1109/tbdata.2021.3128906 |
[59] | W. Wu, X. Ma, Network-based structural learning nonnegative matrix factorization algorithm for clustering of scRNA-seq data, IEEE/ACM Trans. Comput. Biol. Bioinf., 20 (2023), 566–575. https://doi.org/10.1109/TCBB.2022.3161131 |