Research article Special Issues

Identifying transition states of chemical kinetic systems using network embedding techniques

  • Many chemical and biochemical systems can be intuitively modeled using networks. Due to the size and complexity of many biochemical networks, we require tools for efficient network analysis. Of particular interest are techniques that embed network vertices into vector spaces while preserving important properties of the original graph. In this article, we {introduce a new method for generating low-dimensional node embeddings for directed graphs, using random walk sampling methods for feature learning on networks. Additionally, we demonstrate the usefulness of this method for identifying transition states of stochastic chemical reacting systems.} Network representations of chemical systems are typically given by weighted directed graphs, and are often complex and high dimensional. In order to deal with networks representing these chemical systems, therefore, we modified objective functions adopted in existing random walk based network embedding methods to handle directed graphs and neighbors of different degrees. Through optimization via gradient ascent, we embed the weighted graph vertices into a low-dimensional vector space Rd while preserving the neighborhood of each node. These embeddings may then be used to detect relationships between nodes and study the structure of the original network. We then demonstrate the effectiveness of our method on dimension reduction through several examples regarding identification of transition states of chemical reactions, especially for entropic systems.

    Citation: Paula Mercurio, Di Liu. Identifying transition states of chemical kinetic systems using network embedding techniques[J]. Mathematical Biosciences and Engineering, 2021, 18(1): 868-887. doi: 10.3934/mbe.2021046

    Related Papers:

    [1] Saranya Muniyappan, Arockia Xavier Annie Rayan, Geetha Thekkumpurath Varrieth . DTiGNN: Learning drug-target embedding from a heterogeneous biological network based on a two-level attention-based graph neural network. Mathematical Biosciences and Engineering, 2023, 20(5): 9530-9571. doi: 10.3934/mbe.2023419
    [2] Shi Liu, Kaiyang Li, Yaoying Wang, Tianyou Zhu, Jiwei Li, Zhenyu Chen . Knowledge graph embedding by fusing multimodal content via cross-modal learning. Mathematical Biosciences and Engineering, 2023, 20(8): 14180-14200. doi: 10.3934/mbe.2023634
    [3] Manfu Ma, Penghui Sun, Yong Li, Weilong Huo . Predicting the risk of mortality in ICU patients based on dynamic graph attention network of patient similarity. Mathematical Biosciences and Engineering, 2023, 20(8): 15326-15344. doi: 10.3934/mbe.2023685
    [4] Jia Yu, Huiling Peng, Guoqiang Wang, Nianfeng Shi . A topical VAEGAN-IHMM approach for automatic story segmentation. Mathematical Biosciences and Engineering, 2024, 21(7): 6608-6630. doi: 10.3934/mbe.2024289
    [5] Peng Wang, Shiyi Zou, Jiajun Liu, Wenjun Ke . Matching biomedical ontologies with GCN-based feature propagation. Mathematical Biosciences and Engineering, 2022, 19(8): 8479-8504. doi: 10.3934/mbe.2022394
    [6] Hongmei Jin, Ning He, Boyu Liu, Zhanli Li . Research on gesture recognition algorithm based on MME-P3D. Mathematical Biosciences and Engineering, 2024, 21(3): 3594-3617. doi: 10.3934/mbe.2024158
    [7] Taigang Liu, Chen Song, Chunhua Wang . NCSP-PLM: An ensemble learning framework for predicting non-classical secreted proteins based on protein language models and deep learning. Mathematical Biosciences and Engineering, 2024, 21(1): 1472-1488. doi: 10.3934/mbe.2024063
    [8] Xin Zhou, Jingnan Guo, Liling Jiang, Bo Ning, Yanhao Wang . A lightweight CNN-based knowledge graph embedding model with channel attention for link prediction. Mathematical Biosciences and Engineering, 2023, 20(6): 9607-9624. doi: 10.3934/mbe.2023421
    [9] Zhiwen Hou, Fanliang Bu, Yuchen Zhou, Lingbin Bu, Qiming Ma, Yifan Wang, Hanming Zhai, Zhuxuan Han . DyCARS: A dynamic context-aware recommendation system. Mathematical Biosciences and Engineering, 2024, 21(3): 3563-3593. doi: 10.3934/mbe.2024157
    [10] Karim El Laithy, Martin Bogdan . Synaptic energy drives the information processing mechanisms in spiking neural networks. Mathematical Biosciences and Engineering, 2014, 11(2): 233-256. doi: 10.3934/mbe.2014.11.233
  • Many chemical and biochemical systems can be intuitively modeled using networks. Due to the size and complexity of many biochemical networks, we require tools for efficient network analysis. Of particular interest are techniques that embed network vertices into vector spaces while preserving important properties of the original graph. In this article, we {introduce a new method for generating low-dimensional node embeddings for directed graphs, using random walk sampling methods for feature learning on networks. Additionally, we demonstrate the usefulness of this method for identifying transition states of stochastic chemical reacting systems.} Network representations of chemical systems are typically given by weighted directed graphs, and are often complex and high dimensional. In order to deal with networks representing these chemical systems, therefore, we modified objective functions adopted in existing random walk based network embedding methods to handle directed graphs and neighbors of different degrees. Through optimization via gradient ascent, we embed the weighted graph vertices into a low-dimensional vector space Rd while preserving the neighborhood of each node. These embeddings may then be used to detect relationships between nodes and study the structure of the original network. We then demonstrate the effectiveness of our method on dimension reduction through several examples regarding identification of transition states of chemical reactions, especially for entropic systems.


    Networks provide a natural framework for many chemical and biochemical systems, including chemical kinetics and dynamical interactions between biomolecules. The possible stages of a reaction, for example, can be viewed as vertices in a directed graph. Frequently, real-world networks are high-dimensional and complex in structure, which leads to difficulties in interpretation and analysis. In order to take full advantage of the structural information contained in such networks, efficient and robust methods of effective network analysis are essential to produce low dimensional representations while retaining key information about the nodes of the graph.

    Network analysis is especially useful for understanding behaviors of nonlinear chemical processes such as biochemical switches. Biochemical switches are systems of chemical reactions with multiple lower-energy steady states, or metastable states. Without stochastic perturbations, the system will spend all its time at the metastable states, where energy is minimized, while the presence of stochastic noise in the dynamics can lead to transitions between metastable states. Difficulties arise in the simulation of such systems as a result of the multiple time scales at work; transitions between steady states are typically rare, but occur rapidly.

    The different possible states of the reaction, given by the varying concentrations of reactant and product species, can be viewed as the vertices of a weighted directed graph. For example, we may have an 'initial' state where only reactant species exist, a 'final' state where only product species remain, and any number of transitional states each representing a particular concentration of the different species. The edge weights of the graph can then be determined from the likelihoods of transitioning between any given pair of states. The progression of the reaction over time can then be viewed as a Markov jump process, where the jump probabilities are represented by the weights on the edges between each pair of nodes. Then, methods for network analysis can be used to identify interactions and pathways between the components of the biochemical switch. One objective of this paper is to suggest a method to analyze this Markov process using node embedding techniques. This paper will focus on the AB problem: A and B are taken as the reactant and product states, respectively. The aim of this paper is to investigate the mechanism by which this transition occurs.

    Recently, methods for analyzing networks through feature representation and graph embedding have received increasing attention. For an overview on this subject, a number of recent review papers are available [1,2]. While much of this research has been focused on undirected graphs, the directed case has also been investigated, for example [3,4,5]. In particular, several methods have been proposed that utilize random walks for node embedding.

    Node embedding (or feature learning) methods aim to represent each node of a given network as a vector in a low-dimensional continuous vector space, in such a way that information about the nodes and relationships between them are retained. Specifically, nodes that are similar to each other according to some measure in the original graph will also have similar representations in the embedding space. Usually, this means that the embedding will be chosen in a way that maximizes the inner products of embedding vectors for embedded nodes that are similar to each other. Compared with the original complex high-dimensional networks, these low dimensional continuous node representations have the benefit of being convenient to work with for downstream machine learning techniques.

    Compared with alternative methods that embed edges or entire graphs instead of nodes, node embeddings are more adaptive for numerous tasks, including node classification, clustering, and link prediction [2]. Node classification is a process by which class labels are assigned to nodes based on a small sample of labeled nodes, where similar nodes have the same labels. Clustering algorithms group the representations of similar nodes together in the target vector space. Link prediction seeks to predict missing edges in an incomplete graph.

    The recent interest in feature learning has led to a collection of different node embedding methods, which can be broadly classified [1] by the analytical and computational techniques employed such as matrix factorization, random walk sampling, and deep learning network. The matrix factorization based methods generate embeddings by first using a matrix to represent the relationships between nodes, which include the Laplacian matrix as in the Laplacian eigenmaps [6], the adjacency matrix as in locally linear embedding (LLE) [7] or graph factorization (GF) [8], etc. Though the above methods focus on the case of undirected methods, similar techniques for the directed case have also been suggested [3], where the development of the Laplacian matrix, a combinatorial Laplacian matrix, and Cheeger inequality allow the above approaches to be extended for directed graphs. Factorization based methods such as eigenvalue decomposition can apply for certain matrices (e.g., positive semidefinite), and additional difficulties may arise with scaling for large data sets.

    Random walk methods are often used for large graphs, or graphs that cannot be observed in their entirety. The general idea involves first simulating random walks on a network, then using the output to infer relationships between nodes. Random walk methods can be used to study either undirected or directed graphs, although much of the previous work has focused on the undirected case. For example, DeepWalk [9] uses short unbiased random walks to find similarities between nodes, with node pairs that tend to co-occur in the same random walks having higher similarities. This method performs well for detecting homophily – similarity based on node adjacency and shared neighbors. Similar methods, such as node2vec [10], use biased random walks to capture similarity based on both homophily and structural equivalence – similarities in the nodes' structural roles in the graph. Both of these methods specifically address undirected graphs. A similar approach designed for the directed case is proposed in [4], which uses Markov random walks with transition probabilities given by ratios of edge weights to nodes' out degrees, together with the stationary distribution of the walks, to measure local relationships between nodes.

    Theoretical frameworks for the study of transition events of biochemical processes include Transition State Theory (TST), Transition Path Sampling (TPS), Transition Path Theory (TPT) and Forward Flux Sampling (FFS). The main idea of TST is that in order for the system to move from the reactant state to the product state, the system must pass through a saddle point on the potential energy surface, known as a Transition State [11]. The TPS technique uses Monte Carlo sampling of transition paths to study the full collection of transition paths of a given Markov process [12]. The basic idea, similar to importance sampling, is to perform a random walk in trajectory space, biased so that important regions are visited more often. Transition Path Theory (TPT) studies reactive trajectories of a system by considering statistical properties such as rate functions and probability currents, which measure the average rates of transitions between states [13]. Forward flux sampling [14], designed specifically to capture rare switching events in biochemical networks, uses a series of interfaces between the initial and final states to calculate rate constants and generate transition paths. This method has the advantage that knowledge of the phase space density is not required. For the purpose of this paper, we will be using TPT, but other such techniques can fit into the method presented here as well.

    The contributions of this paper are a method of using a random walk network embedding approach for node classification and clustering on directed graphs, as well as identification of transition states for the specific case of entropy effects. The method presented reduces the dimension of networks, thereby allowing analysis and interpretation of the system.

    The remainder of this paper will be organized as follows. First, in Section 2 we will introduce some background regarding feature learning on networks, and provide a brief overview of some basic principles from TPT. Then Section 3 will introduce a method for node classification for directed graphs, based on random walk node embedding. Finally, in Section 4, we will study several examples, showing the effectiveness of the method on identifying transition states of entropic systems.

    In this paper, we focus on the mechanism by which a system such as a biochemical switch passes through its energy landscape following a reactive trajectory. A trajectory is said to be reactive if it leaves the reactant state A and later arrives at the product state B without first returning to state A. Such a trajectory can be viewed as a sequence of transitions between metastable states where the energy is locally minimized. In particular, we treat these sequences as Markov jump processes, and study the probability space of these reactive trajectories in order to identify and understand the transition events of the system. Though this paper will be utilizing the techniques of Transition Path Theory for this purpose, it is also possible to adopt other frameworks, such as Transition Path sampling, Forward Flux sampling, etc., to investigate the transition events of a system.

    Consider a Markov jump process on a countable state space S. Let L represent the infinitesimal generator of the process, as defined in [18]. In other words, L is an |S|×|S| matrix with entries Luv such that for u,vS,uv, LuvΔt+o(Δt) represents the probability that a process that is in state u at time t will jump to state v during the time interval [t,t+Δt]. Then the entries of L are transition rates, satisfying

    {Luv0 u,vS, uv,vSLuv=0, uS. (2.1)

    Let {X(t)}tR represent an equilibrium sample trajectory of the Markov process, such that {X(t)}tR is right-continuous with left limits. At time t, let the probability distribution of this process be denoted by μ(t):=(P(X(t)=u))TuS. Then the time evolution of μ(t) follows the forward kolmogorov equation (or master equation)

    dμdt=μTL,t0.

    Denote the time-reversed process by {˜X(t)}tR, and the infinitesimal generator of this process by ˜L. The stationary probability distribution π=(πv)vS of both {X(t)}tR and {˜X(t)}tR is given by the solution to

    0=πTL .

    Networks can have large numbers of vertices and complex structures, which can lead to challenges in network analysis. To overcome these difficulties and improve understanding of these networks, the nodes of a given network can be embedded into a low-dimensional vector space, a process called feature learning. The techniques used in this paper to investigate transition states and pathways combine neural network-based and random walk-based methods for learning latent representations of a network's nodes, as discussed in [4,9,10]. The idea behind these methods is that the nodes in a given network G=G(V,E) can be embedded into a low-dimensional vector space in such a way that nodes similar to each other according to some measure in the original graph will also have embeddings similar to each other in the embedding space.

    Feature learning, or node embedding, requires three main components: an encoder function, a definition of node similarity and a loss function. The encoder function is the function that maps each node to a vector in the embedding space. To find the optimal embeddings, the encoder function can be chosen by minimizing the loss function so that the similarity between nodes in the original network corresponds as closely as possible to the similarity between their vector representations in the continuous space. Node similarity can have different interpretations, depending on the network and the task to be accomplished. For example, nodes that are connected, share neighbors, or share a common structural role in the original graph can be claimed to be similar, and to be embedded in the vector space by representations of close similarity. For the embedded vectors, a common approach is to use inner products as the similarity measure in the feature space–that is, if u and v are two nodes in a graph with representations e(u) and e(v) in a low-dimensional vector space, then e(u)Te(v) should approximate the similarity between representations e(u) and e(v).

    One technique for finding node embeddings is through the use of random walks. This technique has been applied to undirected graphs in methods such as DeepWalk and node2vec [9,10]. In these methods, the general approach is to use short random walks to determine similarities for each pair of nodes. The notion of node similarity is co-occurence within a random walk; two nodes are similar if there is a higher probability that a walk containing one node will also contain the other. The various methods using this approach differ in the implementation of this general idea: DeepWalk, for example, uses straightforward unbiased walks, while node2vec uses adjustable parameters to bias random walks toward breadth-first or depth-first sampling for a more flexible notion of similarity.

    The first stage in the feature learning process is to simulate a certain number of random walks starting from each vertex. Using the results of these random walks, the neighbors of each node can be determined. The objective is to maximize the probability of observing the neighborhood of each node, conditioned on the vector representation of the node. In several of the random walk-based feature representation methods for undirected graphs, this is achieved through optimization of a Loss function of the form:

    maxeJ(e) = uVlogPr(NS(u)|e(u)), (2.2)

    where V is the set of all vertices in the graph, NS(u) is the neighborhood of the node u, and e(u) is the encoder function representing the embedded vector. If we assume that the probabilities of observing a particular neighbor are independent, we can have

    Pr(NS(u)|e(u))=niNS(u)Pr(ni|e(u)). (2.3)

    Since these methods typically aim to maximize the inner products for neighboring nodes, they often utilize a softmax function to model each conditional probability:

    Pr(ni|e(u))=exp(e(ni)e(u))vVexp(e(v)e(u)) .

    For networks where |V| is large, techniques such as hierarchical softmax [9] or negative sampling [16] can be used to increase efficiency. To determine the optimal embedding function e(u), the Loss function can be optimized using gradient descent with respect to the parameters of e(u).

    In [4], a random walk-based method specifically intended for directed graphs is proposed. The function to be optimized in this method, when embedding into R1, is

    I=uπuv,uvp(u,v)(yuyv)2,

    where yu is the one-dimensional embedding of u, πu is the stationary probability for the node u and p(u,v) represents the random walk's transition probability from u to v. The transition probabilities are calculated according to

    p(u,v)=w(u,v)/k,ukw(u,k),

    where w(u,v) is the weight on the edge (u,v). Note that this method, though it uses transition probabilities of a random walk, does not actually require simulation of random walks, since all the summations can be done deterministically.

    According to the spectral graph theory [3], a circulation on a directed graph is a function F such that, for each uV, u,uvF(u,v)=w,vwF(v,w). The Laplacian of a directed graph is therefore defined as

    L=IΦ1/2PΦ1/2+Φ1/2PΦ1/22,

    where Φ is a diagonal matrix with diagonal entries given by Φv,v=ϕ(v)=u,uvF(u,v) for some circulation function F. The following definition of the combinatorial Laplacian is also due to Chung [3]:

    L=ΦΦP+PTΦ2,

    where P is the transition matrix, i.e., its elements are given by Puv=p(u,v), and Φ is the diagonal matrix of the stationary distribution, i.e., Φ=diag(π1,,πn). Note that the combinatorial Laplacian is symmetric and semi-positive definite. It is shown in [4] that

    uπuv,uvp(u,v)(yuyv)2=2yTLy

    where y=(y1,y2,,yn)T. This result demonstrates that this method is analogous to the Laplacian-based methods for undirected graphs, e.g., Laplacian eigenmaps [1].

    Random walk methods have been shown to perform well compared to other methods, and can be useful for a variety of different notions of node similarity. Previous work has shown them to be robust, efficient, and capable of completing a diverse range of tasks, including node classification, link prediction, clustering, and more [1,2,9,10]. Because they do not examine the entire graph at once, these methods are also useful for very large networks and networks that cannot be observed in their entirety.

    A third class of graph embedding methods involves the use of deep learning techniques, including neural networks [1,2], e.g., Graph Neural Network (GNN) [2,17]. Such methods have been shown to perform well, particularly for dimension reduction and identifying nonlinear relationships among data. Random walk based methods, such as DeepWalk, can incorporate deep learning algorithms like SkipGram for graph embedding, where random walks serve to provide neighborhood information as input.

    In general, neural network methods typically work by assigning a weight to each node representation, and then combining those terms via a transfer function. This result is then used as the input for an activation function, such as a sigmoid function, i.e., σ(x)=1/(1+exp(x)). This process may then repeat for multiple layers. In a neural network method, the parameters to be optimized are the weights, which are updated after each layer, typically by gradient descent. In the GNN [17], the objective function to be minimized is of the form

    W=ij(ti,jφw(Gi,ni,j))2,

    where Gi is the learning set, the ni,j and ti,j are the nodes and target outputs, and the goal is to choose the parameter w in such a way that φ closely approximates the target outputs. This optimization occurs through gradient descent, where the gradient is computed with respect to the weights w.

    In order to identify the transition paths of a given Markov process, we will adopt the framework of Transition Path Theory (TPT). The following notations are mostly from [18]. Consider a state space S, for an initial state AS and final state BS, a trajectory X(t) is said to be reactive if it begins from state A and reaches state B before returning to state A.

    To determine whether a given reaction path is reactive, we will need the following forward and backward committor functions. For each iS, the forward committor q+i is the probability that a process initially at state i will reach state B before it reaches state A. Similarly, the discrete backward committor qi denotes the probability that a process arriving at state i was more recently in state A rather than in state B. The forward committors satisfy the following Dirichlet equation

    jSLijq+j=0i(AB)Cq+i=0iAq+i=1iB,

    and similar equations are satisfied by backward committors.

    The probability current, or flux, of reactive trajectories gives the average rate at which reactive paths transition from state to state. For trajectories from A to B, the probability current is defined for all ij such that

    fABij=lims0+1slimT12TTT1{i}(X(t))1{j}(X(t+s))×nZ1(,tBn](t)1[tAn,)(t+s)dt=fABij,

    where 1(a,b) denotes the indicator function on the interval (a,b). Also, for all iS, fABii=0. It can be shown that

    fABij={πiqiLijq+j,if ij0if i=j.

    Since we are primarily interested in the flow from A to B, and the process can move in either direction between two adjacent nodes on the path, the following effective current accounts for the net average number of reactive paths that jump from i to j per time unit:

    f+ij=max(fABijfABji,0).

    We can also define the total effective current for a node i:

    C+i=j:(i,j)Ef+ij.

    In [21], the effective current is generalized to a connected subnetwork of nodes Ω such that

    C+(Ω)=iΩC+i,

    which leads to the following definition of transition states of a time reversible process as subsets of the state space:

    TS=limσ0argmaxΩ{C+(Ω)exp((q+0.5)2/σ2)}

    For the non-time reversible case, transition states can be identified similarly, replacing the exponential argument ((q+0.5)2/σ2) with ((q+0.5)2+(q0.5)2)/σ2. The above definition gives the flexibility of identifying transition states as subsets of the state space, which will be very useful when dealing with complex dynamics such as entropy effects.

    We now introduce a method to identify transition states and paths of Markov processes, by random walk based network embedding techniques for directed networks. Consider a Markov jump process in a countable state space S. Using the infinitesimal generator L for the process, we can calculate the stationary probability distribution π and the forward and backward committors, based on which probability current and effective currents fABij and f+ij can then be computed. Once these quantities have been obtained, the discretized state space can be viewed as a weighted, directed graph G(V,E). The nodes vV are the grid points representing states of the system, and each pair of adjacent nodes are connected by an edge (u,v)E with weights given by the effective probability current, such that the direction of the edge will be determined by the sign of the effective current.

    Once G is constructed, we can apply feature learning techniques for node classification to identify transition states. Random walk trials for similarities between nodes will start from each node in the space, using the edge weights as transition probabilities. The outputs of the random walks can then be used to compute experimental "neighbor probabilities", i.e., the probability that a random walk of length d starting at node u will contain node v. If the neighbor probability for a node pair (u,v) is sufficiently high, then the node v is claimed as a neighbor of u. Note that this is a directed process; v can be a neighbor of u even if u is not a neighbor of v.

    After finding these conditional probabilities and identifying the neighborhoods of each node, the next step is to maximize an objective function in order to solve for the node embeddings. As in the node2vec and DeepWalk methods for the undirected case, the conditional probabilities are modeled using softmax functions. In order to adapt this technique for the case of directed graphs, we will use a modified version of the objective function:

    V[e]=uπuvN(u)Pr(v|e(u)). (3.1)

    Optimizing this function will give the encoder function eopt that maximizes the probability of the neighborhood of each node, with the most relevant nodes having the most influence on the feature representations.

    In the original form of the objective function (3.1) for the undirected graph methods described above, the softmax functions depend only on whether nodes are neighbors, and is independent of the probability (frequency) that a particular neighbor will show up in a random walk trial. In other words, all neighbors are treated equally, while in reality some node pairs are more likely to co-occur than others. To address this, the objective function used here incorporates the neighboring probabilities, denoted NP(,) such that

    Pr(n|e(u))=exp(e(n)e(u))NP(n,u)vVexp(e(v)e(u))NP(v,u).
    Algorithm 1 A random walk simulation method for directed graphs, given the transition matrix t.
    for uV do
      walk(1)=u
      for step=1:walklength do
       for k=2:d do
        total=v,walk(k1)vt(walk(k1),v)
        for v:walk(k1)v do
         P(walk(k1),v)=t(walk(k1),v)/total
        end for
        Use probability P(walk(k1),) to choose walk(k)=v for some neighbor of walk(k1).
        Set counter(walk(k),u)=counter(walk(k),u)+1.
        end for
       end for
      end for
    for u,vV do
     neighborprobability(u,v)=counter(u,v)/kcounter(u,k)
    end for

     | Show Table
    DownLoad: CSV

    Assuming e(u)=E2u, for E2 being a 2×d matrix, the objective defined above can then be maximized using gradient ascent to obtain the matrix E2opt and therefore the feature embeddings. In practice, the linear assumption on the map from nodes to embeddings appears to be insufficiently flexible, particularly for higher-dimensional problems. To improve upon this, a neural network can be incorporated into the feature learning process, with the sigmoid function σ(u)=1/(1+exp(e(u))) where e(u) is the representation in the embedding space of the node u. The current algorithm uses a relatively small number of layers; raising this number results in node representations that are clustered more closely to their neighbors.

    To obtain transition states for a particular process, we will examine the similarities between the node representations of each state and the representation of metastable state A. Here, by "similarities" we mean the conditional probabilities given by the softmax units discussed previously. By construction, these values will naturally be small for nodes that are further from node A, due to hopping distances that are longer than the length of the random walks. So, to find the transition states, we combine similarities of node pairs with shorter hopping distances via matrix propagation, e.g.,

    sim(A,u)=Av,vusim(A,v)sim(v,u). (3.2)

    Note that a node u for which this value is high is a higher-order neighbor of the reactant node A; therefore, such a node will have a higher probability of lying along a transition path. We then define transition states as nodes with high probability of lying on a transition path, which are not direct neighbors of the reactant or product states.

    Next, we will illustrate the suggested method on several simple examples. In the examples presented here, we have used 100 random walk trials, each of length 9 steps. Smaller numbers of trials increase the impact of noise in the results. Due to the matrix propagation technique described above, changes to the walk length parameter do not have a noticeable affect.

    First we will consider a two-dimensional diffusion process (x(t),y(t)) representing the position of a particle traversing a potential field with potential V. This process follows the SDE

    dxdt=dVdx+dW1(t)dt,dydt=dVdy+dW2(t)dt,

    where V is the potential

    V=(x21)2+εy4, (4.1)

    and W1(t) and W2(t) are independent Brownian motion processes.

    This potential V has two local minima at (-1, 0) and (1, 0), as well as a saddle point at the origin. As a result, the diffusion process (x(t),y(t)) will have metastable states at these energy minima. Here I will take (1, 0) to be the reactant state A, and (1, 0) to be the product state B. The diffusion process can be approximated using a Markov (birth-death) jump process on a discrete state space. For this example, we will study the diffusion process on the domain Ω=[1,1]×[0.75,0.75] by examining the Markov jump process on a grid D=((1+hZ)×(0.75+hZ))Ω, with h=0.05.

    In order to examine the behavior of this Markov process, we should first compute the probability currents for the process. The infinitesimal transition matrix L can be found by using the jump rates for the birth-death process Luv for each pair of adjacent nodes u,v [19]. We define the following constants for a state (x,y) with x(1,1):

    k+x(x,y)=12h212hdVdx(xh),kx(x,y)=12h2+12hdVdx(x+h),k+y(x,y)=12h212hdVdy(yh),ky(x,y)=12h2+12hdVdy(y+h).

    Also let k+x(x+h,y)=1/h if x=1,  k+x(x+h,y)=0 if x=1, and kx(xh,y)=0, kx(xh,y)=1/h if x=1 or x=1 respectively, and similarly for k+y and ky. Then the infinitesimal generator can be expressed in terms of its action on a test function f such that

    (Lf)(x,y)=k+x(x+h,y)(f(x+h,y)f(x,y))+kx(xh,y)(f(xh,y)f(x,y))+k+y(x,y+h)(f(x,y+h)f(x,y))+ky(x,yh)(f(x,yh)f(x,y)).

    Figure 1 assigns each point in the grid a color based on its similarity to the node of A=(1,0), for epsilon values ε=.01 and ε=1 in Eq (4.1), respectively. To account for the fact that nodes with greater hopping distances from A will have lower similarities to this node, we propagate the similarity matrix according to (3.2) in order to assign similarities to more distant nodes. The red area passing from A to B represents the region that reactive trajectories will pass through with the highest probability for each value of epsilon.

    Figure 1.  Diffusion process with energy barrier: plot assigning colors to each node in the discretized domain Ω, according to each node's similarity to the starting node A. Higher similarity values correspond to higher dot products between the two-dimensional embeddings of the nodes. The left plot corresponds to a parameter value of ε=0.01, while the right plot corresponds to ε=1.

    In Figure 2, the node embeddings (u,v)=eopt(x,y) corresponding to a subset of nodes with higher similarity values to A are shown, for the case where ε=0.01, where eopt is the optimal linear encoding. Notice that in this figure, the embeddings for nodes with the highest probability of occurring in a reactive trajectory are clustered together, while nodes with lower probabilities tend to be grouped with other nodes with similar probabilities; more specifically, nodes with similarity values in the range 0.50.6 have embeddings clustered near the upper left or lower right of the figure, while nodes with values from 0.70.8 will have embeddings which appear in one of the two orange clusters.

    Figure 2.  Feature representations of nodes with high similarity to the reactant state A for the 2D diffusion process with energy barrier, with ε=0.01.

    In the context of the diffusion process with the potential V, the entropy effect refers to changes in the system's observed dynamics in response to the change of the parameter ε. That is, as ε is decreased, dV/dy shrinks and therefore the negative term of dy/dt becomes small. As a result, the shape and size of the saddle point located at the origin is altered, and we can expect the y-coordinate to take on a greater variety of values with higher probabilities in this case. In particular, around the origin, smaller values of ε should result in higher similarities between the nodes above and below the origin, since in this case the process is more likely to move up and down. For larger values of ε, the process is expected to move straight forward from A to B, with a smaller probability of moving up or down.

    Next we consider a pure diffusion process (V(x,y)=0) on the domain Ω=[1,1]×[1,1] with two barriers as depicted in Figure 3. This example was also discussed in [20]. Here we take the node at (0.6,0.6) to be the initial state A, and (0.6,0.6) as the final state B. The particle spends most of its time between the reactant and product states, in the region between the barriers ([1,1]×[0.4,0.4]). The only factor affecting the probability that the particle will reach B before returning to A is its current distance to B, since there are no energetic obstacles to overcome. In order to travel from A to B, the process must find its way past the obstacles by chance, thereby overcoming an entropic barrier.

    Figure 3.  Diffusion process with entropic bottleneck: surface plot of the similarity function between the reactant state A and nodes in Ω. Here, the discretization with h=0.1 is used.

    Figure 4 shows the node embeddings for a subset of nodes with high similarity values to reactant A. Note that the color scheme used in this figure corresponds to the color scheme in Figure 3; node embeddings that are colored red in the scatter plot correspond to red areas in the surface plot. In Figure 4, node embeddings again tend to be grouped by the corresponding values of the similarity function: nodes that are very similar to the initial state A (similarity function value >0.9) appear in the red cluster toward the lower right. Other clusters of different colors correspond to groups of nodes with lower probabilities of appearing in a reactive trajectory. The yellow cluster contains embeddings of nodes with similarity function values approximately 0.70.8, while the blue cluster corresponds to nodes with similarity values near 0.50.6.

    Figure 4.  Feature representations of nodes with high similarity to the reactant state A for the entropic diffusion process.

    Now we will consider a higher-dimension example, a stochastic model of a 3D genetic toggle switch consisting of three genes that each inhibit the others' expression [21]. We consider the production and degradation of the three gene products, S1,S2, and S3:

    α1α4S1,α2α5S2,α3α6S3,

    where the parameters αi are defined:

    α1=c11(65+x22)(65+x23) ,α2=c12(65+x21)(65+x23) ,α3=c13(65+x21)(65+x22) ,α4=c4x1 ,α5=c5x2,α6=c6x3,

    with c11=2112.5, c12=845, c13=4225, c4=0.0125, c5=0.005, and c6=0.025. From the stationary probability distribution, it can be seen that this model has three metastable states:

    A={xS|35x145,0x24,0x34},B={xS|35x245,0x14,0x34},C={xS|35x345,0x24,0x14}.

    Here we will choose A to be the reactant state and B to be the product state.

    Figure 5 assigns colors to each node in Ω={ih,jh,kh|i,j,kZ}[0,45]×[0,45]×[0,45] based on their similarity to nodes in A, propagating the similarity matrix for nodes with larger hopping distance from A as described previously. This figure shows two potential transition states for this system. A reactive trajectory from A to B will be most likely to pass directly from A to B along the higher-probability trajectory, represented in red. However, such a trajectory may instead reach B after travelling through state C, as indicated by the fainter region connecting A and B via C.

    Figure 5.  Surface plot of the similarities to A of grid points in the domain Ω for the 3-D Toggle switch model. Results are for a 15×15×15 discretization (h=3).

    Below, Figure 6 shows the two-dimensional node embeddings in R2, with the embeddings of nodes with zero similarity to A omitted for clarity. There are several distinct clusters present in Figure 6, each representing a cluster of similar nodes. The nodes with the highest similarities to A after propagation of the similarity matrix (represented in red) are shown in the cluster to the lower left of the figure. Hence, the nodes in this cluster represent the nodes expected to appear in the transition path.

    Figure 6.  Feature representations of nodes with high similarity to the reactant state A for the 3D toggle switch.

    Now we will consider a higher-dimension example, a 3D stochastic model for virus propagation consisting of three species which are necessary for virus production [22]. The three species involved in this system are the template (tem), viral genome (gen), and structural (struct) proteins of the virus.

    The cellular concentrations of these proteins are controlled by six reactions: producing tem from gen, using tem as a catalyst to produce struct and gen, degradation of tem and struct, and propagation of the virus using struct and gen such that

    [gen]k1[tem] ,[tem]k2,[tem]k3[gen] ,[gen]+[struct]k4,[tem]k5[struct] ,[struct]k6,

    where we adopt k1=0.25,k2=0.25,k3=1.0,k4=7.5×106,k5=1000, and k6=1.99. From [22] we know there are two steady states for this system: an unstable trivial solution at tem=gen=struct=0, and a stable steady state, which with the given parameter values is tem=30,gen=100,struct=12000. Ignoring the stochastic effects, as a macro scale limit, this system can be modeled by the ODEs:

    d[tem]dt=k1[gen]k2[tem],d[gen]dt=k3[tem]k4[gen][struct]k1[gen],d[struct]dt=k5[tem]k6[struct]k4[gen][struct].

    For a stochastic perspective, we model the system by a Markov process, where we follow the First Reactions method of the Gillespie algorithm[23], which at each time step assumes that the reaction that could take place in the shortest amount of time will occur next.

    From the upper left to the lower right, these clusters contain the representations for nodes near approximately (15,0,1000), (30,66.7,4000) and (30,100,8000). Direct simulations confirm that these are transition states: reactive trajectories will tend to pass through nodes in each of these clusters. Applying the feature learning method to this system produces a two-dimensional feature representation for each three-dimensional node (tem,gen,struct). Here, we take state A to be the origin and B=(30,100,12000). Figure 7 plots the two-dimensional feature representations for this system. In this figure, the nodes with higher similarities to the initial state (represented in red) and those with lower such similarities (represented in blue) appear in separate clusters. Figure 8 plots these feature representations with a logarithmic scale on the x-axis to better display the clusters.

    Figure 7.  Feature representations of nodes with high similarity to the initial state A=(0,0,0) for the virus propagation example.
    Figure 8.  Feature representations for the virus propagation example, with a logarithmic scale on the x-axis.

    Five distinct clusters are discernible in Figure 8: the red node in the top left is the feature representation for the reactant state, the red node at the bottom right is the representation of the product state, and the three clusters between each represent a transition state that reactive trajectories pass through while traveling from A to B. From the upper left to the lower right, these three clusters contain the representations for nodes near approximately (15,0,1000), (30,66.7,4000) and (30,100,8000). Direct simulations confirm that these are transition states: reactive trajectories will tend to pass through nodes in each of these clusters.

    Finally we will address a higher-dimension example, to illustrate the effectiveness of the method on reducing the dimension of a graph. Consider the E. coli σ-32 stress circuit, a network of regulatory pathways controlling the σ-32 protein, which is essential in the E. coli response to heat shock [24]. The systems consists of 10 processes:

    FtsHk1,Eσ32k2FtsH,GroELk3,Eσ32k4GroEL,σ32+Jcompk5σ32-Jcomp,σ32-Jcompk6Jcomp+σ32,σ32-Jcompk7FtsH,k8σ32,E+σ32k9E-σ32,E-σ32k10E+σ32.

    In this context, FtsH and GroEL are stress response proteins, E is a holoenzyme, and Jcomp (or J-complex) represents several chaperone proteins that are lumped as a simplification. Then E-σ32 denotes the protein complex formed when E binds to σ32, which catalyzes downstream synthesis reactions. σ32 is a product of translation, which we assume to occur at a rate corresponding to k8=0.007. It can also associate and dissociate of the J-complex. In accordance with [24] the other rate constants are taken to be: k1=7.4×1011,k2=4.41×106,k3=1.80×108,k4=5.69×106,k5=3.27×105,k6=4.4×104,k7=1.28×103,k9=0.7, and k10=0.13.

    For the above reaction rates, this system has a metastable state where the concentrations of FtsH, GroEL, σ32, and Jcomp are approximately 600, and the concentrations of the other three species are approximately 1500. Another metastable state occurs when the concentrations of FtsH, GroEL, σ32, and Jcomp are approximately 800. Here we take the former to be initial state A and the latter to be the product state B. Following a procedure similar to that in the above virus propagation example, three-dimensional representations are generated for each node. The plot of these node representations is given in Figure 9. The color scale in this figure is determined by the value of the similarity function between each node and the node A, with the most similar nodes shown in red. Nodes with lower similarities have been omitted for clarity. Four distinct clusters can be seen in Figure 9.

    Figure 9.  Feature representations of nodes with high similarity to the initial state, when initial state is taken to be A=(600,600,600,600,1500,1500,1500), for the E. Coli heat response circuit model.

    In this paper, we presented a method for analyzing metastable chemical reaction systems via feature learning on directed networks using random walk sampling. We have shown how this method may be used to identify transition states of Markov jump processes by interpreting such processes in terms of directed networks and taking advantage of Transition Path Theory. We have illustrated the efficacy of this method through several low-dimensional examples involving various energetic and entropic barriers. As noted above, for more complex, realistic examples, the method can be used for dimensional reduction, enabling the extraction and analysis of high-dimensional graph information. Further work will be necessary to fully understand this approach from a probabilistic standpoint. Another direction for future work is improving efficiency and development of faster algorithms. However, this method still forms the groundwork for a potential new way of analyzing directed networks and jump processes, particularly in the context of chemical kinetics.

    This work was supported by NSF-DMS 1720002.

    All authors declare no conflicts of interest in this paper.



    [1] P. Goyal, E. Ferrara, Graph embedding techniques, applications, and performance: a survey, Knowledge-Based Syst., 151 (2018), 78-94. doi: 10.1016/j.knosys.2018.03.022
    [2] H. Cai, V. W. Zheng, K. C. Chang, A comprehensive survey of graph embedding: problems, techniques, and applications, IEEE Trans. Knowl. Data Eng., 30 (2018), 1616-1637. doi: 10.1109/TKDE.2018.2807452
    [3] F. Chung, Laplacians and the cheeger inequality for directed graphs, Ann. Combinatorics, 9 (2005), 1-19. doi: 10.1007/s00026-005-0237-z
    [4] M. Chen, Q. Yang, X. Tang, Directed graph embedding, in IJCAI International Joint Conference on Artificial Intelligence, (2007), 2707-2712.
    [5] D. Zhou, J. Huang, B. Schoelkopf, Learning from labeled and unlabeled data on a directed graph, in Proceedings of the 22nd International Conference on Machine Learning, Association for Computing Machinery, (2005), 1036-1043.
    [6] M. Belkin, P. Niyogi, Laplacian eigenmaps and spectral techniques for embedding and clustering, in Advances in Neural Information Processing Systems, MIT Press, (2002), 585-591.
    [7] S. Roweis, L. Saul, Nonlinear dimensionality reduction by locally linear embedding, Science, 290 (2001), 2323-2326.
    [8] A. Ahmed, N. Shervashidze, S. Narayanamurthy, V. Josifovski, A. Smola, Distributed large-scale natural graph factorization, in WWW 2013-Proceedings of the 22nd International Conference on World Wide Web, (2013), 37-48.
    [9] B. Perozzi, R. Al-Rfou, S. Skiena, Deepwalk: Online learning of social representations, in Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2014.
    [10] A. Grover, J. Leskovec, node2vec: Scalable feature learning for networks, in KDD: Proceedings International Conference on Knowledge Discovery and Data Mining, 2016 (2016), 855-864.
    [11] E. Wigner, The transition state method, Trans. Faraday Soc., 34 (1938), 29-41. doi: 10.1039/tf9383400029
    [12] P. Bolhuis, D. Chandler, C. Dellago, P. Geissler, Transition path sampling: throwing ropes over rough mountain passes in the dark, Ann. Rev. Phys. Chem., 53 (2002), 291-318. doi: 10.1146/annurev.physchem.53.082301.113146
    [13] E. Weinan, E. Vanden-Eijnden, Towards a theory of transition paths, J. Stat. Phys., 123 (2006), 503-523. doi: 10.1007/s10955-005-9003-9
    [14] R. Allen, P. Warren, P. Wolde, Sampling rare switching events in biochemical networks, Phys. Rev. Lett., 94 (2005), 018104. doi: 10.1103/PhysRevLett.94.018104
    [15] S. Cao, W. Lu, Q. Xu, Grarep: Learning graph representations with global structural information, in CIKM'15: Proceedings of the 24th ACM International on Conference on Information and Knowledge Management, (2015), 891-900.
    [16] T. Mikolov, I. Sutskever, K. Chen, G. Corrado, J. Dean, Distributed representations of words and phrases and their compositionality, in Advances in Neural Information Processing Systems, 26 (2013).
    [17] F. Scarselli, M. Gori, A. C. Tsoi, M. Hagenbuchner, G. Monfardini, The graph neural network model, IEEE Trans. Neural Networks, 20 (2009), 61-80. doi: 10.1109/TNN.2008.2005605
    [18] P. Metzner, C. Schutte, E. Vanden-Eijnden, Transition path theory for markov jump processes, Mult. Mod. Sim., 7 (2008), 1192-1219.
    [19] C. Gardiner, U. Bhat, D. Stoyan, D. Daley, Y. Kutoyants, B. Rao, Handbook of stochastic methods for physics, chemistry and the natural sciences., Biometrics, 42 (1986), 226.
    [20] P. Metzner, C. Schutte, E. Vanden-Eijnden, Illustration of transition path theory on a collection of simple examples, J. Chem. Phys., 125 (2006), 084110. doi: 10.1063/1.2335447
    [21] J. Du, D. Liu, Transition states of stochastic chemical reaction networks, Comm. Comp. Phys., in press.
    [22] R. Srivastava, L. You, J. Summers, J. Yin, Stochastic vs. deterministic modeling of intracellular viral kinetics, J. Theor. Biol., 218 (2002), 309-321. doi: 10.1006/jtbi.2002.3078
    [23] D. Gillespie, Approximate accelerated stochastic simulation of chemically reacting systems, J. Chem. Phys., 115 (2001), 1716-1733. doi: 10.1063/1.1378322
    [24] R. Srivastava, M. S. Peterson, W. E. Bentley, Stochastic kinetic analysis of the escherichia coli stress circuit using σ32-targeted antisense, Biotechnol. Bioeng., 75 (2001), 120-129, doi: 10.1002/bit.1171
  • This article has been cited by:

    1. Yang Li, Hui Lv, Xing’an Wang, The Design of 2DOF IMC-PID Controller in Biochemical Reaction Networks, 2023, 13, 2076-3417, 3402, 10.3390/app13063402
    2. Paula Mercurio, Di Liu, Clustering molecular energy landscapes by adaptive network embedding, 2024, 4, 2770-372X, 10.20517/jmi.2023.40
  • Reader Comments
  • © 2021 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)
通讯作者: 陈斌, bchen63@163.com
  • 1. 

    沈阳化工大学材料科学与工程学院 沈阳 110142

  1. 本站搜索
  2. 百度学术搜索
  3. 万方数据库搜索
  4. CNKI搜索

Metrics

Article views(2919) PDF downloads(142) Cited by(2)

Figures and Tables

Figures(9)

Other Articles By Authors

/

DownLoad:  Full-Size Img  PowerPoint
Return
Return

Catalog