
Recently, topological data analysis has become a trending topic in data science and engineering. However, the key technique of topological data analysis, i.e., persistent homology, is defined on point cloud data, which does not work directly for data on manifolds. Although earlier evolutionary de Rham-Hodge theory deals with data on manifolds, it is inconvenient for machine learning applications because of the numerical inconsistency caused by remeshing the involving manifolds in the Lagrangian representation. In this work, we introduced persistent de Rham-Hodge Laplacian, or persistent Hodge Laplacian (PHL), as an abbreviation for manifold topological learning. Our PHLs were constructed in the Eulerian representation via structure-persevering Cartesian grids, avoiding the numerical inconsistency over the multi-scale manifolds. To facilitate the manifold topological learning, we proposed a persistent Hodge Laplacian learning algorithm for data on manifolds or volumetric data. As a proof-of-principle application of the proposed manifold topological learning model, we considered the prediction of protein-ligand binding affinities with two benchmark datasets. Our numerical experiments highlighted the power and promise of the proposed method.
Citation: Zhe Su, Yiying Tong, Guo-Wei Wei. Persistent de Rham-Hodge Laplacians in Eulerian representation for manifold topological learning[J]. AIMS Mathematics, 2024, 9(10): 27438-27470. doi: 10.3934/math.20241333
[1] | Amira Ishan . On concurrent vector fields on Riemannian manifolds. AIMS Mathematics, 2023, 8(10): 25097-25103. doi: 10.3934/math.20231281 |
[2] | Li Shen, Jian Liu, Guo-Wei Wei . Evolutionary Khovanov homology. AIMS Mathematics, 2024, 9(9): 26139-26165. doi: 10.3934/math.20241277 |
[3] | George Rosensteel . Differential geometry of collective models. AIMS Mathematics, 2019, 4(2): 215-230. doi: 10.3934/math.2019.2.215 |
[4] | Yunan He, Jian Liu . Multi-scale Hochschild spectral analysis on graph data. AIMS Mathematics, 2025, 10(1): 1384-1406. doi: 10.3934/math.2025064 |
[5] | Shahroud Azami . Monotonicity of eigenvalues of Witten-Laplace operator along the Ricci-Bourguignon flow. AIMS Mathematics, 2017, 2(2): 230-243. doi: 10.3934/Math.2017.2.230 |
[6] | Meraj Ali Khan, Ali H. Alkhaldi, Mohd. Aquib . Estimation of eigenvalues for the $ \alpha $-Laplace operator on pseudo-slant submanifolds of generalized Sasakian space forms. AIMS Mathematics, 2022, 7(9): 16054-16066. doi: 10.3934/math.2022879 |
[7] | Jaruwat Rodbanjong, Athipat Thamrongthanyalak . Characterizations of modules definable in o-minimal structures. AIMS Mathematics, 2023, 8(6): 13088-13095. doi: 10.3934/math.2023660 |
[8] | Wafa M. Shammakh, Raghad D. Alqarni, Hadeel Z. Alzumi, Abdeljabbar Ghanmi . Multiplicityof solution for a singular problem involving the $ \varphi $-Hilfer derivative and variable exponents. AIMS Mathematics, 2025, 10(3): 4524-4539. doi: 10.3934/math.2025209 |
[9] | Eunwoo Heo, Jae-Hun Jung . Persistent homology of featured time series data and its applications. AIMS Mathematics, 2024, 9(10): 27028-27057. doi: 10.3934/math.20241315 |
[10] | Jun Ma, Junjie Li, Jiachen Sun . A novel adaptive safe semi-supervised learning framework for pattern extraction and classification. AIMS Mathematics, 2024, 9(11): 31444-31469. doi: 10.3934/math.20241514 |
Recently, topological data analysis has become a trending topic in data science and engineering. However, the key technique of topological data analysis, i.e., persistent homology, is defined on point cloud data, which does not work directly for data on manifolds. Although earlier evolutionary de Rham-Hodge theory deals with data on manifolds, it is inconvenient for machine learning applications because of the numerical inconsistency caused by remeshing the involving manifolds in the Lagrangian representation. In this work, we introduced persistent de Rham-Hodge Laplacian, or persistent Hodge Laplacian (PHL), as an abbreviation for manifold topological learning. Our PHLs were constructed in the Eulerian representation via structure-persevering Cartesian grids, avoiding the numerical inconsistency over the multi-scale manifolds. To facilitate the manifold topological learning, we proposed a persistent Hodge Laplacian learning algorithm for data on manifolds or volumetric data. As a proof-of-principle application of the proposed manifold topological learning model, we considered the prediction of protein-ligand binding affinities with two benchmark datasets. Our numerical experiments highlighted the power and promise of the proposed method.
Recent years have witnessed a fast growth of topological data analysis (TDA) in data science and engineering [63]. The growth is driven by the great promise of topological approaches to real-world data that are distinguished from any other statistical, mathematical, physical, and engineering methods [11,42]. Typically, TDA offers a multi-scale topological characterization of data, which is the case with persistent homology [23,70], a key method employed in TDA. A major feature of persistent homology is its multi-scale analysis, which creates a family of topological spaces from the original data to track the topological persistence, i.e., the lifespan of topological invariants across scales [5,27]. The other major feature of persistent homology is its topological description of a space (like connected components, loops, and voids) in terms of topological invariants, such as Betti numbers. As such, persistent homology-based TDA leads to much topological simplification of the geometric information in the data [1,20]. Consequently, TDA typically works extremely well for data with intricate complexity [59,67]. Unfortunately, for data without geometric complexes, TDA may give rise to an oversimplification of key geometric characteristics, leading to a less competitive approach.
For many years, persistent homology has been used in qualitative analysis, which is somewhat counterintuitive and unproductive for nonexperts. The power of persistent homology was not demonstrated until it was utilized in quantitative and predictive analysis via machine learning algorithms [8,39]. Topological deep learning (TDL), coined in 2017 [9], was introduced to deal with large and intrinsically complex datasets using both persistent homology and deep neural networks. More recently, simplicial neural networks and other topological neural techniques have been applied in TDL to the design of neural network architecture. TDL has become an emerging paradigm in data science and machine learning [49]. However, an increasing concern associated with this rising popularity is whether TDL brings any practical benefit beyond its mathematical elegance. There are many applications where TDL has demonstrated superiority to other competitive methods [44]. Perhaps some of the most compelling examples are TDL's dominant wining of D3R Grand Challenges, an annual worldwide competition series in computer-aided drug design [45,46], its discovery of the mechanisms of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) evolution [16,60], and its successful forecast of emerging dominant SARS-CoV-2 variants BA.2 [17] and BA.4/BA.5 about two months in advance [15].
It is interesting to understand why TDL (or TDA) was so successful in the aforementioned examples, but was not competitive in many other situations in the literature [50]. First, biomolecular data, which is intricately complex in their internal structures[67], was involved in the above compelling examples. As such, topological simplification was a productive process, whereas TDL leads to the severe loss of crucial geometric information in many other data that is relatively simple in their internal structures. Additionally, it was element-specific persistent homology, rather than the plain persistent homology, that was applied in the above examples. This approach captures physical and biological interactions in the biomolecular data [9]. In fact, in the forecast of emerging dominant SARS-CoV-2 variants BA.4/BA.5, persistent Laplacian, rather than persistent homology, was utilized. This happens because persistent homology has many drawbacks or limitations [64]. First, the topological invariant extracted from persistent homology is qualitative, rather than quantitative. For example, the barcode from persistent homology does not distinguish a five-number from a six-number ring. Additionally, persistent homology is incapable of dealing with different elements in a point cloud, which is ineffective with the physics and chemistry of (bio)molecular data. Moreover, persistent homology cannot describe non-topological changes, i.e., homotopic shape evolution during the multi-scale (or filtration) analysis. Further, persistent homology is incapable of handling directed networks and digraphs, such as polarization, regulation, and control issues in applications. Finally, persistent homology is unable to characterize structured data, e.g., hypergraphs, directed networks, etc. These challenges call for innovative new topological methods.
To address these challenges, the persistent spectral graph, also known as persistent combinatorial Laplacian or persistent Laplacian (PL), was introduced in 2019 [61]. The harmonic spectra of PLs fully recover the topological invariants of persistent homology. However, the nonharmonic spectra of PLs capture the homotopic shape evolution during the multi-scale analysis that cannot be observed with persistent homology. Computational algorithms [22,62] and mathematical analysis [33,40] of PLs have been reported. In the past few years, much effort has been given to extend persistent Laplacian to further address other limitations of persistent homology [28], leading to persistent sheaf Laplacians [65], persistent path Laplacians, persistent hypergraph and hyperdigraph Laplacians [37], persistent directed flag Laplacians, persistent Mayer Laplacians, and persistent interaction Laplacians [64]. PLs have been shown to outperform persistent homology in many applications [15,41].
However, defined on point cloud data, neither persistent homology nor PL can directly deal with two other commonly occurring data formats, namely, data on manifolds [18], such as electron density [68], cryogenic electron microscopy density, and computed tomography images [14], and curves embedded in the three-dimensional Euclidean space, such as knots, links, and tangles, and their generalizations [30,48]. Multi-scale Gauss link integral [54] and evolutionary Khovanov homology have been proposed to deal with embedded curve data [55]. Evolutionary Khovanov homology integrates algebraic topology, geometric topology, and metric analysis for the first time. However, effective computational algorithms are needed for this approach to be widely used in practical applications.
To carry out manifold topological analysis of data on manifolds, the evolutionary de Rham-Hodge method was introduced [18]. This approach creates a family of multi-scale manifolds with boundaries from a given data and then builds evolutionary Hodge Laplacian operators on the multi-scale manifolds with appropriate boundary conditions. While originated from sharply different topological spaces, evolutionary Hodge Laplacian and PLs share the same algebraic structure and capture topological invariants in their harmonic spectra [52]. Case studies have been given to demonstrate evolutionary de Rham-Hodge theory-based manifold topological analysis of data on manifolds [18]. However, this approach was based on discrete exterior calculus [19,21] or finite element exterior calculus [3] in the Lagrangian representation, which is not efficient for multi-scale analysis and machine learning studies. Specifically, the regeneration of the evolving manifolds at different scales with different Lagrangian meshes causes numerical inconsistencies and becomes expensive for practical applications in machine learning studies. This challenge calls for new effective manifold topological analysis approaches for data on manifolds.
The objective of this work is to develop a persistent de Rham-Hodge theory on the Euler representation for manifold topological learning (MTL). To this end, we solve Hodge Laplacians on a pre-designed structure-persevering Cartesian grid for all scales to avoid numerical inconsistency. We construct a natural mapping of differential forms from a manifold with boundary embedded in R3 to a large manifold, use it to produce persistent cohomology mapping, and construct a persistent Hodge Laplacian with built-in boundary conditions. Our new approach draws on differential geometry, algebraic topology, partial differential equations, metric analysis, and numerical analysis. To give a proof-of-principle demonstration, we pair the proposed persistent de Rham-Hodge Laplacians with an effective machine learning algorithm to predict protein-ligand binding affinities. Based on two benchmark datasets in the Protein Data Bank (PDB), PDBbind v2007 and PDBbind v2016, we show that our MTL model gives rise to cutting-edge performance.
The rest of this paper is organized as follows: Section 2 offers a primer on the de Rham-Hodge theory on manifolds with boundaries; Section 3 presents our discretization for evolutionary de Rham-Hodge theory based on spectrum calculation of Laplacians associated with sublevel sets on Cartesian grids; Section 4 presents our construction for persistent de Rham-Hodge Laplacians both in the continuous setting and for given level set functions on Cartesian grids; Section 5 showcases preliminary studies on the applications of MTL; and Section 6 concludes the paper.
The de Rham-Hodge theory is an advanced mathematical framework that merges ideas from differential geometry, algebraic topology, analysis, and partial differential equations to study the properties of differential forms on smooth manifolds. It plays a crucial role in understanding the topology and geometry of manifolds through differential forms. The de Rham-Hodge theory consists of de Rham cohomology and Hodge theory. The former concerns differential forms, exterior derivative, and cohomology groups, while the latter deals with Riemannian manifolds, Hodge star operator, Hodge Laplacian, and Hodge decomposition.
Let M be an m-dimensional smooth, orientable, compact Riemannian manifold with boundary. Denote by Ωk(M) the space of all differential k-forms on M, i.e., the space of all smooth antisymmetric covariant tensor fields on M of degree k. The differential d, also called exterior derivative, is the unique R-linear mapping from the space of k-forms Ωk(M) to the space of (k+1)-forms Ωk+1(M) satisfying the Leibniz rule with respect to the wedge product ∧ and the nilpotent property dd=0. A key property of differential forms is that they can be integrated over any orientable k-submanifolds of M. For any oriented (k+1)-submanifold S⊂M with boundary ∂S, Stokes' theorem, as a generalization of the Newton-Leibniz rule, states that the integral of a differential k-form ω over ∂S is equal to the integral of its differential over S, i.e.,
∫Sdω=∫∂Sω. | (2.1) |
The differential d generalizes and unifies the classical operators in vector calculus, such as gradient ∇, curl ∇×, and divergence ∇⋅ in R2 and R3. For instance, in R3, 0-forms and 3-forms can be identified with scalar fields, while 1-forms and 2-forms can be identified with vector fields. In this case, the differential d corresponds to the gradient operator ∇ when applied to 0-forms, the curl operator ∇× when applied to 1-forms, or the divergence operator ∇⋅ when applied 2-forms. The nilpotent property dd=0 directly leads to the vector field analysis identities ∇×∇=0 and ∇⋅∇×=0.
A differential form ω∈Ωk(M) is called closed if dω=0, or exact if there is a (k−1)-form ζ∈Ωk−1(M) such that ω=dζ. Due to the property dd=0, every exact form is closed. Thus, the differential d links the sequence of the spaces of differential forms on M into a co-chain complex
0⟶Ω0(M)d⟶Ω1(M)d⟶⋯d⟶Ωm−1(M)d⟶Ωm(M)→0. | (2.2) |
The k-th de Rham cohomology group, denoted by HkdR(M), is then defined to be the k-th homology of this chain complex, i.e., the quotient space of closed k-forms modulo the space of exact k-forms, i.e.,
HkdR(M)=ker(d:Ωk(M)→Ωk+1(M))im(d:Ωk−1(M)→Ωk(M)). | (2.3) |
The de Rham cohomology, by the de Rham theorem, is naturally isomorphic to the singular cohomology, and thus depends only on the manifold topology.
Let g be a Riemannian metric on M and ⟨⋅,⋅⟩g be the point-wise inner product induced by g on Ωk(M). The Hodge star operator ⋆ provides an isomorphism from the space of differential k-forms Ωk(M) to the space of (m−k)-forms Ωm−k(M), defined by the following formula:
ω∧⋆η=⟨ω,η⟩gμg, | (2.4) |
where μg is the volume form on M induced by g. The Hodge L2-inner product on the space of k-forms Ωk(M) can then be obtained by taking the integral of the formula (2.4)
(ω,η)=∫Mω∧⋆η. | (2.5) |
The codifferential δ:Ωk(M)→Ωk−1(M) is defined by
δ=(−1)m(k−1)+1⋆d⋆, | (2.6) |
which also has the nilpotent property δδ=0. We call a differential form ω∈Ωk(M) co-closed if δω=0, or co-exact if there is a (k+1)-form η∈Ωk+1(M) such that ω=δη. The codifferential δ, as the differential d, also extends the classical gradient, curl, and divergence in vector calculus. In R3, it corresponds to −∇⋅, ∇×, and −∇ when applied to 1-forms, 2-forms, and 3-forms, respectively.
The Hodge Laplacian for differential forms is defined as Δ=dδ+δd:Ωk(M)→Ωk(M). Its kernel, consisting of all differential k-forms ω on M with Δω=0, is called the space of harmonic k-forms. We denote by HkΔ(M) the space of harmonic k-forms and by Hk(M) the space of k-forms that are both closed and co-closed, i.e., Hk(M)=kerd∩kerδ. The latter space Hk(M), known as the space of harmonic k-fields, is in general only a subset of the space of harmonic forms Hk(M)⊂HkΔ(M), and is infinite-dimensional [53]. However, in the case of closed manifolds where ∂M=∅, the space of harmonic forms HkΔ(M) reduces to the space Hk(M), as any harmonic form is both closed and co-closed. The result follows directly from the following formula:
0=(Δω,ω)=((dδ+δd)ω,ω)=(dω,dω)+(δω,δω), | (2.7) |
due to the L2-adjointness of the codifferential δ and the differential d on closed manifolds, i.e., (dω,η)=(ω,δη).
The classical Hodge decomposition theorem for closed manifolds states that the space of differential k-forms Ωk(M) can be decomposed as
Ωk(M)=dΩk−1(M)⊕δΩk+1(M)⊕HkΔ(M). | (2.8) |
These three subspaces are mutually orthogonal with respect to the inner product (2.5). Moreover, Hodge theorem identifies the harmonic space HkΔ(M) with the k-th de Rham cohomology group HkdR(M), which states that each harmonic form corresponds to exactly one equivalence class in HkdR(M). Therefore, the harmonic space HkΔ(M) is fully determined by the manifold topology, and is finite-dimensional with its dimension given by the Betti number dimHkΔ(M)=βk.
In the presence of a nonempty boundary ∂M, the two operators d and δ are not L2-adjoint, as integration by parts leads to [69]
(dω,η)=(ω,δη)+∫∂Mω∧⋆η, | (2.9) |
which contains a boundary term that may not vanish, and thus the decomposed subspaces in (2.8) are not orthogonal. However, certain boundary conditions can be enforced, ensuring the adjointness of the differential d and the codifferential δ, thereby inducing an orthogonal decomposition of the space of differential forms.
The most common choices of boundary conditions ensuring the adjointness of d and δ are the normal (Dirichlet) and tangential (Neumann) boundary conditions. A differential form ω∈Ωk(M) is called normal (Dirichlet) if it gives zero when applied to tangent vectors of the boundary, or tangential (Neumann) if the same holds for its dual ⋆ω instead. Denote by Ωkn(M) the set of normal differential k-forms and by Ωkt(M) the set of tangential differential forms, i.e.,
Ωkn(M)={ω∈Ωk(M)|ω|∂M=0}; | (2.10) |
Ωkt(M)={ω∈Ωk(M)|⋆ω|∂M=0}. | (2.11) |
Following their definitions, the spaces Ωkn(M) and Ωm−kt(M) are isomorphic under the Hodge star operator ⋆, also known as the Hodge duality. Moreover, the differential d preserves the normal boundary conditions, while the codifferential δ preserves the tangential boundary conditions.
The Hodge-Morrey decomposition [43] states that there is a 3-component L2-orthogonal decomposition
Ωk(M)=dΩk−1n(M)⊕δΩk+1t(M)⊕Hk(M), | (2.12) |
The orthogonality of the decomposition directly comes from the adjointness of δ and d when enforcing the normal or tangential boundary conditions. For ω∈Ωk(M), there is a unique decomposition of ω given as follows:
ω=dαn+δβt+η, | (2.13) |
where αn∈Ωk+1n(M), βt∈Ωk+1t(M), and η∈Hk(M). Note that the potentials αn and βt are not uniquely determined as all αn+dη and βt+δγ with any η∈Ωk−2n(M) and γ∈Ωk+2t(M) serve as potentials for the same components. However, the issue can be addressed by enforcing gauge conditions, such as
δαn=0, | (2.14) |
⋆dβt=0. | (2.15) |
The potentials αn and βt can then be uniquely determined by the following equations:
{Δαn=δω,Δβt=dω, | (2.16) |
by resolving the (finite) rank deficiencies of Δ under these boundary conditions (Eqs (2.10), (2.11), (2.14), and (2.15)).
Remark 1. In the case that M is a closed manifold, i.e., ∂M=∅, both the spaces Ωkn(M) and Ωkt(M) coincide with the space of differential forms Ωk(M), and the space of harmonic fields is identical to the space of harmonic forms Hk(M)=HkΔ(M). The Hodge decomposition (2.12) then reduces to the classical Hodge decomposition (2.8) for closed manifolds.
Remark 2. The Hodge-Morrey decomposition (2.12) in the low-dimensional Euclidean spaces R2 and R3, often referred to as the Helmholtz-Hodge decomposition in vector calculus, states that any vector field v defined on a compact domain can be orthogonality decomposed as
v=∇f+∇×u+h, | (2.17) |
where f is a scalar potential that vanishes on the boundary of the domain, u is a vector field orthogonal to the boundary, and h is the harmonic vector field satisfying ∇×h=0 and ∇⋅h=0. The first component ∇f and the second component ∇×u are often called the curl-free and divergence-free parts of the vector field v respectively. Note that in the presence of a boundary, the resulting scalar potential f is also called satisfying the normal boundary of 0-forms, and the vector field u is called satisfying the tangential boundary condition of 2-forms, which are direct counterparts of the potentials αn and βt in (2.12). For a complete correspondence between scalar or vector fields, and differential forms under the normal and tangential boundary conditions, see [69].
The space of harmonic fields Hk, in general, is infinite-dimensional, and thus has no direct correspondence with the cohomology of the manifold. However, as noted early [69], one can restrict to the space of normal harmonic fields, namely, Hkn(M)=Hk(M)∩Ωkn(M), and the space of tangential harmonic fields, Hkt(M)=Hk(M)∩Ωkt(M). As a consequence of the de Rham map, these two subspaces Hkn(M) and Hkt(M) are fully determined by the topology of M: the space of normal harmonic fields Hkn(M) is isomorphic to the relative de Rham cohomology HkdR(M,∂M), while the space of tangential harmonic fields Hkt(M) is isomorphic to the absolute de Rham cohomology HkdR(M) [25]. The two subspaces Hkn(M) and Hkt(M) are thus finite-dimensional, with dimensions given by the Betti numbers: dimHkn(M)=βm−k and dimHkt(M)=βk. Furthermore, the kernels of the Hodge Laplacian Δ, when restricted to the space of normal forms Ωkn(M) and the space of tangential forms Ωkt(M) with gauge conditions on the boundary, can be identified to the space of normal harmonic fields and the space of tangential harmonic fields, respectively. Denote by Δn and Δt the restrictions of the Hodge Laplacian Δ on the space of normal fields Ωkn(M) satisfying Eq (2.14) and the space of tangential fields Ωkt(M) satisfying Eq (2.15), i.e., Δn:Ωkn(M)→Ωk(M) and Δt:Ωkt(M)→Ωk(M). Then, immediately we have kerΔn=Hk(M)∩Ωkn(M)=Hkn(M) and kerΔt=Hk(M)∩Ωkt(M)=Hkt(M). The result follows directly from Eq (2.7). These identifications finally enable us to study the topology of the underlying manifold M through the Hodge Laplacians on normal and tangential forms.
Remark 3. In fact, let Hkco=Hk(M)∩δΩk+1(M) and Hkex=Hk(M)∩dΩk−1(M). The space of harmonic fields Hk(M) can be further orthogonally decomposed for smooth manifolds
Hk(M)=Hkco(M)⊕Hkn(M) | (2.18) |
=Hkex(M)⊕Hkt(M), | (2.19) |
which results in the Hodge-Morrey-Friedrichs decomposition given as follows:
Ωk(M)=dΩk−1n(M)⊕δΩk+1t(M)⊕Hkco(M)⊕Hkn(M) | (2.20) |
=dΩk−1n(M)⊕δΩk+1t(M)⊕Hkex(M)⊕Hkt(M). | (2.21) |
In particular, if M is a compact domain in Euclidean spaces, then there is a unique orthogonal 5-component decomposition
Ωk(M)=dΩk−1n(M)⊕δΩk+1t(M)⊕Hkn(M)⊕Hkt(M)⊕(dΩk−1(M)∩δΩk+1(M)), | (2.22) |
as the spaces Hkn(M) and Hkt(M) are L2-orthogonal, instead of just being transversal for compact manifolds in general [56]. Due to the correspondence between differential forms and vector fields in the low-dimensional Euclidean spaces, the implementation of this 5-component Hodge decomposition has been applied and implemented to the study of vector fields for surface triangle meshes, for tetrahedral meshes [69] and for regular Cartesian grids [58].
As we mainly focus on applications of compact domains in R3, to study the geometric and topological information of the underlying manifolds, there are eight Laplacians to be considered, which are defined on the spaces of differential k-forms with k=0,1,2,3 satisfying either the normal or the tangential boundary conditions. However, thanks to the duality between the space of normal fields and tangential fields, the study of the spectra of these eight Laplacians reduces to that of four Laplacians on one of the two types of boundary conditions, and finally to the singular spectra of three differential operators, applied to differential forms of degree k=0,1,2,3 [18]. Further details will be discussed in the next section for the discretization of Laplacians.
In this section, we elaborate on the discretization of the Hodge Laplacian and introduce the boundary-induced graph (BIG) Laplacian for compact domains in low-dimensional Euclidean spaces [52]. Although the theory works for 2D compact domains, for the remainder of the paper we focus only on compact domains in R3, as we target mainly 3D applications. We use discrete exterior calculus (DEC) to discretize all differential operators and differential forms on regular Cartesian grids, as it allows for efficient and accurate numerical algorithms relying on just matrix algebra, while keeping the L2 orthogonality between different components in Hodge decomposition. In addition, the constructed discrete differential operators and differential forms in DEC approximate their smooth analogs. For the characterization of the underlying manifold, we choose the Eulerian formulation, where the manifold is given as a sublevel set of a level set function defined on a regular Cartesian grid. Another common way, called the Lagrangian formulation, discretizes the manifold as simplicial meshes, i.e., triangular or tetrahedral meshes in 2D or 3D. The spectrum analysis of the Hodge Laplacians has been discussed in [69] for the Lagrangian formulation and in [58] for the Eulerian formulation. Compared to the Lagrangian case, the Eulerian representation uses vertices, edges, faces, and cells all fixed in a Cartesian grid, which significantly simplifies the data structures and algorithms. The Hodge stars, in the latter case, are close to rescaled identity matrices. This fact simplifies the study of Hodge Laplacians to that of BIG Laplacians with no Hodge stars involved, and thus leads to algorithms with efficient computations.
Denote by Im a rectangular m-dimensional regular Cartesian grid with k-cells oriented according to their alignments with the coordinate axes. The entire grid Im can be treated as a cell complex tessellating a rectangular domain in Rm, where each k-cell is a k-dimensional hypercube with edge length ℓ. A continuous differential k-form ω on Im, following the de Rham map, can be discretized by its integral value over each oriented k-cell σi, given as Wi=∫σiω [19]. The discrete differential on discrete k-forms of the grid Im is then encoded by a sparse matrix DIk, which stores the signed incidence between (k+1)-cells and k-cells and is given as the transpose of the cell boundary operator ∂Tk+1 on (k+1)-cells following from Stokes' theorem ∫σdω=∫∂σω. An illustration of the chain complex formed by boundary operator ∂ for a simple grid complex with a single 2D cell can be seen in Figure 1, which is a straightforward generalization of the chain complex on simplicial complexes. Note that the boundary of the boundary of a cell always results in a 0 chain, i.e., ∂∂=0, whose transpose immediately produces DIk+1DIk=0, thus preserving the nilpotent property in the continuous setting.
The discrete Hodge star establishes a one-to-one correspondence between discrete k-forms on the primal grid Im and discrete (m−k)-forms on its dual grid, given as the translated grid with grid points located at the m-cell centers of Im, based on the following formula:
1|σk|∫σkω≈1|⋆σk|∫⋆σk⋆ω, | (3.1) |
where ⋆σk is the dual (m−k)-cell formed by the dual grid points located at the centers of the primal m-cells incident to σk. See Figure 2 for an illustration of the correspondences between the primal and dual cells in the Cartesian grid case. Following from the discretization of differential forms, this correspondence leads to a diagonal matrix SIk with diagonal entries given by the ratio between the volumes of the dual (m−k)-cells and the primal k-cells, ℓm−k/ℓk=ℓm−2k. The associated discrete Hodge L2-inner product (2.5) of two discrete k-forms Vk and Wk on grid Im is then given by
(Vk,Wk)I=VTkSIkWk. | (3.2) |
The discrete codifferential, by definition of its smooth counterpart (2.6), can be assembled from the discrete differential and Hodge star operators as δIk=(SIk−1)−1DIk−1SIk. Note that the discrete counterpart of the Hodge Laplacian Δ=dδ+δd by replacing the differential and codifferential operators results in a nonsymmetric matrix. Instead, we consider the counterpart of ⋆Δ as the discrete Hodge Laplacian given by
LIk=(DIk)TSIk+1DIk+SIkDIk−1(SIk−1)−1(DIk−1)TSIk, | (3.3) |
where the operators are considered to be null for k<0 or k>m.
Compared to the case of simplicial or polygonal meshes, where the projection matrices to the interior can be straightforward to implement with the boundary elements explicitly labeled, modeling the manifold M as the volume bounded by a level set surface leads to delicate computation of the projection matrices. Note that the boundary of M using grid representation typically intersects with boundary k-cells instead of being its supersets. We restrict the computation to relevant cells by implementing the two types of boundary conditions through the inclusion or exclusion of the entire k-cells. We use the strategy as in [58] for the computation of projection matrices for each type of boundary condition: for the normal boundary condition, we include all cells if at least one of its vertices is inside or on the boundary of M, while for the tangential boundary condition, we include all cells with at least one of the vertices of the corresponding dual cells is inside or on the boundary. We refer to the former set of cells as the normal support and the latter as the tangential support. In contrast to the mesh case, it is important to note that neither the normal nor the tangential support is necessarily a superset of the other. See Figure 3 for one example showing the distinction of these two supports for 1-forms.
In the computation of the discrete Hodge star operators, it is essential to consider and incorporate the boundary conditions. Following the procedure in [58], we keep the dual cell volumes and adjust the primal cell volumes for normal boundary conditions, and do conversely for tangential boundary conditions with the primal cell volumes kept and the dual cell volumes changed. To be specific, when dealing with normal (resp., tangential) boundary conditions, we only compute the volume of the region of the primal (resp., dual) k-cells within the boundary ∂M for the denominator (resp., numerator) of the ratio in the discrete Hodge star matrix, and leave the dual (resp., primal) cell volumes in the numerator (resp., denominator) unchanged. Each unaltered k-cell has a k-volume of ℓk. In addition, for numerical stability, we do not alter the volume of outside primal k-cells, and perturb the level set function evaluated at primal/dual grid points to have an absolute value above ϵ=10−5ℓ, which ensures well-behaved fractional k-volumes. We denote by SIk,n and SIk,t the diagonal Hodge star matrices defined on the entire grid Im corresponding to the normal and tangential boundary conditions, respectively.
The projection matrix to the corresponding support, for each type of boundary condition, can be constructed from the identity matrices by eliminating the rows corresponding to k-cells outside the support. Denote by Pk,n the projection matrix for k-cells onto the normal support and by Pk,t the one onto the tangential support. We then obtain a new set of differential and Hodge star operators for M:
Dk,n=Pk+1,nDkPTk,n,Sk,n=Pk,nSIk,nPTk,n. | (3.4) |
Dk,t=Pk+1,tDkPTk,t,Sk,t=Pk,tSIk,tPTk,t. | (3.5) |
The nilpotent property Dk+1,nDk,n=0 and Dk+1,tDk,t=0 still holds for both boundary conditions due to DIk+1DIk=0 and the following observations:
PTk+1,nPk+1,nDIkPTk,n=DIkPTk,n,Pk+1,tDIkPTk,tPk,t=Pk+1,tDIk. | (3.6) |
The discrete Hodge L2-inner products of the two types of discrete k-forms on the manifold M for these two boundary conditions are then given by
(ξk,ζk)n=(ξk)TSk,nζk, | (3.7) |
(ξk,ζk)t=(ξk)TSk,tζk, | (3.8) |
whose domains are the discrete Ωkn(M) and the discrete Ωkt(M), respectively. Finally, we assemble the two types of discrete Hodge Laplacians as in the mesh case:
Lk,n=DTk,nSk+1,nDk,n+Sk,nDk−1,nS−1k−1,nDTk−1,nSk,n, | (3.9) |
Lk,t=DTk,tSk+1,tDk,t+Sk,tDk−1,tS−1k−1,tDTk−1,tSk,t. | (3.10) |
The null spaces of these discrete Hodge Laplacians, as in the continuous case, are fully determined by the topology of the underlying manifold M, since they only depend on the differential and projection matrices. The dimension of the kernel of Lk,n is given by the Betti number βm−k, while the dimension of the kernel of Lk,t is given by βk. Here, the Betti number βk presents directly the number of k-dimensional holes on the manifold M. For instance, β0 gives the number of connected components, β1 gives the number of tunnels, and β2 provides the number of closed cavities, respectively. The spectra of these Laplacians, in addition, could be used to study the geometric information of the manifold. It is known that the nonzero eigenvalues of the Laplacians provide rich insights into the shape of a manifold. For instance, the Fiedler value, defined as the smallest nonzero eigenvalue of a graph Laplacian, describes connectivity. As another example, the multiplicity of eigenvalues can reveal certain symmetries of the shape.
Remark 4. The two types of discrete Hodge Laplacians (3.9) not only provide rich geometrical and topological information of the underlying manifold, but also play a central role in the computation of the discrete Hodge decomposition (2.22) of differential forms for compact domains in 2D and 3D Euclidean spaces. In particular, they can be utilized, by resolving the rank deficiencies, to compute the potentials of the decomposed components in Hodge decomposition on normal or tangential support satisfying the corresponding boundary conditions. In addition, as the kernel sizes of Laplacians are finite, their eigenvectors corresponding to 0 eigenvalues, for each k, form a basis for the space of normal or tangential harmonic fields.
Note that the discrete Hodge stars in the Eulerian setting are almost identical to rescaled identity matrices. Therefore, the computations of the Hodge Laplacian can be further simplified by replacing the Hodge stars with identity matrices, leading to the definition of the BIG Laplacians as follows:
LBk,n=DTk,nDk,n+Dk−1,nDTk−1,n, | (3.11) |
LBk,t=DTk,tDk,t+Dk−1,tDTk−1,t. | (3.12) |
The BIG Laplacians were introduced in [52] for bounded domains to facilitate the comparison and contrast of the Hodge Laplacians and the combinatorial Laplacians. They preserve the Hodge Laplacian's capability to perform differential calculus but also retain the discrete nature of combinatorial Laplacians. The convergence of the spectra of the BIG Laplacians to Hodge Laplacians has been discussed in [52], showing that the spectra of (3.11) converge to those of Hodge Laplacians up to a scaling value ℓ−2 when enforcing the boundary conditions. This scaling value ℓ−2 is exactly the ratio between the missing scaling factor ℓm−2(k+1) in Lk and the missing factor ℓm−2k of Sk. As the BIG Laplacians produce results similar to those obtained from the discrete Hodge Laplacians with less computation, they can also be used to study the geometric and topological information of the underlying manifolds.
Note that the dual grid is also a Cartesian grid staggered with the primal grid by a replacement of ℓ/2 in all three axial directions of the Cartesian coordinates. For the study of the spectra of these Laplacians, one only needs to implement one type of boundary condition, for instance, the normal boundary condition, as Lk,n defined on the primal grid with normal boundary conditions is equivalent to Lm−k,t defined on its dual grid with tangential boundary conditions.
Preserving the topological structure is a major characteristic of the present work. However, preserving topological structure in various Laplacian operators is a nontrivial job. In this section, we present the detailed construction of topology-preserving Laplacians.
Note that, on the grid, the Hodge Laplacians and the BIG Laplacians are of the same sparsity patterns. For simplicity in exposition when discussing the spectrum analysis of the Laplacians, we let Lk be a generic Laplacian matrix of the form
Lk=DTkSk+1Dk+SkDk−1S−1k−1DTk−1Sk. | (3.13) |
Here, the Laplacian Lk can be interpreted, under choices of boundary conditions and Hodge star accuracy, as either a Hodge Laplacian or BIG Laplacian (with Sk set to identity) under tangential or normal boundary condition. The eigenvalues and eigenvectors of Lk can be solved by considering the generalized eigenvalue problem
LkW=λSkW, | (3.14) |
where λ is an eigenvalue and W is the associated eigenvector. To analyze the results, we perform the following transformation in the space of discrete forms: ˉDk=S1/2k+1DkS−1/2k, ˉLk=S−1/2kLkS−1/2k, and ˉW=S1/2kW. Rewriting the formulas above yields a simplified form of the Laplacian
ˉLk=ˉDTkˉDk+ˉDk−1ˉDTk−1, | (3.15) |
and a regular eigenvalue problem:
ˉLkˉW=λˉW. | (3.16) |
Note that the property ˉDkˉDk−1=0 is preserved. As the nonzero eigenvalues of ˉDTkˉDk and ˉDkˉDTk for each k are the same, given by the squared nonzero singular values of the discrete differential ˉDk, and each Laplacian ˉLk is just the combination of ˉDTkˉDk and ˉDk−1DTk−1, the entire spectrum of the Laplacians can thus be studied through the singular values of discrete differentials. Let
ˉDk=Uk+1ΣkVTk | (3.17) |
be the singular value decomposition of ˉDk, where Uk+1 and Vk are orthogonal matrices and Σk is a rectangular diagonal matrix with diagonal values given by the singular values of ˉDk. It follows immediately from ˉDkˉDk−1=0 that
ΣkVTkUkΣk−1=0. | (3.18) |
Therefore, the columns of Vk corresponding to nonzero singular values of ˉDk are orthogonal to columns of Uk associated with nonzero singular values of ˉDk−1. In addition, it follows from
Lk=VkΣ2kVTk+UkΣ2k−1UTk | (3.19) |
that the spectrum of ˉLk is given by the union of squared nonzero singular values of ˉDk, ˉDk−1, and 0, with the multiplicity of 0 given by the k-th Betti numbers. The columns of Uk and Vk corresponding to nonzero singular values, together with the set of harmonic forms, span the entire space of differential k-forms.
In the case that dim(M)=3, for each type of boundary condition, we have four Laplacians of different degrees in total k=0,1,2,3:
ˉL0=ˉDT0ˉD0, | (3.20) |
ˉL1=ˉDT1ˉD1+ˉD0ˉDT0, | (3.21) |
ˉL2=ˉDT2ˉD2+ˉD1ˉDT1, | (3.22) |
ˉL3=ˉD2ˉDT2. | (3.23) |
Due to the aforementioned discussion on the spectrum of Laplacians and the duality of the normal and tangential boundary conditions, the spectral analysis of all Laplacians can be reduced to the singular spectra analysis of the three discrete differentials ˉD0, ˉD1, and ˉD2 with one type of boundary condition. Note that the numerical evaluation of the singular values of these differentials, in the simplicial mesh case, may differ for the two types of boundary conditions, as the degrees of freedom (DoF) for normal k-forms and tangent m−k forms are different. However, in the Cartesian representation, they are strictly equivalent to each other by shifting the grid in all directions of the axis by ℓ/2, so long as M is at least one grid spacing away from the boundary of the grid.
For the computation of the spectra of the Laplacians, we choose the normal boundary condition. The spectra of all Laplacians ˉLk,n for compact domains in R3 can be finally decomposed into three distinct parts: the squared singular values of the gradient of tangential scalar fields, denoted by T, the squared singular values of the gradient of normal scalar fields, denoted by N, and the squared singular values of the curl of tangential curl fields, denoted by C.
In this section, we present the construction of the persistent de Rham-Hodge Laplacian on differentiable manifolds, which is based on the filtration of manifolds induced by varying a single parameter (the filtration parameter). The spectra of Laplacians carry rich topological and geometric information of a manifold. Essentially, a single manifold does not provide enough information in practical applications like feature extraction for machine learning analysis. As such, instead of studying just a single manifold, one could examine the spectra of a family of manifolds by adjusting the filtration parameter. The spectra of the Laplacians from this family of manifolds could provide much more information than by considering just one, as the topology and geometry could change for different parameters. This single-parameter family of manifolds, called the evolution of manifolds, was first introduced in [18] based on tetrahedral meshes. We briefly recap the background.
The formal definition of the evolving manifold is given by a one-parameter family of immersions Fc=F(⋅,c) with F:B×[a,b]→N being a smooth map, where B is called the base manifold, N is the ambient manifold, and c∈[a,b] is a real parameter within the interval. In practice, the most common way to define the evolution of manifolds without specifying B is through a level set function by adjusting the isovalues. Given a function f:N→[a,b], then in our case, we consider the sublevel sets M={x∈N|f(x)≤c} with the boundary given as ∂M={x∈N|f(x)=c} for c∈[a,b]. A sequence of manifolds can then be obtained by considering evenly distributed isovalues of the function f with the inclusion map
M0↪M1↪M2↪⋯↪Ms−1↪Ms, | (4.1) |
where each Ml is given as the sublevel set corresponding to cl with a≤c0<c1<⋯<cs≤b. To ensure that Ml is a manifold, we assume that the function f is a Morse function on N, and none of the cl's corresponds to a critical value of the function f, i.e., f−1(cl) does not contain any critical points. This is always possible as the set of Morse functions on a compact manifold is dense in the space of smooth functions, and their critical points are isolated, nondegenerate, and finite for compact manifolds. Thus we can always perturb any input function slightly to avoid critical isovalues in {cl,l=0,1,⋯s}. In addition, we assume that for each l, Ml,l+1=¯Ml+1∖Ml={x∈N|f(x)∈[cl,cl+1]} contains at most one critical point, which can be realized by refining the parameter sequence. Note that both Ml and Ml,l+1 are compact. By Morse theory, if Ml,l+1 contains no critical points, Ml is diffeomorphic to Ml+1. The retraction from Ml+1 to Ml can be easily constructed by considering a flow along the gradient of the function. As Ml+1 is homotopic to Ml in this case, there is no topological change happening between (cl,cl+1). For the other case when there is exactly one critical point in Ml,l+1, the manifold Ml+1 is homotopic to Ml with a k-cell attached, where k is the index of the critical point, defined to be the dimension of the largest subspace on which the Hessian Hess(f)(x) is negative definite. The topological change of the sublevel sets occurs precisely at the critical values of the level set function. Depending on the type of the critical points, i.e., local minimum, saddle points, and local maximum, the topology changes in different ways. In general, a local maximum has the full index m, a local minimum has index 0, while saddle points have indices strictly between 0 and m. In the case of R3, the occurrences of minima and maxima correspond to the birth of the 0-th generators and the death of the 2nd homology generators, respectively, while the occurrences of 1-saddle points correspond to the birth of 1st homology generators or the death of the 0-th homology generators, and those of 2-saddle points correspond to the birth of 2nd homology generators or the death of 1st homology generators.
As the de Rham complex depends on the topology, it can also be extended to the filtration of manifolds. Due to the duality of the normal and tangential boundary conditions, without loss of generality, one may focus on the space of normal differential forms. Given Ml↪Ml+1, we then need to construct a map from the space of normal k-forms Ωkn(Ml) to the space of normal k-forms Ωkn(Ml+1), which extends each normal k-form on Ml to a normal k-form on Ml+1. Let ω∈Ωkn(Ml). The idea is to utilize the boundary condition of ω on Ml and extend the forms ω|∂Ml to exact normal forms on the domain Ml,l+1 with certain boundary conditions on ∂Ml,l+1=∂Ml∪∂Ml+1. Then, the combination ¯ω defines a normal k-form on the manifold Ml+1. Note however that δ¯ω is only 0 in Ml, so the extension of ω∈kerδ may no longer be in kerδ on Ml+1.
To be specific, we consider the biharmonic equation Δ2ζ=Δ(Δζ)=0 on Ml,l+1 with both Dirichlet and Neumann boundary conditions to ensure the smoothness of dζ with ω through ∂Ml. Note that dζ satisfies the normal boundary condition on Ml,l+1. Let ¯ω be the extension of ω on Ml+1 with ¯ω=ω on Ml and ¯ω=dζ on Ml,l+1. It follows that ¯ω∈Ωkn(Ml+1) as it satisfies the normal boundary condition ¯ω|∂Ml+1=ζ|∂Ml+1=0.
While the biharmonic equation produces a smooth extension, in practice, it is more efficient to consider the harmonic extension with the boundary condition Δζ=0 with the boundary condition of ⋆dζ=⋆ω on ∂Ml and the typical normal form boundary condition on ∂Ml+1. The solution, by [53], Theorem 3.4.10], is unique. The resulting ˉω is continuous but non-smooth as δˉω may lead to a Dirac distribution on ∂Ml when Ml,l+1 induces a topological change. For instance, for a harmonic normal 1-form ω on a spherical shell Ml with Ml+1 turning into a solid ball, the biharmonic extension would create a uniform divergence δˉω in Ml,l+1, whereas the harmonic extension creates a thin layer of nonzero divergence δˉω near the part of ∂Ml around the cavity in the middle. Thus, the harmonic extension serves the same purpose in reducing the kernel of δ.
Denote by Il,1 the map from Ωkn(Ml) to Ωkn(Ml+1) sending ω to ¯ω. Note that (d∘Il,1)(ω) is 0 on Ml,l+1 and thus the same as the extension of the differential of a normal form dω on Ml,l+1, i.e., d∘Il,1=Il,1∘d. It follows that there is a commutative diagram
![]() |
where the horizontal direction gives the de Rham complex and the vertical direction shows the filtration-induced extensions.
Next, we introduce the p-persistent Hodge Laplacian. Let Il,p=Il+p−1,1∘...∘Il,1, which then gives an extension map from the space of normal forms on Ml to the space of normal forms on Ml+p. We have the following commutative diagram:
![]() |
Here dl,δl denotes the differential and codifferential on Ωk(Ml), dl+p,δl+p denotes the differential and codifferential on Ωk(Ml+p), respectively, and Rl,p is the projection of differential forms in Ωkn(Ml+p) to the space spanned by the harmonic extensions followed by the restriction to Ml. Let ˜δl,p=δl+p∘Il,p and ˜dl,p=Rl,p∘dl+p. By the construction of the extension, we have (˜δl,pω,η)=(ω,˜dl,pη), i.e., ˜δl,p are adjoint to ˜dl,p. We then define the p-persistent Hodge Laplacian operator Δpn,l:Ωkn(Ml)→Ωkn(Ml) as follows:
Δpn,l=˜dl,p˜δl,p+δldl. | (4.2) |
It is easy to see that when p=0, the p-persistent Hodge Laplacian gives exactly the usual Hodge Laplacian Δn,l:Ωkn(Ml)→Ωkn(Ml) restricted to the space of normal forms. We then define the p-persistent normal harmonic fields as the kernel of the p-persistent Hodge Laplacian Hk,pn=kerΔpn,l, which can be identified with the space ker˜δl,p∩kerdl. Note that by the extension construction and Rl,p∘Il,p=Id, one can see that ker˜δl,p⊂kerδ gets smaller as p increases, which confirms that fewer cohomology generators persist longer.
The regular Cartesian grid allows one to define persistent graph Laplacian on manifolds in the same way as persistent graph Laplacian [61]. It also allows defining persistent Hodge Laplacian in a consistent way, with the inclusion of nontrivial Hodge stars.
Recall that the discrete differential k-forms can be seen as a k-co-chain, i.e., a linear mapping from the chain space Ck to R that sends a k-chain ck=∑iaiσi to ∫ckω=∑iaiWi, where Wi=∫σiω is the integral of a smooth k-form ω over the k-cell σi.
By varying the isovalue of the level set function f, we can get a sequence of cell complexes given as nested sequences of sub-cell complexes of K satisfying the normal boundary conditions.
∅=K0⊂K1⊂⋯⊂Ks−1⊂Ks=K. | (4.3) |
See Figure 4 for an example of such a nested sequence of sub-cell complexes in a 2D Cartesian grid. Denote by Ck(Kl) the space of discrete k-forms on sub-complex Kl with 0≤l≤s. Note that Kl⊂Kl+1. A discrete k-form on Kl can be easily extended to Kl+1 by solving the discrete Laplace equation with the above boundary conditions for values on every k-cells in Kl,l+1=Cl(Kl+1∖Kl), the closure of the difference complex. We denote this extension map as Il,1:Ck(Kl)→Ck(Kl+1) and by Il,p=Il+p−1,1∘Il+p−2,1∘⋯∘Il,1:Ck(Kl)→Ck(Kl+p) the extension mapping from the space of discrete k-forms on Kl to the space of discrete k-forms on Kl+p, which may also be constructed directly by solving the Laplace equation on Kl,l+p=Cl(Kl+p∖Kl). With this extension mapping, the space of discrete k-forms on Kl can be seen as a subspace of discrete k-forms on Kl+p.
A sequence of the discrete de Rham co-chain complexes can be defined as follows:
![]() |
where Dkl:Ck+1(Kl)→Ck(Kl) denotes the discrete differential operator, and δkl:Ck(Kl)→Ck−1(Kl) denotes the discrete codifferential operator on Kl.
To define the persistent discrete Hodge Laplacian, we construct the discrete counterparts of ˜dl,p and ˜δl,p in the previous section.
Denote by δk+1,nl,p:Ck+1(Kl)→Ckl,p the operator given as δk,nl,p=δkl+pIk,nl,p, where δk,nl+p is the previously defined discrete operator for Kl+p and Ik,nl,p is the discrete harmonic extension operator defined next. Assuming Kl,l+p contains few k-cells, the harmonic extension is then constructed by the linear system Lk−1,nKl,l+pζ=0, and shifting all ⋆dζ values in the overlap of supports of Kl and Kl,l+p to the righthand side and replacing them with a rescaling of ⋆ω based on the k-volume within each support. More specifically, the resulting system is ˜Lk−1,nKl,l+p˜ζ=−Sk−1,nδk∂Klω, where ˜Lk−1,nKl,l+p is the Laplace operator applied to a form ˜ζ defined on Kl,l+p∖∂Kl, and δk∂Kl is the boundary codifferential operator that uses the values of ω on ∂Kl to evaluate the neighboring (k−1)-cells in Kl,l+p∖∂Kl.
The resulting extension operator
Il,p=(IdKl−DkKl,l+p(˜Lk,nKl,l+p)−1Sk,nδk∂Kl), |
where IdKl is the identity matrix in Kl up to a rescaling in the boundary, provides the combination of ω in Kl and d˜ζ in Kl,l+p∖∂Kl, when applied to ω. The matrix corresponding to Il,p is dense for rows corresponding to cells in Kl,l+p but diagonal for rows corresponding to cells in Kl. Note that δ∂Kl is not necessarily 0 for co-closed ω, but is 0 for co-exact ω.
The adjoint operator of δk+1,nl,p defines Dkl,p. In the following, we drop most of the subscripts for clarity. Recall that (ω,˜dη)=(˜δω,η) can be discretized as
[W]TS[˜DE]=[S−1DTSIl,pW]TS[E] |
with W and E as discrete versions of ω and η. Thus, ˜D=S−1ITl,pSD, from which we may recognize the restriction operator as R=S−1ITl,pS. This restriction operator can be seen as the L2-projection onto the space formed by all harmonic extensions from ΩKn(Ml).
Note that in this case, immediately δkl,pδk+1l=0 since the extension operator will generate ˜ζ=0 for any co-exact form ω=δβ on Kl as the righthand side of the associated linear system essentially corresponds to δδβ=0. From the adjoint version, we have DklDk−1l,p=0, and thus the following commutative diagram
![]() |
The discrete p-persistent Hodge Laplacian is then given as follows:
Lkl,p=Dk−1l,pδkl,p+δk+1lDkl, | (4.4) |
and the discrete p-persistent BIG Laplacian is
Lkl,p=Dk−1l,p(Dk−1l,p)T+(Dkl)TDkl. | (4.5) |
We now present some examples of evolving manifolds and show results for the spectral analysis of their persistent Laplacians. In particular, we focus on the changes of Betti numbers β0, β1, and β2 and the first nonzero eigenvalues λT1,λC1 and λN1 of the 0-persistent BIG Laplacians in the set T, C, and N, respectively, as introduced in Section 3.3. Four models are considered, including the Bimba model, the kitten model, a genus-3 model, and a four-ball model. For each model, we show on the top row snapshots of evolving manifolds at five evenly spaced isovalues in a chosen interval, and on the bottom row the changes in Betti numbers and the first nonzero eigenvalues λT1,λC1, and λN1. All the evolving manifolds are generated using isovalues of the signed distance function (SDF) from the original surface model, given as the 0-isosurface of the SDF. As we show below, these values from the evolution of manifolds provide rich information rather than considering just a single manifold. The discontinuity of these variables indicates the topological changes occurring during the evolution process, and the monotonicity of these nonzero eigenvalues reveals the geometric changes.
The results for the Bimba model are presented in Figure 5 with an isovalue interval [0,0.2]. As there is no topological change happening in the evolution process, all Betti numbers β0, β1, and β2 remain constant, and λT1,λC1, and λN1 are continuous throughout the whole process. Both λC1 and λN1 decrease as the isovalue increases.
Figure 6 illustrates the results for the kitten mode with one tunnel formed by its tail. The isovalue interval [0,8] is considered. One can see all variables are continuous during the evolution process except that β1 and λC1 both drop at the same isovalue, where β1 changes from 1 to 0. This happens due to the disappearance of the tunnel. In addition, λT1 increases at the beginning, and then slows down its rate of increase at the isovalue after the tunnel disappears, and λC1 and λN1 decrease during the evolution process.
Note that there are also tunnels in the evolving manifolds for the genes-3 model, as we expected, and a similar phenomenon can also be observed in Figure 7 for the change of the Betti numbers and the first nonzero eigenvalues. The isovalue interval [0.1,4] is considered for this model. The disappearance of the three tunnels leads to a drop of β1 from 3 to 0 and also a drop of λC1. λT1 initially increases, and then changes its behavior to decrease after the tunnels vanish. The evolution process results in a decrease in λC1 and λN1, just as in the previous two models.
The evolving process of the four-ball model with isovalue interval [2,3.84] (see Figure 8) leads to discontinuities in all Betti numbers and the first nonzero eigenvalues. As the four separate components merge in the evolution, β0 changes from 4 to 1, along with a drop in λT1 at the same isovalue. In addition, β1 increases from 0 to 3 due to the appearance of three tunnels when the merge happens and then decreases to 0 after the disappearance of all tunnels. The nonzero eigenvalue λC1 has a drop that occurs when the tunnel vanishes, however, it is continuous when the tunnels are formed. This suggests that the continuity of λC1 is only related to the death but not the birth of tunnels. One can also observe a slowdown in the rate of change of λT1 following the disappearance of all tunnels. As the isolate increases further, a cavity occurs in the manifold, resulting in an increase of β2 from 0 to 1 and finally a decrease from 1 to 0 after the cavity disappears. This topological change can also be observed in λN1, where λN1 becomes non-differentiable.
As illustrated by these models, changes in Betti numbers β0, β1, and β2 and the first nonzero eigenvalues λT1,λC1, and λN1 not only reflect the changes in topology, but also characterizes the changes in geometry for the evolution of manifolds. The rich information revealed by these variables leads to potential applications in various topological data analysis tasks.
In this section, we carry out a proof-of-principle experimental demonstration of the proposed persistent de Rham-Hodge theory-based MTL. In this approach, the problem is defined on manifolds with boundaries. Appropriate boundary conditions are implemented to match actual topological dimensions. The resulting persistent Hodge Laplacians are solved to deliver the corresponding series of eigenvectors and eigenvalues at various scales. In this approach, we use these eigenvalues for machine learning predictions of protein-ligand binding affinity. The binding affinity describes the strength of protein-ligand interactions for each protein-ligand complex and has been a popular subject in machine learning studies [6,66]. The feature extraction algorithm, i.e., the algorithm for computing eigenvalues of Laplacians, is implemented in MATLAB, while the machine learning model is employed using a gradient boosting regressor (GBR) module from Scikit-learn.
We consider two benchmark datasets, PDBbind-v2007 and PDBbind-v2016 [38,51], to demonstrate the effectiveness of our framework in capturing the topological features of protein-ligand complexes. The datasets can be downloaded from http://pdbbind.org.cn/. These two PDBbind datasets provide collections of biomolecular complexes in PDB with experimentally a measured binding affinity for each protein-ligand complex, and are commonly used in various studies such as drug-discovery or molecular recognition, etc [10,38]. We aim to build a machine learning model, by utilizing the topological and geometric features of the protein-ligand complexes generated using our persistent Hodge Laplacian (PHL) framework as inputs, for predicting the protein-ligand binding affinities.
The biomolecular complexes in each PDBbind dataset are organized into three sets, including a general set, a refined set and a core set, with each set being a superset of the next [7,24,36,41,57]. In our experiments, for each dataset, we use the refined set, excluding the core set, to train the predictive model for the binding affinities of the protein-ligand complexes in the core set. The PDBbind-v2007 dataset contains a total of 1,300 complexes with 1,090 in the refined set and 195 in the core set, while the PDBbind-v2016 dataset has a total of 4,057 complexes with 3,767 in the refined set and 290 in the PDBbind core set.
The original datasets contain atomic names and coordinates, which are the so-called point cloud data. To generate manifold representations, we carry out the discrete to continuum mapping using the flexibility and rigidity index [47]. To compute the topological feature of each protein-ligand complex for the machine learning model, we use the element-specific approach [9]. Specifically, we consider the pairwise interactions between element types that are commonly found in proteins and ligands, including Hydrogen (H), Carbon (C), Nitrogen (N), Oxygen (O), and Sulfur (S) in proteins, and Hydrogen (H), Carbon (C), Nitrogen (N), Oxygen (O), Sulfur (S), Phosphorus (P), Fluorine (F), Chlorine (Cl), Bromine (Br), and Iodine (I) in ligands. These interactions result in a total of 50 pairs of atom types for each protein-ligand complex [9]. However, due to the absence of H in most proteins, we reduce the number of atom pairs to 40 in practice, ignoring the element H in all proteins. These 40 atom pairs, formed by atom types {C, N, O, S} in proteins, and atom types {H, C, N, O, S, P, F, Cl, Br, I} in ligands, along with their xyz coordinates, are used to generate the topological features for each protein-ligand complex. In this paper, all atom-pair complexes are determined by a cutoff distance 12˚A from the ligand.
Let {xαi,i=1,⋯,s} be the location coordinates of all s atoms in an atom pair, where α denotes the atom type of the atom either in the protein or in the ligand. For this atom pair, a level set function can then be obtained by considering the negative sum of Gaussian density functions defined at the xyz coordinates of all atoms, given as
ρ(x,τ)=−s∑i=1exp(−(||x−xαi||τrαi)2), | (5.1) |
where ||x−xαi|| is the Euclidean distance from position x to the location xαi of the i-th atom, τ is a scalar value, and rαi is the van der Waals radius of the i-th atom, determined by the atom type α. Given an isovalue c, the sublevel set
M={x|ρ(x,τ)≤c} | (5.2) |
defines a compact manifold in R3 with its boundary given by the isosurface ∂M={x|ρ(x,τ)=c}. A filtration of a manifold for the atom pair can then be obtained by choosing a list of evenly spaced isovalues of this level set function (5.1). Let c1<c2<⋯<cs be such isovalues. We have their corresponding sublevel sets given as follows:
M1⊂M2⊂⋯⊂Ms, | (5.3) |
where Mi is the compact manifold associated to isovalue ci. In Figure 9, we present one example of the resulting filtration of manifolds at 3 different isovalues for atom pair OH in protein-ligand complex 4tmn. Note that the function (5.1) is a special case of the flexibility rigidity index (FRI) density function [47], which has been shown computationally stable in converting discrete point cloud representations to continuous embeddings, and been used for generating protein boundary surfaces [18] and interactive manifolds [47]. Therefore, one can also make other reasonable choices of FRI density functions to generate the filtration of manifolds.
In the computation of the Laplacians, one can ideally choose a common Cartesian grid such that it contains all manifolds of interest for all protein-ligand complexes, which ensures that all Laplacians are computed consistently, making their spectra comparable for different complexes. However, as atoms are spread out in the space for different atom pairs, we need to use a sufficiently large grid with a fine resolution for accurate computation of Laplacians, which significantly increases the computational load. Instead, we consider, for each type of atom pairs, a fixed Cartesian grid, regardless of the types of protein-ligand complexes. This approach also ensures that the topological features are comparable for different protein-ligand complexes, as all spectra are computed in the same grid for all atom pairs of the same type. For simplicity, we choose a fixed grid spacing for all Cartesian grids across different atom pairs, and in our experiments, the grid spacing is fixed to be 0.549˚A for both PDBbind datasets.
We consider 9 evenly spaced isovalues in the interval [−0.5,−0.001] for all level set functions, which provide 9 compact manifolds for each atom pair. Note that the level set function (5.1) is always less than 0 and approaches 0 as the norm of x increases. This interval is chosen as isovalues greater than −0.001 result in no change on the 0-th Betti number β0 of manifolds for most atom pairs, and isovalues smaller than −0.5 leads to high computational cost, as finer grids are necessary to resolve those isosurfaces. To ensure that the computation of Laplacians is accurate and no topological information is missing due to numerical errors caused by low resolution, we require that at least 8 grid cells of the Cartesian grid are contained in each connected component of a manifold. We compute, for each manifold, the BIG Laplacian L3,n under the normal boundary condition, for which the number of its 0 eigenvalues gives the 0-th Betti number β0. We then use the 0-th Betti number β0 and the first k nonzero eigenvalues of L3,n, as the topological feature for the manifold. These k+1 features for each of the 9 compact manifolds for each atom pair, amount to (k+1)×9×40 topological features for each protein-ligand complex. While we only used 9 isovalues within this interval for generating the manifolds in our experiments, more isovalues can be considered, which gives a filtration of more manifolds for each atom pair, and finally leads to more topological features for each protein-ligand complex.
The spectra of the 0-th Laplacian, which in our case corresponds to L3,n under the normal boundary condition, have proven effective and successful in many machine learning tasks [10,38,41,61]. While the Laplacians of other orders could also be used for generating more topological features, we utilize, in this preliminary test, only the spectra of L3,n as features for the protein-ligand complexes in the machine learning model due to the computation efficiency. The results, as shown in Section 5.4, indicate that these features are sufficient to validate our framework in the machine learning task for predicting the protein-ligand binding affinities.
As a subset of artificial intelligence (AI), machine learning focuses on developing algorithms and models that allow computers to learn from and make decisions based on data. Instead of being explicitly programmed to perform a task, machine learning models identify patterns in data, enabling them to improve their performance over time. New algorithms have been constantly proposed [34,35], including the topology-enabled transformer [12]. The machine learning models for predicting protein-ligand binding affinities often fall into two categories depending on the type of input data: complex-based or sequence-based models. The complex-based methods are trained using features obtained from the 3D protein-ligand complexes, while the sequence-based models learn from the one-dimensional protein sequences and the ligand-simplified molecular-input line-entry system (SMILES) strings. In our experiments, besides the topological features from the 3D protein-ligand complexes, we incorporate protein-ligand features obtained from sequence-based models to build consensus models. To be specific, we make use of the recent pretrained transformer protein language model Evolutionary Scale Modeling-2 (ESM-2) [32], and the pretrained Transformer-CPZ model [13] for generating the protein and ligand features, respectively, and use their concatenation as inputs for the binding affinity prediction. Here CPZ represents the union of three data setscontains ChEMBL [26], PubChem [31], and ZINC [29]. The residue embeddings from the last layer of the pretrained ESM-2 model esm.pretrained.esm2_t33_650M_UR50D are used as the protein features, while the embeddings from the last layer of the pretrained Transformer-CPZ model chembl27_pubchem_zinc_512 are used as the ligand features.
With the topological features and the embedding features obtained from ESM-2 and Transformer-CPZ, we employ the GBR module from Scikit-learn 1.4.2 for predicting the protein-ligand binding affinities. We then use the consensus prediction from these models as the final results. The GBR parameters used in our experiments are: n_estimators = 10,000, max_depth = 5, min_samples_split = 5, learning_rate = 0.005, loss = squared_error, subsample = 0.5, and max_features = sqrt. Changing these parameters does not result in significant differences. To address the randomness from the machine learning algorithm, we repeat each modeling process 20 times with different random seeds, and use the average of predictive results. The Pearson correlation coefficients (PCC) are used as the evaluation metric to assess the performance of our proposed models.
The number of topological features for each protein-ligand complex, as in Section 5.2, is given by (1+k)×9×40, where k denotes the number of the first k nonzero eigenvalues of the Laplacians. To find the optimal parameter k leading to the best performance of predictive modules, we carry out the five-fold cross-validation on the training set of each PDBbind dataset with varying values of k based on the average of PCC values. The results indicate that the optimal PCC values for the PDBbind-v2007 and PDBbind-v2016 training sets can be achieved when k=5 and k=10, respectively. For the PDBbind-v2007 training set, the PCC value is 0.709 and the root mean squared error (RMSE) value is 2.049, while for the PDBbind-v2016 training set, the PCC value is 0.748 and the RMSE value of 1.812. These choices of k result in a total of 2,160 topological features for each protein-ligand complex in the PDBbind-v2007 dataset and 3,960 topological features for each protein-ligand complex in the PDBbind-v2016 dataset. These topological features, along with the concatenated protein-ligand features from ESM-2 and Transformer-CPZ, are then used as inputs of the gradient-boosting regressor for binding affinity prediction.
In Table 1, we report the average PCC values and the average RMSE of our models on the test set for each PDBbind dataset using only the topological features from PHL, the model using only the transformer features (TF), and the consensus module using both types of features. With the incorporation of topological features, one can see a significant improvement in PCC values when using the proposed consensus model for each dataset, compared to the model using only TF features. The best performance is achieved when using the consensus model, yielding a PCC value of 0.826 with RMSE given as 1.954 for PDBbind-v2007 and 0.849 with RMSE 1.728 for PDBbind-v2016. In addition, we present the Pearson correlation coefficients obtained from our model and those in the previous studies, with results from [7,10,36,41,57]. As illustrated in Figure 10, our model outperforms all the other models for the two PDBbind datasets. These results demonstrate the utility and effectiveness of our method in capturing the topological features.
Method | PCC | RMSE (kcal/mol) | |
PDBbind-v2007 | PHL | 0.794 | 2.066 |
TF | 0.795 | 2.006 | |
Consensus | 0.826 | 1.954 | |
PDBbind-v2016 | PHL | 0.808 | 1.863 |
TF | 0.836 | 1.716 | |
Consensus | 0.849 | 1.728 |
Although there has been tremendous success of topological data analysis TDA [44,45], particularly, topological deep learning TDL on point cloud data [9,49], there are few methods for the topological analysis of data on manifolds or manifold topological analysis [19]. To fill this gap, we presented a new method, persistent Hodge Laplacian PHL in the Eulerian representation, for manifold topological learning MTL of real-world data on manifolds. PHL differs from existing state-of-the-art TDA methods on point clouds in the sense that the proposed PHL is defined on manifolds, for which the traditional TDA methods do not work. Additionally, PHL extends our earlier evolutionary de Rham-Hodge theory constructed on the Lagrangian representation [18] to the Eulerian representation, which avoids numerical inconsistency over multi-scale manifolds. We offer two discrete Hodge stars that mimic the continuous operator and developed both a continuous theory for mapping of normal forms across manifolds in a filtration to enable persistent cohomology analysis and the associated topology-persevering discrete construction on Cartesian grids. A proof-of-principle test on two benchmark datasets validates our MTL model, highlighting its simplicity and promise for the predictions of data on manifolds.
The popularity of TDA is facilitated by effective software packages, such as JavaPlex [2], Perseus [42], Ripser [4], etc. The further development of efficient PHL software is an important task. The computational efficiency has not been studied in this work. Algorithm acceleration and parallel and graphics processing unit (GPU) architecture are to be explored. Further experimental validations of manifold topological learning are also needed.
Zhe Su: designed the project, wrote the code and draft, analyzed data, and revised the manuscript; Yiying Tong: conceptualized and designed the project, analyzed data, wrote the draft, revised the manuscript, and acquired funding; Guo-Wei Wei: conceptualized and supervised the project, wrote the draft, revised the manuscript, and acquired funding. All authors have read and approved the final version of the manuscript for publication.
This work was supported in part by NIH grants R01GM126189, R01AI164266, and R35GM148196, National Science Foundation grants DMS2052983, DMS-1761320, and IIS-1900473, NASA grant 80NSSC21M0023, Michigan State University Research Foundation, and Bristol-Myers Squibb 65109. The authors thank Dr. Hongsong Feng for his help on transformer embeddings.
The authors declare no competing interests.
[1] | H. Adams, T. Emerson, M. Kirby, R. Neville, C. Peterson, P. Shipman, et al., Persistence images: A stable vector representation of persistent homology, J. Mach. Learn. Res., 18 (2017), 1–35. |
[2] | H, Adams, A. Tausz, M. Vejdemo-Johansson, JavaPlex: A research software package for persistent (co) homology, Mathematical Software–ICMS 2014, Seoul, South Korea, 2014,129–136. https://doi.org/10.1007/978-3-662-44199-2_23 |
[3] |
D. N. Arnold, R. S. Falk, R. Winther, Finite element exterior calculus, homological techniques, and applications, Acta Numer., 15 (2006), 1–155. https://doi.org/10.1017/S0962492906210018 doi: 10.1017/S0962492906210018
![]() |
[4] |
U. Bauer, Ripser: efficient computation of vietoris-rips persistence barcodes, J. Appl. Comput. Topology, 5 (2021), 391–423. https://doi.org/10.1007/s41468-021-00071-5 doi: 10.1007/s41468-021-00071-5
![]() |
[5] | P. Bubenik, Statistical topological data analysis using persistence landscapes, J. Mach. Learn. Res., 16 (2015), 77–102. |
[6] |
H. Cai, C. Shen, T. Y. Jian, X. J. Zhang, T. Chen, X. Q. Han, et al., Carsidock: A deep learning paradigm for accurate protein-ligand docking and screening based on large-scale pre-training, Chem. Sci., 15 (2024), 1449–1471. https://doi.org/10.1039/D3SC05552C doi: 10.1039/D3SC05552C
![]() |
[7] |
Z. X. Cang, L. Mu, G.-W. Wei, Representability of algebraic topology for biomolecules in machine learning based scoring and virtual screening, PLoS Comput. Biol., 14 (2018), e1005929. https://doi.org/10.1371/journal.pcbi.1005929 doi: 10.1371/journal.pcbi.1005929
![]() |
[8] |
Z. X. Cang, L. Mu, K. D. Wu, K. Opron, K. Xia, G.-W. Wei, A topological approach for protein classification, Computational and Mathematical Biophysics, 3 (2015), 140–162. https://doi.org/10.1515/mlbmb-2015-0009 doi: 10.1515/mlbmb-2015-0009
![]() |
[9] |
Z. X. Cang, G.-W. Wei, Topologynet: Topology based deep convolutional and multi-task neural networks for biomolecular property predictions, PLoS Comput. Biol., 13 (2017), e1005690. https://doi.org/10.1371/journal.pcbi.1005690 doi: 10.1371/journal.pcbi.1005690
![]() |
[10] |
Z. X. Cang, G.-W. Wei, Integration of element specific persistent homology and machine learning for protein-ligand binding affinity prediction, Int. J. Numer. Meth. Bio., 34 (2018), e2914. https://doi.org/10.1002/cnm.2914 doi: 10.1002/cnm.2914
![]() |
[11] | G. Carlsson, Topology and data, B. Am. Math. Soc., 46 (2009), 255–308. |
[12] |
D. Chen, J. Liu, G.-W. Wei, multi-scale topology-enabled structure-to-sequence transformer for protein-ligand interaction predictions, Nat. Mach. Intell., 6 (2024), 799–810. https://doi.org/10.1038/s42256-024-00855-1 doi: 10.1038/s42256-024-00855-1
![]() |
[13] |
D. Chen, J. X. Zheng, G.-W. Wei, F. Pan, Extracting predictive representations from hundreds of millions of molecules, J. Phys. Chem. Lett., 12 (2021), 10793–10801. https://doi.org/10.1021/acs.jpclett.1c03058 doi: 10.1021/acs.jpclett.1c03058
![]() |
[14] |
H. Chen, Y. Zhang, W. H. Zhang, P. X. Liao, K. Li, J. L. Zhou, et al., Low-dose CT via convolutional neural network, Biomed. Opt. Express, 8 (2017), 679–694. https://doi.org/10.1364/BOE.8.000679 doi: 10.1364/BOE.8.000679
![]() |
[15] |
J. H. Chen, Y. C. Qiu, R. Wang, G.-W. Wei, Persistent laplacian projected omicron ba. 4 and ba. 5 to become new dominating variants, Comput. Biol. Med., 151 (2022), 106262. https://doi.org/10.1016/j.compbiomed.2022.106262 doi: 10.1016/j.compbiomed.2022.106262
![]() |
[16] |
J. H. Chen, R. Wang, M. L. Wang, G.-W. Wei, Mutations strengthened SARS-CoV-2 infectivity, J. Mol. Biol., 432 (2020), 5212–5226. https://doi.org/10.1016/j.jmb.2020.07.009 doi: 10.1016/j.jmb.2020.07.009
![]() |
[17] |
J. H. Chen, G.-W. Wei, Omicron BA. 2 (B. 1.1. 529.2): high potential for becoming the next dominant variant, J. Phys. Chem. Lett., 13 (2022), 3840–3849. https://doi.org/10.1021/acs.jpclett.2c00469 doi: 10.1021/acs.jpclett.2c00469
![]() |
[18] |
J. H. Chen, R. D. Zhao, Y. Y. Tong, G.-W. Wei, Evolutionary de rham-hodge method, Discrete Cont. Dyn-B, 26 (2021), 3785–3821. https://doi.org/10.3934/dcdsb.2020257 doi: 10.3934/dcdsb.2020257
![]() |
[19] | M. Desbrun, E. Kanso, Y. Y. Tong, Discrete differential forms for computational modeling, In: ACM SIGGRAPH 2006 Courses, New York: Association for Computing Machinery, 2006, 39–54. https://doi.org/10.1145/1185657.1185665 |
[20] | T. K. Dey, F. T. Fan, Y. S. Wang, Computing topological persistence for simplicial maps, In: Proceedings of the thirtieth annual symposium on Computational geometry, New York: Association for Computing Machinery, 2014, 345–354. https://doi.org/10.1145/2582112.2582165 |
[21] |
J. Dodziuk, Finite-difference approach to the hodge theory of harmonic forms, Am. J. Math., 98 (1976), 79–104. https://doi.org/10.2307/2373615 doi: 10.2307/2373615
![]() |
[22] | R. Dong, A faster algorithm of up persistent laplacian over non-branching simplicial complexes, 2024, arXiv: 2408.16741. https://doi.org/10.48550/arXiv.2408.16741 |
[23] | H. Edelsbrunner, J. Harer, Persistent homology-a survey, In: Surveys on discrete and computational geometry: twenty years later, Singapore: Contemporary Mathematics, 2008, 257–282. https://doi.org/10.1090/conm/453/08802 |
[24] |
P. G. Francoeur, T. Masuda, J. Sunseri, A. Jia, R. B. Iovanisci, I. Snyder, et al., Three-dimensional convolutional neural networks and a cross-docked data set for structure-based drug design, J. Chem. Inf. Model., 60 (2020), 4200–4215. https://doi.org/10.1021/acs.jcim.0c00411 doi: 10.1021/acs.jcim.0c00411
![]() |
[25] |
K. O. Friedrichs, Differential forms on riemannian manifolds, Commun. Pur. Appl. Math., 8 (1955), 551–590. https://doi.org/10.1002/cpa.3160080408 doi: 10.1002/cpa.3160080408
![]() |
[26] |
A. Gaulton, A. Hersey, M. Nowotka, A.P. Bento, J. Chambers, D. Mendez, P. Mutowo, F. Atkinson, L.J. Bellis, E. Cibrián-Uhalte, M. Davies, The ChEMBL database in 2017, Nucleic acids research, 45 (2017), D945–D954. https://doi.org/10.1093/nar/gkw1074 doi: 10.1093/nar/gkw1074
![]() |
[27] |
R. Ghrist, Barcodes: The persistent topology of data, B. Am. Math. Soc., 45 (2008), 61–75. https://doi.org/10.1090/S0273-0979-07-01191-3 doi: 10.1090/S0273-0979-07-01191-3
![]() |
[28] | A. B. Gülen, F. Mémoli, Z. C. Wan, Y. S. Wang, A generalization of the persistent laplacian to simplicial maps, The 39th International Symposium on Computational Geometry (SoCG 2023), Schloss Dagstuhl–Leibniz-Zentrum für Informatik, 2023, 37: 1–37: 17. https://doi.org/10.4230/LIPIcs.SoCG.2023.37 |
[29] |
J. Irwin and B. Shoichet, ZINC - a free database of commercially available compounds for virtual screening, Journal of chemical information and modeling, 45 (2005), 177–182. https://doi.org/10.1021/ci049714+ doi: 10.1021/ci049714+
![]() |
[30] |
M. Khovanov, A categorification of the jones polynomial, Duke Math. J., 101 (2000), 359–426. https://doi.org/10.1215/S0012-7094-00-10131-7 doi: 10.1215/S0012-7094-00-10131-7
![]() |
[31] |
S. Kim, P.A. Thiessen, E.E. Bolton, J. Chen, G. Fu, A. Gindulyte, L. Han, J. He, S. He, B.A. Shoemaker, J. Wang, PubChem substance and compound databases, Nucleic acids research, 44 (2016), D1202–D1213. https://doi.org/10.1093/nar/gkv951 doi: 10.1093/nar/gkv951
![]() |
[32] |
Z. M. Lin, H. Akin, R. Rao, B. Hie, Z. K. Zhu, W. T. Lu, et al., Language models of protein sequences at the scale of evolution enable accurate structure prediction, BioRxiv, 2022 (2022), 500902. https://doi.org/10.1101/2022.07.20.500902 doi: 10.1101/2022.07.20.500902
![]() |
[33] | J. Liu, J. Y. Li, J. Wu, The algebraic stability for persistent laplacians, 2023, arXiv: 2302.03902. https://doi.org/10.48550/arXiv.2302.03902 |
[34] |
J.-B. Liu, X. Wang, J. D. Cao, The coherence and properties analysis of balanced 2p-ary tree networks, IEEE T. Netw. Sci. Eng., 11 (2024), 4719–4728. https://doi.org/10.1109/TNSE.2024.3395710 doi: 10.1109/TNSE.2024.3395710
![]() |
[35] |
J.-B. Liu, X. Zhang, J. D. Cao, L. P. Chen, Mean first-passage time and robustness of complex cellular mobile communication network, IEEE T. Netw. Sci. Eng., 11 (2024), 3066–3076. https://doi.org/10.1109/TNSE.2024.3358369 doi: 10.1109/TNSE.2024.3358369
![]() |
[36] |
R. Liu, X. Liu, J. Wu, Persistent path-spectral (PPS) based machine learning for protein-ligand binding affinity prediction, J. Chem. Inf. Model., 63 (2023), 1066–1075. https://doi.org/10.1021/acs.jcim.2c01251 doi: 10.1021/acs.jcim.2c01251
![]() |
[37] |
X. Liu, H. T. Feng, J. Wu, K. L. Xia, Persistent spectral hypergraph based machine learning (PSH-ML) for protein-ligand binding affinity prediction, Brief. Bioinform., 22 (2021), bbab127. https://doi.org/10.1093/bib/bbab127 doi: 10.1093/bib/bbab127
![]() |
[38] |
Z. H. Liu, M. Y. Su, L. Han, J. Liu, Q. F. Yang, Y. Li, et al., Forging the basis for developing protein-ligand interaction scoring functions, Acc. Chem. Res., 50 (2017), 302–309. https://doi.org/10.1021/acs.accounts.6b00491 doi: 10.1021/acs.accounts.6b00491
![]() |
[39] |
R. MacPherson, B. Schweinhart, Measuring shape with topology, J. Math. Phys., 53 (2012), 073516. https://doi.org/10.1063/1.4737391 doi: 10.1063/1.4737391
![]() |
[40] |
F. Mémoli, Z. C. Wan, Y. S. Wang, Persistent laplacians: properties, algorithms and implications, SIAM J. Math. Data Sci., 4 (2022), 858–884. https://doi.org/10.1137/21M1435471 doi: 10.1137/21M1435471
![]() |
[41] |
Z. Y. Meng, K. L. Xia, Persistent spectral-based machine learning (perspect ml) for protein-ligand binding affinity prediction, Sci. Adv., 7 (2021), eabc5329. https://doi.org/10.1126/sciadv.abc5329 doi: 10.1126/sciadv.abc5329
![]() |
[42] |
K. Mischaikow, V. Nanda, Morse theory for filtrations and efficient computation of persistent homology, Discrete Comput. Geom., 50 (2013), 330–353. https://doi.org/10.1007/s00454-013-9529-6 doi: 10.1007/s00454-013-9529-6
![]() |
[43] |
C. B. Morrey, A variational method in the theory of harmonic integrals, ii, Am. J. Math., 78 (1956), 137–170. https://doi.org/10.2307/2372488 doi: 10.2307/2372488
![]() |
[44] |
D. D. Nguyen, Z. X. Cang, G.-W. Wei, A review of mathematical representations of biomolecular data, Phys. Chem. Chem. Phys., 22 (2020), 4343–4367. https://doi.org/10.1039/C9CP06554G doi: 10.1039/C9CP06554G
![]() |
[45] |
D. D. Nguyen, Z. X. Cang, K. D. Wu, M. L. Wang, Y. Cao, G.-W. Wei, Mathematical deep learning for pose and binding affinity prediction and ranking in D3R grand challenges, J. Comput. Aided Mol. Des., 33 (2019), 71–82. https://doi.org/10.1007/s10822-018-0146-6 doi: 10.1007/s10822-018-0146-6
![]() |
[46] |
D. D. Nguyen, K. F. Gao, M. L. Wang, G.-W. Wei, MathDL: mathematical deep learning for D3R grand challenge 4, J. Comput. Aided Mol. Des., 34 (2020), 131–147. https://doi.org/10.1007/s10822-019-00237-5 doi: 10.1007/s10822-019-00237-5
![]() |
[47] |
D. D. Nguyen, G.-W. Wei, DG-GL: Differential geometry-based geometric learning of molecular datasets, Int. J. Numer. Meth. Bio., 35 (2019), e3179. https://doi.org/10.1002/cnm.3179 doi: 10.1002/cnm.3179
![]() |
[48] |
E. Panagiotou, K. C. Millett, P. J. Atzberger, Topological methods for polymeric materials: characterizing the relationship between polymer entanglement and viscoelasticity, Polymers, 11 (2019), 437. https://doi.org/10.3390/polym11030437 doi: 10.3390/polym11030437
![]() |
[49] | T. Papamarkou, T. Birdal, M. M. Bronstein. G. E. Carlsson, J. Curry, Y. Gao, et al., Position: Topological Deep Learning is the New Frontier for Relational Learning, The 41st International Conference on Machine Learning, Vienna, Austria, 2024, 39529–39555. |
[50] | C. S. Pun, K. Xia, S. X. Lee, Persistent-homology-based machine learning and its applications–a survey, 2018 arXiv: 1811.00252. https://doi.org/10.48550/arXiv.1811.00252 |
[51] |
M. M. Rana, D. D. Nguyen, Geometric graph learning with extended atom-types features for protein-ligand binding affinity prediction, Comput. Biol. Med., 164 (2023), 107250. https://doi.org/10.1016/j.compbiomed.2023.107250 doi: 10.1016/j.compbiomed.2023.107250
![]() |
[52] |
E. Ribando-Gros, R. Wang, J. H. Chen, Y. Y. Tong, G.-W. Wei, Combinatorial and hodge laplacians: Similarity and difference, SIAM Rev., 66 (2024), 575–601. https://doi.org/10.1137/22M1482299 doi: 10.1137/22M1482299
![]() |
[53] | G. Schwarz, Hodge decomposition–A method for solving boundary value problems, Berlin: Springer, 1995. https://doi.org/10.1007/BFb0095978 |
[54] | L. Shen, H. S. Feng, F. L. Li, F. C. Lei, J. Wu, G.-W. Wei, Knot data analysis using multi-scale gauss link integral, P. Nati. A. Sci., In press, 2024. |
[55] |
L. Shen, J. Liu, G.-W. Wei, Evolutionary khovanov homology, AIMS Mathematics, 9 (2024), 26139–26165. https://doi.org/10.3934/math.20241277 doi: 10.3934/math.20241277
![]() |
[56] | C. Shonkwiler, Poincaré duality angles on Riemannian manifolds with boundary dissertation, University of Pennsylvania, PhD Thesis, University of Pennsylvania, 2009. |
[57] |
M. Y. Su, Q. F. Yang, Y. Du, G. Q. Feng, Z. H. Liu, Y. Li, et al., Comparative assessment of scoring functions: the CASF-2016 update, J. Chem. Inf. Model., 59 (2019), 895–913. https://doi.org/10.1021/acs.jcim.8b00545 doi: 10.1021/acs.jcim.8b00545
![]() |
[58] |
Z. Su, Y. Y. Tong, G.-W. Wei, Hodge decomposition of single-cell RNA velocity, J. Chem. Inf. Model., 64 (2024), 3558–3568. https://doi.org/10.1021/acs.jcim.4c00132 doi: 10.1021/acs.jcim.4c00132
![]() |
[59] |
J. Townsend, C. P. Micucci, J. H. Hymel, V. Maroulas, K. D. Vogiatzis, Representation of molecular structures with persistent homology for machine learning applications in chemistry, Nat. Commun., 11 (2020), 3230. https://doi.org/10.1038/s41467-020-17035-5 doi: 10.1038/s41467-020-17035-5
![]() |
[60] |
R. Wang, J. H. Chen, G.-W. Wei, Mechanisms of SARS-CoV-2 evolution revealing vaccine-resistant mutations in Europe and America, J. Phys. Chem. Lett., 12 (2021), 11850–11857. https://doi.org/10.1021/acs.jpclett.1c03380 doi: 10.1021/acs.jpclett.1c03380
![]() |
[61] |
R. Wang, D. D. Nguyen, G.-W. Wei, Persistent spectral graph, Int. J. Numer. Meth. Bio., 36 (2020), e3376. https://doi.org/10.1002/cnm.3376 doi: 10.1002/cnm.3376
![]() |
[62] |
R. Wang, R. D. Zhao, E. Ribando-Gros, J. H. Chen, Y. Y. Tong, G.-W. Wei, Hermes: Persistent spectral graph software, Found. Data Sci., 3 (2021), 67–97. https://doi.org/10.3934/fods.2021006 doi: 10.3934/fods.2021006
![]() |
[63] |
L. Wasserman, Topological data analysis, Annu. Rev. Stat. Appl., 5 (2018), 501–532. https://doi.org/10.1146/annurev-statistics-031017-100045 doi: 10.1146/annurev-statistics-031017-100045
![]() |
[64] | X. Q. Wei, G.-W. Wei, Persistent topological Laplacians–a survey, 2023, arXiv: 2312.07563. https://doi.org/10.48550/arXiv.2312.07563 |
[65] |
X. Q. Wei, G.-W. Wei, Persistent sheaf Laplacian, Found. Data Sci., 2024 (2024), 033. https://doi.org/10.3934/fods.2024033 doi: 10.3934/fods.2024033
![]() |
[66] |
M. Wójcikowski, P. J. Ballester, P. Siedlecki, Performance of machine-learning scoring functions in structure-based virtual screening, Sci. Rep., 7 (2017), 46710. https://doi.org/10.1038/srep46710 doi: 10.1038/srep46710
![]() |
[67] |
K. L. Xia, G.-W. Wei, Persistent homology analysis of protein structure, flexibility, and folding, Int. J. Numer. Meth. Bio., 30 (2014), 814–844. https://doi.org/10.1002/cnm.2655 doi: 10.1002/cnm.2655
![]() |
[68] |
W. T. Yang, R. G. Parr, R. Pucci, Electron density, Kohn–Sham frontier orbitals, and Fukui functions, J. Chem. Phys., 81 (1984), 2862–2863. https://doi.org/10.1063/1.447964 doi: 10.1063/1.447964
![]() |
[69] |
R. D. Zhao, M. Desbrun, G.-W. Wei, Y. Y. Tong, 3D hodge decompositions of edge-and face-based vector fields, ACM T. Graphic., 38 (2019), 181. https://doi.org/10.1145/3355089.3356546 doi: 10.1145/3355089.3356546
![]() |
[70] |
A. Zomorodian, G. Carlsson, Computing persistent homology, Discrete Comput. Geom., 33 (2005), 249–274. https://doi.org/10.1007/s00454-004-1146-y doi: 10.1007/s00454-004-1146-y
![]() |
1. | Xiaoqi Wei, Guo-Wei Wei, Persistent Topological Laplacians—A Survey, 2025, 13, 2227-7390, 208, 10.3390/math13020208 | |
2. | Hongsong Feng, Li Shen, Jian Liu, Guo-Wei Wei, Mayer-Homology Learning Prediction of Protein-Ligand Binding Affinities, 2025, 24, 2737-4165, 253, 10.1142/S2737416524500613 | |
3. | Yaxing Wang, Xiang Liu, Yipeng Zhang, Xiangjun Wang, Kelin Xia, Join Persistent Homology (JPH)-Based Machine Learning for Metalloprotein–Ligand Binding Affinity Prediction, 2025, 1549-9596, 10.1021/acs.jcim.4c02309 |
Method | PCC | RMSE (kcal/mol) | |
PDBbind-v2007 | PHL | 0.794 | 2.066 |
TF | 0.795 | 2.006 | |
Consensus | 0.826 | 1.954 | |
PDBbind-v2016 | PHL | 0.808 | 1.863 |
TF | 0.836 | 1.716 | |
Consensus | 0.849 | 1.728 |