Recently, topological data analysis has become a trending topic in data science and engineering. However, its key technique, persistent homology, is defined on point cloud data and does not work directly for data on manifolds. Although the earlier evolutionary de Rham-Hodge theory deals with data on manifolds, it is inconvenient for machine learning applications because of the numerical inconsistency caused by remeshing the manifolds in the Lagrangian representation. In this work, we introduced the persistent de Rham-Hodge Laplacian, abbreviated as the persistent Hodge Laplacian (PHL), for manifold topological learning. Our PHLs were constructed in the Eulerian representation via structure-preserving Cartesian grids, avoiding the numerical inconsistency over the multi-scale manifolds. To facilitate manifold topological learning, we proposed a persistent Hodge Laplacian learning algorithm for data on manifolds or volumetric data. As a proof-of-principle application of the proposed manifold topological learning model, we considered the prediction of protein-ligand binding affinities with two benchmark datasets. Our numerical experiments highlighted the power and promise of the proposed method.
Citation: Zhe Su, Yiying Tong, Guo-Wei Wei. Persistent de Rham-Hodge Laplacians in Eulerian representation for manifold topological learning[J]. AIMS Mathematics, 2024, 9(10): 27438-27470. doi: 10.3934/math.20241333