The PageRank model is widely utilized for analyzing a variety of scientific issues beyond its original application in modeling web search engines. In recent years, considerable research effort has focused on developing high-performance iterative methods to solve this model, particularly when the dimension is exceedingly large. However, due to the ever-increasing extent and size of data networks in various applications, the computational requirements of the PageRank model continue to grow. This has led to the development of new techniques that aim to reduce the computational complexity required for the solution. In this paper, we present a recursive 5-type lumping algorithm combined with a two-stage elimination strategy that leverage characteristics about the nonzero structure of the underlying network and the nonzero values of the PageRank coefficient matrix. This method reduces the initial PageRank problem to the solution of a remarkably smaller and sparser linear system. As a result, it leads to significant cost reductions for computing PageRank solutions, particularly in scenarios involving large and/or multiple damping factors. Numerical experiments conducted on over 50 real-world networks demonstrate that the proposed methods can effectively exploit characteristics of PageRank problems for efficient computations.
Citation: Zhao-Li Shen, Yu-Tong Liu, Bruno Carpentieri, Chun Wen, Jian-Jun Wang. Recursive reordering and elimination method for efficient computation of PageRank problems[J]. AIMS Mathematics, 2023, 8(10): 25104-25130. doi: 10.3934/math.20231282
The PageRank model is widely utilized for analyzing a variety of scientific issues beyond its original application in modeling web search engines. In recent years, considerable research effort has focused on developing high-performance iterative methods to solve this model, particularly when the dimension is exceedingly large. However, due to the ever-increasing extent and size of data networks in various applications, the computational requirements of the PageRank model continue to grow. This has led to the development of new techniques that aim to reduce the computational complexity required for the solution. In this paper, we present a recursive 5-type lumping algorithm combined with a two-stage elimination strategy that leverage characteristics about the nonzero structure of the underlying network and the nonzero values of the PageRank coefficient matrix. This method reduces the initial PageRank problem to the solution of a remarkably smaller and sparser linear system. As a result, it leads to significant cost reductions for computing PageRank solutions, particularly in scenarios involving large and/or multiple damping factors. Numerical experiments conducted on over 50 real-world networks demonstrate that the proposed methods can effectively exploit characteristics of PageRank problems for efficient computations.
[1] | S. Brin, L. Page, The anatomy of a large-scale hypertextual web search engine, Comput. Netw. ISDN Syst., 30 (1998), 107–117. https://doi.org/10.1016/S0169-7552(98)00110-X doi: 10.1016/S0169-7552(98)00110-X |
[2] | T. Zhou, E. Martinez-Baez, G. Schenter, A. E. Clark, PageRank as a collective variable to study complex chemical transformations and their energy landscapes, J. Chem. Phys., 150 (2019), 134102. https://doi.org/10.1063/1.5082648 doi: 10.1063/1.5082648 |
[3] | B. Liu, S. Jiang, Q. Zou, Hits-pr-hhblits: Protein remote homology detection by combining pagerank and hyperlink-induced topic search, Brief. Bioinformatics, 21 (2020), 298–308. https://doi.org/10.1093/bib/bby104 doi: 10.1093/bib/bby104 |
[4] | M. Rafiei, A. A. Kardan, A novel method for expert finding in online communities based on concept map and pagerank, Hum. Cent. Comput. Inf. Sci., 5 (2015), 10. https://doi.org/10.1186/s13673-015-0030-5 doi: 10.1186/s13673-015-0030-5 |
[5] | F. A. Massucci, D. Docampo, Measuring the academic reputation through citation networks via pagerank, J. Informetr., 13 (2019), 185–201. https://doi.org/10.1016/j.joi.2018.12.001 doi: 10.1016/j.joi.2018.12.001 |
[6] | M. Zhang, X. Li, L. Zhang, S. Khurshid, Boosting spectrum-based fault localization using Pagerank, In: Proceedings of the 26th ACM SIGSOFT international symposium on software testing and analysis, 2017,261–272. https://doi.org/10.1145/3092703.3092731 |
[7] | A. Bojchevski, J. Gasteiger, B. Perozzi, A. Kapoor, M. Blais, B. Rózemberczki, et al., Scaling graph neural networks with approximate pagerank, In: Proceedings of the 26th ACM SIGKDD international conference on knowledge discovery & data mining, 2020, 2464–2473. https://doi.org/10.1145/3394486.3403296 |
[8] | E. Chien, J. Peng, P. Li, O. Milenkovic, Adaptive universal generalized pagerank graph neural network, arXiv preprint, 2020. https://doi.org/10.48550/arXiv.2006.07988 |
[9] | A. Roth, T. Liebig, Transforming pagerank into an infinite-depth graph neural network, In: Joint European conference on machine learning and knowledge discovery in databases, 2022,469–484. https://doi.org/10.1007/978-3-031-26390-3_27 |
[10] | D. F. Gleich, PageRank beyond the web, SIAM Rev., 57 (2015), 321–363. https://doi.org/10.1137/140976649 |
[11] | R. A. Horn, S. Serra-Capizzano, A general setting for the parametric Google matrix, Internet Math., 3 (2008), 385–411. https://doi.org/10.1080/15427951.2006.10129131 doi: 10.1080/15427951.2006.10129131 |
[12] | S. Serra-Capizzano, Jordan canonical form of the Google matrix: A potential contribution to the PageRank computation, SIAM J. Matrix Anal. Appl., 27 (2005), 305–312. https://doi.org/10.1137/S0895479804441407 doi: 10.1137/S0895479804441407 |
[13] | A. Langville, C. Meyer, Google's PageRank and beyond: The science of search engine rankings, Princeton: Princeton University Press, 2006. https://doi.org/10.1515/9781400830329 |
[14] | P. G. Constantine, D. F. Gleich, Random alpha PageRank, Internet Math., 6 (2009), 189–236. https://doi.org/10.1080/15427951.2009.10129185 |
[15] | S. D. Kamvar, T. H. Haveliwala, C. D. Manning, G. H. Golub, Extrapolation methods for accelerating PageRank computation, In: Proceedings of the 12th international conference on World Wide Web, (2003), 261–270. https://doi.org/10.1145/775152.775190 |
[16] | X. Tan, A new extrapolation method for PageRank computations, J. Comput. Appl. Math., 313 (2017), 383–392. https://doi.org/10.1016/j.cam.2016.08.034 doi: 10.1016/j.cam.2016.08.034 |
[17] | C. Brezinski, M. Redivo-Zaglia, S. Serra-Capizzano, Extrapolation methods for PageRank computations, CR Math., 340 (2005), 393–397. https://doi.org/10.1016/j.crma.2005.01.015 doi: 10.1016/j.crma.2005.01.015 |
[18] | A. Cicone, S. Serra-Capizzano, Google PageRanking problem: The model and the analysis, J. Comput. Appl. Math., 234 (2010), 3140–3169. https://doi.org/10.1016/j.cam.2010.02.005 doi: 10.1016/j.cam.2010.02.005 |
[19] | S. D. Kamvar, T. H. Haveliwala, G. H. Golub, Adaptive methods for the computation of the PageRank, Linear Algebra Appl., 386 (2004), 51–65. https://doi.org/10.1016/j.laa.2003.12.008 doi: 10.1016/j.laa.2003.12.008 |
[20] | H. D. Sterck, T. A. Manteuffel, S. F. McCormick, Q. Nguyen, J. Ruge, Multilevel adaptive aggregation for Markov chains, with application to web ranking, SIAM J. Sci. Comput., 30 (2008), 2235–2262. https://doi.org/10.1137/070685142 doi: 10.1137/070685142 |
[21] | Z. L. Shen, T. Z. Huang, B. Carpentieri, C. Wen, X. M. Gu, Block-accelerated aggregation multigrid for Markov chains with application to PageRank problems, Commun. Nonlinear Sci. Numer. Simul., 59 (2018), 472–487. https://doi.org/10.1016/j.cnsns.2017.11.031 doi: 10.1016/j.cnsns.2017.11.031 |
[22] | D. F. Gleich, A. P. Gray, C. Greif, T. Lau, An inner-outer iteration for computing PageRank, SIAM J. Sci. Comput., 32 (2010), 349–371. https://doi.org/10.1137/080727397 doi: 10.1137/080727397 |
[23] | C. Q. Gu, F. Xie, K. Zhang, A two-step matrix splitting iteration for computing PageRank, J. Comput. Appl. Math., 278 (2015), 19–28. https://doi.org/10.1016/j.cam.2014.09.022 doi: 10.1016/j.cam.2014.09.022 |
[24] | C. Wen, T. Z. Huang, Z. L. Shen, A note on the two-step matrix splitting iteration for computing PageRank, J. Comput. Appl. Math., 315 (2017), 87–97. https://doi.org/10.1016/j.cam.2016.10.020 doi: 10.1016/j.cam.2016.10.020 |
[25] | Z. L. Tian, Y. Liu, Y. Zhang, Z. Y. Liu, M. Y. Tian, The general inner-outer iteration method based on regular splittings for the PageRank problem, Appl. Math. Comput., 356 (2019), 479–501. https://doi.org/10.1016/j.amc.2019.02.066 doi: 10.1016/j.amc.2019.02.066 |
[26] | M. Y. Tian, Y. Zhang, Y. D. Wang, A general multi-splitting iteration method for computing PageRank, Comput. Appl. Math., 38 (2019), 1–29. https://doi.org/10.1007/s40314-019-0830-8 doi: 10.1007/s40314-019-0830-8 |
[27] | G. H. Golub, C. Greif, An Arnoldi-type algorithm for computing pagerank, BIT Numer. Math., 46 (2006), 759–771. https://doi.org/10.1007/s10543-006-0091-y doi: 10.1007/s10543-006-0091-y |
[28] | J. F. Yin, G. J. Yin, M. Ng, On adaptively accelerated Arnoldi method for computing PageRank, Numer. Linear Algebra Appl., 19 (2012), 73–85. https://doi.org/10.1002/nla.789 doi: 10.1002/nla.789 |
[29] | Z. L. Shen, H. Yang, B. Carpentieri, X. M. Gu, C. Wen, A preconditioned variant of the refined arnoldi method for computing PageRank eigenvectors, Symmetry, 13 (2021), 1327. https://doi.org/10.3390/sym13081327 doi: 10.3390/sym13081327 |
[30] | H. F. Zhang, T. Z. Huang, C. Wen, Z. L. Shen, FOM accelerated by an extrapolation method for solving PageRank problems, J. Comput. Appl. Math., 296 (2016), 397–409. https://doi.org/10.1016/j.cam.2015.09.027 doi: 10.1016/j.cam.2015.09.027 |
[31] | G. Wu, Y. Wei, A power-Arnoldi algorithm for computing pagerank, Numer. Linear Algebra Appl., 14 (2007), 521–546. https://doi.org/10.1002/nla.531 doi: 10.1002/nla.531 |
[32] | C. Q. Gu, X. L. Jiang, C. C. Shao, Z. B. Chen, A GMRES-Power algorithm for computing PageRank problems, J. Comput. Appl. Math., 343 (2018), 113–123. https://doi.org/10.1016/j.cam.2018.03.017 doi: 10.1016/j.cam.2018.03.017 |
[33] | Q. Y. Hu, C. Wen, T. Z. Huang, Z. L. Shen, X. M. Gu, A variant of the Power-Arnoldi algorithm for computing PageRank, J. Comput. Appl. Math., 381 (2021), 113034. https://doi.org/10.1016/j.cam.2020.113034 doi: 10.1016/j.cam.2020.113034 |
[34] | C. Q. Gu, W. W. Wang, An Arnoldi-Inout algorithm for computing PageRank problems, J. Comput. Appl. Math., 309 (2017), 219–229. https://doi.org/10.1016/j.cam.2016.05.026 doi: 10.1016/j.cam.2016.05.026 |
[35] | D. F. Gleich, L. Zhukov, P. Berkhin, Fast parallel pagerank: A linear system approach, 2005. |
[36] | Y. Lin, X. Shi, Y. Wei, On computing PageRank via lumping the Google matrix, J. Comput. Appl. Math., 224 (2009), 702–708. https://doi.org/10.1016/j.cam.2008.06.003 doi: 10.1016/j.cam.2008.06.003 |
[37] | Q. Yu, Z. Miao, G. Wu, Y. Wei, Lumping algorithms for computing Google's PageRank and its derivative, with attention to unreferenced nodes, Inf. Retr., 15 (2012), 503–526. https://doi.org/10.1007/s10791-012-9183-2 doi: 10.1007/s10791-012-9183-2 |
[38] | A. N. Langville, C. D. Meyer, A reordering for the PageRank problem, SIAM J. Sci. Comput., 27 (2006), 2112–2120. https://doi.org/10.1137/040607551 doi: 10.1137/040607551 |
[39] | Z. L. Shen, T. Z. Huang, B. Carpentieri, X. M. Gu, C. Wen, An efficient elimination strategy for solving PageRank problems, Appl. Math. Comput., 298 (2017), 111–122. https://doi.org/10.1016/j.amc.2016.10.031 doi: 10.1016/j.amc.2016.10.031 |
[40] | Z. L. Shen, T. Z. Huang, B. Carpentieri, C. Wen, X. M. Gu, X. Y. Tan, Off-diagonal low-rank preconditioner for difficult PageRank problems, J. Comput. Appl. Math., 346 (2019), 456–470. https://doi.org/10.1016/j.cam.2018.07.015 doi: 10.1016/j.cam.2018.07.015 |
[41] | Z. L. Shen, B. Carpentieri, Multi-Step Low-Rank Decomposition of Large PageRank Matrices, In: The 7th international conference on fuzzy systems and data mining, 340 (2021), 397–404. https://doi.org/10.3233/FAIA210212 |
[42] | D. J. Higham, N. J. Higham, MATLAB guide, SIAM press, 2016. |
[43] | C. P. Lee, G. H. Golub, S. A. Zenios, Partial state space aggregation based on lumpability and its application to PageRank, Tech. Rep. Stanford Univ., 2003. |
[44] | S. D. Kamvar, T. H. Haveliwala, C. D. Manning, G. H. Goloub, Exploiting the block structure of the web for computing PageRank, Tech. Rep. Stanford Univ., 2003. |
[45] | A. Scime, Web mining: Applications and techniques, IGI Global Press, 2005. https://doi.org/10.4018/978-1-59140-414-9 |
[46] | Y. P. Hong, C. T. Pan, A lower bound for the smallest singular value, Linear Algebra Appl., 172 (1992), 27–32. https://doi.org/10.1016/0024-3795(92)90016-4 doi: 10.1016/0024-3795(92)90016-4 |
[47] | O. Axelsson, M. Neytcheva, A general approach to analyse preconditioners for two-by-two block matrices, Numer. Linear Algebra Appl., 20 (2013), 723–742. https://doi.org/10.1002/nla.830 doi: 10.1002/nla.830 |
[48] | T. A. Davis, Y. Hu, The University of Florida sparse matrix collection, ACM Trans. Math. Softw., 38 (2011), 1–25. |
[49] | P. Boldi, S. Vigna, The webgraph framework I: Compression techniques, In: Proceedings of the 13th international conference on World Wide Web, 2004,595–602. https://doi.org/10.1145/988672.988752 |
[50] | P. Boldi, M. Rosa, M. Santini, S. Vigna, Layered label propagation: A multiresolution coordinate-free ordering for compressing social networks, In: Proceedings of the 20th international conference on World Wide Web, 2011,587–596. https://doi.org/10.1145/1963405.1963488 |
[51] | P. Boldi, B. Codenotti, M. Santini, S. Vigna, Ubicrawler: A scalable fully distributed Web crawler, Softw. Pract. Exp., 34 (2004), 711–726. https://doi.org/10.1002/spe.587 doi: 10.1002/spe.587 |
[52] | Y. Saad, M. H. Schultz, GMRES: A generalized minimal residual algorithm for solving nonsymmetric linear systems, SIAM J. Sci. Comput., 7 (1986), 856–869. https://doi.org/10.1137/0907058 doi: 10.1137/0907058 |
[53] | M. Bollhöefer, Y. Saad, O. Schenk, ILUPACK-preconditioning software package, 2010. |