Research article Special Issues

Optimized RNA structure alignment algorithm based on longest arc-preserving common subsequence

  • Received: 29 January 2024 Revised: 10 March 2024 Accepted: 13 March 2024 Published: 21 March 2024
  • MSC : 68W10, 90C27, 92B05, 92C15

  • Ribonucleic acid (RNA) structure alignment is an important problem in computational biology to identify structural similarity of RNAs. Obtaining an efficient method for this problem is challenging due to the high computational time for the optimal solution and the low accuracy of a heuristic solution. In this paper, an efficient algorithm is proposed based on a mathematical model called longest arc-preserving common subsequence. The proposed algorithm uses a heuristic technique and high-performance computing to optimize the solution of RNA structure alignment, both in terms of the running time and the accuracy of the output. Extensive experimental studies on a multicore system are conducted to show the effectiveness of the proposed algorithm on two types of data. The first is simulated data that consists of 450 comparisons of RNA structures, while the second is real biological data that consists of 357 comparisons of RNA structures. The results show that the proposed algorithm outperforms the best-known heuristic algorithm in terms of execution time, with a percentage improvement of 71% and increasing the length of the output, i.e., accuracy, by approximately 45% in all studied cases. Finally, future approaches are discussed.

    Citation: Hazem M. Bahig, Mohamed A.G. Hazber, Tarek G. Kenawy. Optimized RNA structure alignment algorithm based on longest arc-preserving common subsequence[J]. AIMS Mathematics, 2024, 9(5): 11212-11227. doi: 10.3934/math.2024550

    Related Papers:

  • Ribonucleic acid (RNA) structure alignment is an important problem in computational biology to identify structural similarity of RNAs. Obtaining an efficient method for this problem is challenging due to the high computational time for the optimal solution and the low accuracy of a heuristic solution. In this paper, an efficient algorithm is proposed based on a mathematical model called longest arc-preserving common subsequence. The proposed algorithm uses a heuristic technique and high-performance computing to optimize the solution of RNA structure alignment, both in terms of the running time and the accuracy of the output. Extensive experimental studies on a multicore system are conducted to show the effectiveness of the proposed algorithm on two types of data. The first is simulated data that consists of 450 comparisons of RNA structures, while the second is real biological data that consists of 357 comparisons of RNA structures. The results show that the proposed algorithm outperforms the best-known heuristic algorithm in terms of execution time, with a percentage improvement of 71% and increasing the length of the output, i.e., accuracy, by approximately 45% in all studied cases. Finally, future approaches are discussed.



    加载中


    [1] D. Jereva, P. Alov, I. Tsakovska, M. Angelova, V. Atanassova, P. Vassilev, et al., Application of intercriteria analysis to assess the performance of scoring functions in molecular docking software packages, Mathematics, 10 (2022), 2549. https://doi.org/10.3390/math10152549 doi: 10.3390/math10152549
    [2] M. M. Abbas, M. Abouelhoda, H. M. Bahig, A hybrid method for the exact planted (l, d) motif finding problem and its parallelization, BMC Bioinformatics, 13 (2012), S10. https://doi.org/10.1186/1471-2105-13-S17-S10 doi: 10.1186/1471-2105-13-S17-S10
    [3] M. M. Abbass, H. M. Bahig, An efficient algorithm to identify DNA motifs, Math. Comput. Sci., 7 (2013), 387–399. https://doi.org/10.1007/s11786-013-0165-6 doi: 10.1007/s11786-013-0165-6
    [4] T. G. Kenawy, M. H. Abdel-Rahman, H. M. Bahig, A fast longest crossing-plain preserving common subsequence algorithm, Int. J. Inf. Technol., 14 (2022), 3019–3029. https://doi.org/10.1007/s41870-022-01038-0 doi: 10.1007/s41870-022-01038-0
    [5] M. M. Abbas, H. M. Bahig, M. Abouelhoda, M. M. Mohie-Eldin, Parallelizing exact motif finding algorithms on multi-core, J. Supercomput., 69 (2014), 814–826. https://doi.org/10.1007/s11227-014-1180-3 doi: 10.1007/s11227-014-1180-3
    [6] C. Blum, M. J. Blesa, Hybrid techniques based on solving reduced problem instances for a longest common subsequence problem, Appl. Soft Comput., 62 (2018), 15–28. https://doi.org/10.1016/j.asoc.2017.10.005 doi: 10.1016/j.asoc.2017.10.005
    [7] M. S. Islam, M. R. Islam, A hybrid framework based on genetic algorithm and simulated annealing for RNA structure prediction with pseudoknots, J. King Saud Univ. Comput. Inform. Sci., 34 (2022), 912–922. https://doi.org/10.1016/j.jksuci.2020.03.005 doi: 10.1016/j.jksuci.2020.03.005
    [8] T. J. X. Li, C. M. Reidys, On the loop homology of a certain complex of RNA structures, Mathematics, 9 (2021), 1749. https://doi.org/10.3390/math9151749 doi: 10.3390/math9151749
    [9] J. Fallmann, S. Will, J. Engelhardt, B. Grüning, R. Backofen, P. F. Stadler, Recent advances in RNA folding, J. Biotechnol., 261 (2017), 97–104. https://doi.org/10.1016/j.jbiotec.2017.07.007 doi: 10.1016/j.jbiotec.2017.07.007
    [10] K. Zhang, D. Shasha, Simple fast algorithms for the editing distance between trees and related problems, SIAM J. Comput., 18 (1989), 1245–1262. https://doi.org/10.1137/0218082 doi: 10.1137/0218082
    [11] M. Quadrini, L. Tesei, E. Merelli, An algebraic language for RNA pseudoknots comparison, BMC Bioinformatics, 20 (2019), 16. https://doi.org/10.1186/s12859-019-2689-5. doi: 10.1186/s12859-019-2689-5
    [12] F. Wang, T. Akutsu, T. Mori, Comparison of pseudoknotted RNA secondary structures by topological centroid identification and tree edit distance. J. Comput. Biol., 27 (2020), 1443–1451. https://doi.org/10.1089/cmb.2019.0512 doi: 10.1089/cmb.2019.0512
    [13] P. A. Evans, Algorithms and complexity for annotated sequence analysis, Ph. D Thesis, Canada: University of Victoria, 1999.
    [14] L. Yang, Y. Liu, X. Hu, P. Wang, X. Li, J. Wu, Graph-based analysis of RNA secondary structure similarity comparison, Complexity, 2021 (2021), 8841822. https://doi.org/10.1155/2021/8841822 doi: 10.1155/2021/8841822
    [15] J. Guo, Exact algorithms for the longest common subsequence problem for arc annotated sequences, Master's Thesis, Universitat Tubingen, 2002
    [16] G. Lin, Z. Z. Chen, T. Jiang, J. Wen, The longest common subsequence problem for sequences with nested arc annotations, J. Comput. Syst. Sci., 65 (2002), 465–480. https://doi.org/10.1016/S0022-0000(02)00004-1 doi: 10.1016/S0022-0000(02)00004-1
    [17] T. Jiang, G. Lin, B. Ma, K. Zhang, The longest common subsequence problem for arc-annotated sequences, J. Discrete Algorithms, 2 (2004), 257–270. https://doi.org/10.1016/S1570-8667(03)00080-7 doi: 10.1016/S1570-8667(03)00080-7
    [18] T. F. Smith, M. S. Waterman, Identification of common molecular subsequences, J. Mol. Biol., 147 (1981), 195–197. https://doi.org/10.1016/0022-2836(81)90087-5 doi: 10.1016/0022-2836(81)90087-5
    [19] C. Blum, M. Djukanovic, A. Santini, H. Jiang, C. M. Li, F. Manyà, et al., Solving longest common subsequence problems via a transformation to the maximum clique problem, Comput. Oper. Res., 125 (2021), 105089. https://doi.org/10.1016/j.cor.2020.105089 doi: 10.1016/j.cor.2020.105089
    [20] J. Gramm, J. Guo, R. Niedermeier, Pattern matching for arc-annotated sequences, In: Foundations of software technology and theoretical computer science, Berlin, Heidelberg: Springer, 2002. https://doi.org/10.1007/3-540-36206-1_17
    [21] IBM, CPLEX Optimization Studio V12.8.0, Available from: https://www.ibm.com/support/pages/cplex-optimization-studio-v128.
    [22] G. Blelloch. Prefix sums and their applications. In: Synthesis of parallel algorithms, 1990. Available from: http://shelf2.library.cmu.edu/Tech/23445461.
    [23] H. Bahig, K. A. Fathy, An improved parallel prefix sums algorithm, Parallel Processing Lett., 32 (2022), 2250008. https://doi.org/10.1142/S0129626422500086 doi: 10.1142/S0129626422500086
    [24] R. Shikder, P. Thulasiraman, P. Irani, P. Hu, An OpenMP-based tool for finding longest common subsequence in bioinformatics, BMC Res. Notes, 12 (2019), 220. https://doi.org/10.1186/s13104-019-4256-6 doi: 10.1186/s13104-019-4256-6
    [25] M. Crochemore, C. S. Iliopoulos, Y. J. Pinzon, J. F. Reid, A fast and practical bit-vector algorithm for the longest common subsequence problem, Inform. Processing Lett., 80 (2001), 279–285. https://doi.org/10.1016/S0020-0190(01)00182-X doi: 10.1016/S0020-0190(01)00182-X
    [26] M. Andronescu, V. Bereg, H. H. Hoos, A. Condon, RNA STRAND: The RNA secondary structure and statistical analysis database, BMC Bioinformatics, 9 (2008), 340. https://doi.org/10.1186/1471-2105-9-340 doi: 10.1186/1471-2105-9-340
    [27] CRW2: Comparative RNA Web-2. Available from: https://crw2-comparative-rna-web.org/.
    [28] R. F. Woolson, Wilcoxon signed‐rank test, Wiley encyclopedia of clinical trials, 2008. https://doi.org/10.1002/9780471462422.eoct979
  • math-09-05-550-supplementary.xlsx
  • Reader Comments
  • © 2024 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)
通讯作者: 陈斌, bchen63@163.com
  • 1. 

    沈阳化工大学材料科学与工程学院 沈阳 110142

  1. 本站搜索
  2. 百度学术搜索
  3. 万方数据库搜索
  4. CNKI搜索

Metrics

Article views(630) PDF downloads(79) Cited by(0)

Article outline

Figures and Tables

Figures(4)  /  Tables(5)

Other Articles By Authors

/

DownLoad:  Full-Size Img  PowerPoint
Return
Return

Catalog