Maximizing protein expression levels poses a major challenge in bioengineering. To increase protein expression levels, numerous factors, including codon bias, codon context bias, hidden stop codons, homologous recombination, suitable guanine-cytosine ratio, and hairpin loop structure, are crucial and quantified by six objective functions: CAI, CPB, HSC, HD, GC3, and SL. Optimizing these six objectives simultaneously constitutes a multi-objective optimization problem, aiming to identify the favorable Pareto solutions rather than a singular optimal solution. However, achieving satisfactory solutions requires numerous cycles and solutions, thus leading to a large number of functional evaluations. While there are frameworks for multi-objective optimization problems, they often lack efficient support for objective function computation in protein encoding. In this paper, we proposed a method to design a set of coding sequences (CDSs) based on non-dominated sorting genetic algorithm III (NSGA-III), accelerated using NVIDIA graphical processing units (GPUs). Experimental results indicated that our method is 15,454 times faster than the Pymoo framework and is evaluated using 100 solutions and 100 cycles. Since our GPU implementation facilitated the use of larger solutions and more cycles, we were able to design a superior set of CDSs by increasing solutions to 400 and cycles to 12,800. In addition, our NSGA-III-based method consistently surpassed the NSGA-II approach when the number of cycles exceeded 3200 by utilizing 100 solutions. Finally, we observed that a gradual reduction of the mutation probability as the number of cycles increased yielded better quality results than maintaining a fixed mutation probability.
Citation: Donghyeon Kim, Jinsung Kim. GPU-accelerated non-dominated sorting genetic algorithm III for maximizing protein production[J]. Electronic Research Archive, 2024, 32(4): 2514-2540. doi: 10.3934/era.2024116
Maximizing protein expression levels poses a major challenge in bioengineering. To increase protein expression levels, numerous factors, including codon bias, codon context bias, hidden stop codons, homologous recombination, suitable guanine-cytosine ratio, and hairpin loop structure, are crucial and quantified by six objective functions: CAI, CPB, HSC, HD, GC3, and SL. Optimizing these six objectives simultaneously constitutes a multi-objective optimization problem, aiming to identify the favorable Pareto solutions rather than a singular optimal solution. However, achieving satisfactory solutions requires numerous cycles and solutions, thus leading to a large number of functional evaluations. While there are frameworks for multi-objective optimization problems, they often lack efficient support for objective function computation in protein encoding. In this paper, we proposed a method to design a set of coding sequences (CDSs) based on non-dominated sorting genetic algorithm III (NSGA-III), accelerated using NVIDIA graphical processing units (GPUs). Experimental results indicated that our method is 15,454 times faster than the Pymoo framework and is evaluated using 100 solutions and 100 cycles. Since our GPU implementation facilitated the use of larger solutions and more cycles, we were able to design a superior set of CDSs by increasing solutions to 400 and cycles to 12,800. In addition, our NSGA-III-based method consistently surpassed the NSGA-II approach when the number of cycles exceeded 3200 by utilizing 100 solutions. Finally, we observed that a gradual reduction of the mutation probability as the number of cycles increased yielded better quality results than maintaining a fixed mutation probability.
[1] | A. Zhou, B. Qu, H. Li, S. Zhao, P. N. Suganthan, Q. Zhang, Multiobjective evolutionary algorithms: a survey of the state of the art, Swarm Evol. Comput., 1 (2011), 32–49. https://doi.org/10.1016/j.swevo.2011.03.001 doi: 10.1016/j.swevo.2011.03.001 |
[2] | N. Gunantara, A review of multi-objective optimization: methods and its applications, Cogent Eng., 5 (2018), 1502242. https://doi.org/10.1080/23311916.2018.1502242 doi: 10.1080/23311916.2018.1502242 |
[3] | P. Eskelinen, K. Miettinen, Trade-off analysis approach for interactive nonlinear multiobjective optimization, OR Spectrum, 34 (2012), 803–816. https://doi.org/10.1007/s00291-011-0266-z doi: 10.1007/s00291-011-0266-z |
[4] | H. Fang, M. Rais-Rohani, Z. Liu, M. F. Horstemeyer, A comparative study of metamodeling methods for multiobjective crashworthiness optimization, Comput. Struct., 83 (2005), 2121–2136. https://doi.org/10.1016/j.compstruc.2005.02.025 doi: 10.1016/j.compstruc.2005.02.025 |
[5] | F. Di Pierro, S. Khu, D. Savić, L. Berardi, Efficient multi-objective optimal design of water distribution networks on a budget of simulations using hybrid algorithms, Environ. Modell. Software, 24 (2009), 202–213. https://doi.org/10.1016/j.envsoft.2008.06.008 doi: 10.1016/j.envsoft.2008.06.008 |
[6] | S. Fields, O. Song, A novel genetic system to detect protein–protein interactions, Nature, 340 (1989), 245–246. https://doi.org/10.1038/340245a0 doi: 10.1038/340245a0 |
[7] | S. Varambally, S. M. Dhanasekaran, M. Zhou, T. R. Barrette, C. Kumar-Sinha, M. G. Sanda, et al., The polycomb group protein ezh2 is involved in progression of prostate cancer, Nature, 419 (2002), 624–629. https://doi.org/10.1038/nature01075 doi: 10.1038/nature01075 |
[8] | G. Blander, L. Guarente, The sir2 family of protein deacetylases, Annu. Rev. Biochem., 73 (2004), 417–435. https://doi.org/10.1146/annurev.biochem.73.011303.073651 doi: 10.1146/annurev.biochem.73.011303.073651 |
[9] | S. P. Kaur, V. Gupta, Covid-19 vaccine: a comprehensive status report, Virus Res., 288 (2020), 198114. https://doi.org/10.1016/j.virusres.2020.198114 doi: 10.1016/j.virusres.2020.198114 |
[10] | M. Ahmad, M. Hirz, H. Pichler, H. Schwab, Protein expression in pichia pastoris: recent achievements and perspectives for heterologous protein production, Appl. Microbiol. Biotechnol., 98 (2014), 5301–5317. https://doi.org/10.1007/s00253-014-5732-5 doi: 10.1007/s00253-014-5732-5 |
[11] | D. Fouque, K. Kalantar-Zadeh, J. Kopple, N. Cano, P. Chauveau, L. Cuppari, et al., A proposed nomenclature and diagnostic criteria for protein–energy wasting in acute and chronic kidney disease, Kidney Int., 73 (2008), 391–398. https://doi.org/10.1038/sj.ki.5002585 doi: 10.1038/sj.ki.5002585 |
[12] | J. Dehghani, A. Movafeghi, E. Mathieu-Rivet, N. Mati-Baouche, S. Calbo, P. Lerouge, et al., Microalgae as an efficient vehicle for the production and targeted delivery of therapeutic glycoproteins against sars-cov-2 variants, Mar. Drugs, 20 (2022), 657. https://doi.org/10.3390/md20110657 doi: 10.3390/md20110657 |
[13] | S. Huleani, M. R. Roberts, L. Beales, E. H. Papaioannou, Escherichia coli as an antibody expression host for the production of diagnostic proteins: significance and expression, Crit. Rev. Biotechnol., 42 (2022), 756–773. https://doi.org/10.1080/07388551.2021.1967871 doi: 10.1080/07388551.2021.1967871 |
[14] | P. Gu, F. Yang, T. Su, Q. Wang, Q. Liang, Q. Qi, A rapid and reliable strategy for chromosomal integration of gene(s) with multiple copies, Sci. Rep., 5 (2015), 9684. https://doi.org/10.1038/srep09684 doi: 10.1038/srep09684 |
[15] | C. A. Scorer, J. J. Clare, W. R. McCombie, M. A. Romanos, K. Sreekrishna, Rapid selection using g418 of high copy number transformants of pichia pastoris for high–level foreign gene expression, Nat. Biotechnol., 12 (1994), 181–184. https://doi.org/10.1038/nbt0294-181 doi: 10.1038/nbt0294-181 |
[16] | K. Tyo, P. K. Ajikumar, G. Stephanopoulos, Stabilized gene duplication enables long-term selection-free heterologous pathway expression, Nat. Biotechnol., 27 (2009), 760–765. https://doi.org/10.1038/nbt.1555 doi: 10.1038/nbt.1555 |
[17] | R. Aw, K. M. Polizzi, Can too many copies spoil the broth? Microb. Cell Fact., 12 (2013), 128. https://doi.org/10.1186/1475-2859-12-128 |
[18] | J. Buerstedde, N. Lowndes, D. G. Schatz, Induction of homologous recombination between sequence repeats by the activation induced cytidine deaminase (aid) protein, Elife, 3 (2014), e03110. https://doi.org/10.7554/eLife.03110 doi: 10.7554/eLife.03110 |
[19] | G. Terai, S. Kamegai, A. Taneda, K. Asai, Evolutionary design of multiple genes encoding the same protein, Bioinformatics, 33 (2017), 1613–1620. https://doi.org/10.1093/bioinformatics/btx030 doi: 10.1093/bioinformatics/btx030 |
[20] | S. T. Parvathy, V. Udayasuriyan, V. Bhadana, Codon usage bias, Mol. Biol. Rep., 49 (2022), 539–565. https://doi.org/10.1007/s11033-021-06749-4 |
[21] | J. Athey, A. Alexaki, E. Osipova, A. Rostovtsev, L. V. Santana-Quintero, U. Katneni, et al., A new and updated resource for codon usage tables, BMC Bioinf., 18 (2017), 391. https://doi.org/10.1186/s12859-017-1793-7 doi: 10.1186/s12859-017-1793-7 |
[22] | J. M. Comeron, M. Aguadé, An evaluation of measures of synonymous codon usage bias, J. Mol. Evol., 47 (1998), 268–274. https://doi.org/10.1007/PL00006384 doi: 10.1007/PL00006384 |
[23] | M. Gouy, C. Gautier, Codon usage in bacteria: correlation with gene expressivity, Nucleic Acids Res., 10 (1982), 7055–7074. https://doi.org/10.1093/nar/10.22.7055 doi: 10.1093/nar/10.22.7055 |
[24] | P. M. Sharp, W. Li, The codon adaptation index-a measure of directional synonymous codon usage bias, and its potential applications, Nucleic Acids Res., 15 (1987), 1281–1295. https://doi.org/10.1093/nar/15.3.1281 doi: 10.1093/nar/15.3.1281 |
[25] | G. A. Gutman, G. W. Hatfield, Nonrandom utilization of codon pairs in escherichia coli, PNAS, 86 (1989), 3699–3703. https://doi.org/10.1073/pnas.86.10.369 doi: 10.1073/pnas.86.10.369 |
[26] | A. Tats, T. Tenson, M. Remm, Preferred and avoided codon pairs in three domains of life, BMC Genomics, 9 (2008), 463. https://doi.org/10.1186/1471-2164-9-463 doi: 10.1186/1471-2164-9-463 |
[27] | M. Baeza, J. Alcaíno, S. Barahona, D. Sepúlveda, V. Cifuentes, Codon usage and codon context bias in xanthophyllomyces dendrorhous, BMC Genomics, 16 (2015), 293. https://doi.org/10.1186/s12864-015-1493-5 doi: 10.1186/s12864-015-1493-5 |
[28] | R. Prabha, D. P. Singh, S. Sinha, K. Ahmad, A. Rai, Genome-wide comparative analysis of codon usage bias and codon context patterns among cyanobacterial genomes, Mar. Geonomics, 32 (2017), 31–39. https://doi.org/10.1016/j.margen.2016.10.001 doi: 10.1016/j.margen.2016.10.001 |
[29] | J. R. Coleman, D. Papamichail, S. Skiena, B. Futcher, E. Wimmer, S. Mueller, Virus attenuation by genome-scale changes in codon pair bias, Science, 320 (2008), 1784–1787. https://doi.org/10.1126/science.1155761 doi: 10.1126/science.1155761 |
[30] | H. Seligmann, Cost minimization of ribosomal frameshifts, J. Theor. Biol., 249 (2007), 162–167. https://doi.org/10.1016/j.jtbi.2007.07.007 doi: 10.1016/j.jtbi.2007.07.007 |
[31] | H. Seligmann, D. D. Pollock, The ambush hypothesis: hidden stop codons prevent off-frame gene reading, DNA Cell Biol., 23 (2004), 701–705. https://doi.org/10.1089/dna.2004.23.701 doi: 10.1089/dna.2004.23.701 |
[32] | A. Gupta, T. R. Singh, Shift: server for hidden stops analysis in frame-shifted translation, BMC Res. Notes, 6 (2013), 68. https://doi.org/10.1186/1756-0500-6-68 doi: 10.1186/1756-0500-6-68 |
[33] | P. Svoboda, A. D. Cara, Hairpin rna: a secondary structure of primary importance, Cell. Mol. Life Sci., 63 (2006), 901–908. https://doi.org/10.1007/s00018-005-5558-5 doi: 10.1007/s00018-005-5558-5 |
[34] | C. Bao, S. Loerch, C. Ling, A. A. Korostelev, N. Grigorieff, D. N. Ermolenko, mRNA stem-loops can pause the ribosome by hindering a-site trna binding, Elife, 9 (2020), e55799. https://doi.org/10.7554/eLife.55799 doi: 10.7554/eLife.55799 |
[35] | M. V. Díaz-Galián, M. A. Vega-Rodríguez, Many-objective approach based on problem-aware mutation operators for protein encoding, Inf. Sci., 613 (2022), 376–400. https://doi.org/10.1016/j.ins.2022.09.048 doi: 10.1016/j.ins.2022.09.048 |
[36] | A. Watts, S. Sankaranarayanan, A. Watts, R. K. Raipuria, Optimizing protein expression in heterologous system: strategies and tools, Meta Gene, 29 (2021), 100899. https://doi.org/10.1016/j.mgene.2021.100899 doi: 10.1016/j.mgene.2021.100899 |
[37] | B. Gonzalez-Sanchez, M. A. Vega-Rodríguez, S. Santander-Jiménez, A multi-objective butterfly optimization algorithm for protein encoding, Appl. Soft Comput., 139 (2023), 110269. https://doi.org/10.1016/j.asoc.2023.110269 doi: 10.1016/j.asoc.2023.110269 |
[38] | K. Deb, H. Jain, An evolutionary many-objective optimization algorithm using reference-point-based nondominated sorting approach, part I: solving problems with box constraints, IEEE Trans. Evol. Comput., 18 (2013), 577–601. https://doi.org/10.1109/TEVC.2013.2281535 doi: 10.1109/TEVC.2013.2281535 |
[39] | K. Deb, A. Pratap, S. Agarwal, T. Meyarivan, A fast and elitist multiobjective genetic algorithm: NSGA-II, IEEE Trans. Evol. Comput., 6 (2002), 182–197. https://doi.org/10.1109/4235.996017 doi: 10.1109/4235.996017 |
[40] | A. Razmi, M. Rahbar, M. Bemanian, Pca-ann integrated nsga-iii framework for dormitory building design optimization: energy efficiency, daylight, and thermal comfort, Appl. Energy, 305 (2022), 117828. https://doi.org/10.1016/j.apenergy.2021.117828 doi: 10.1016/j.apenergy.2021.117828 |
[41] | I. Khettabi, M. A. Boutiche, L. Benyoucef, NSGA-II vs NSGA-III for the sustainable multi-objective process plan generation in a reconfigurable manufacturing environment, IFAC-PapersOnLine, 54 (2021), 683–688. https://doi.org/10.1016/j.ifacol.2021.08.180 doi: 10.1016/j.ifacol.2021.08.180 |
[42] | X. Li, H. Lv, D. Zeng, Q. Zhang, An improved multi-objective trajectory planning algorithm for kiwifruit harvesting manipulator, IEEE Access, 11 (2023), 65689–65699. https://doi.org/10.1109/ACCESS.2023.3289207 doi: 10.1109/ACCESS.2023.3289207 |
[43] | H. Ishibuchi, R. Imada, Y. Setoguchi, Y. Nojima, Performance comparison of nsga-ii and nsga-iii on various many-objective test problems, in 2016 IEEE Congress on Evolutionary Computation (CEC), IEEE, (2016), 3045–3052. https://doi.org/10.1109/CEC.2016.7744174 |
[44] | S. Wang, Y. Wang, Y. Wang, Z. Wang, Comparison of multi-objective evolutionary algorithms applied to watershed management problem, J. Environ. Manage., 324 (2022), 116255. https://doi.org/10.1016/j.jenvman.2022.116255 doi: 10.1016/j.jenvman.2022.116255 |
[45] | J. Blank, K. Deb, Pymoo: multi-objective optimization in python, IEEE Access, 8 (2020), 89497–89509. https://doi.org/10.1109/ACCESS.2020.2990567 doi: 10.1109/ACCESS.2020.2990567 |
[46] | B. Gonzalez-Sanchez, M. A. Vega-Rodríguez, S. Santander-Jiménez, J. M. Granado-Criado, Multi-objective artificial bee colony for designing multiple genes encoding the same protein, Appl. Soft Comput., 74 (2019), 90–98. https://doi.org/10.1016/j.asoc.2018.10.023 doi: 10.1016/j.asoc.2018.10.023 |
[47] | B. Gonzalez-Sanchez, M. A. Vega-Rodríguez, S. Santander-Jiménez, Parallel multi-objective optimization approaches for protein encoding, J. Supercomput., 78 (2022), 5118–5148. https://doi.org/10.1007/s11227-021-04073-z doi: 10.1007/s11227-021-04073-z |
[48] | B. Gonzalez-Sanchez, M. A. Vega-Rodriguez, S. Santander-Jimenez, Multi-objective protein encoding: redefinition of the problem, new problem-aware operators, and approach based on variable neighborhood search, Inf. Sci., 500 (2019), 173–189. https://doi.org/10.1016/j.ins.2019.05.088 doi: 10.1016/j.ins.2019.05.088 |
[49] | B. Gonzalez-Sanchez, M. A. Vega-Rodriguez, S. Santander-Jimenez, Multi-objective memetic meta-heuristic algorithm for encoding the same protein with multiple genes, Expert Syst. Appl., 136 (2019), 83–93. https://doi.org/10.1016/j.eswa.2019.06.031 doi: 10.1016/j.eswa.2019.06.031 |
[50] | D. Karaboga, B. Basturk, A powerful and efficient algorithm for numerical function optimization: artificial bee colony (abc) algorithm, J. Global Optim., 39 (2007), 459–471. https://doi.org/10.1007/s10898-007-9149-x doi: 10.1007/s10898-007-9149-x |
[51] | P. Hansen, N. Mladenović, J. A. Moreno Perez, Variable neighbourhood search: methods and applications, Ann. Oper. Res., 175 (2010), 367–407. https://doi.org/10.1007/s10479-009-0657-6 doi: 10.1007/s10479-009-0657-6 |
[52] | N. Mladenović, P. Hansen, Variable neighborhood search, Comput. Oper. Res., 24 (1997), 1097–1100. https://doi.org/10.1016/S0305-0548(97)00031-2 |
[53] | E. Elbeltagi, T. Hegazy, D. Grierson, A modified shuffled frog-leaping optimization algorithm: applications to project management, Struct. Infrastruct. Eng., 3 (2007), 53–60. https://doi.org/10.1080/15732470500254535 doi: 10.1080/15732470500254535 |
[54] | I. Das, J. E. Dennis, Normal-boundary intersection: a new method for generating the pareto surface in nonlinear multicriteria optimization problems, SIAM J. Optim., 8 (1998), 631–657. https://doi.org/10.1137/S1052623496307510 doi: 10.1137/S1052623496307510 |
[55] | S. Arora, S. Singh, Butterfly optimization algorithm: a novel approach for global optimization, Soft Comput., 23 (2019), 715–734. https://doi.org/10.1007/s00500-018-3102-4 doi: 10.1007/s00500-018-3102-4 |
[56] | K. Deb, E. Goodman, C. A. C. Coello, K. Klamroth, K. Miettinen, S. Mostaghim, et al., Evolutionary Multi-Criterion Optimization: 10th International Conference, EMO 2019, East Lansing, MI, USA, March 10-13, 2019, Proceedings, Springer, 11411 (2019). |
[57] | D. D. Holcomb, A. Alexaki, U. Katneni, C. Kimchi-Sarfaty, The kazusa codon usage database, cocoputs, and the value of up-to-date codon usage statistics, Infect., Genet. Evol., 73 (2019), 266–268. https://doi.org/10.1016/j.meegid.2019.05.010 doi: 10.1016/j.meegid.2019.05.010 |
[58] | Kazusa DNA Research Institute, Saccharomyces cerevisiae gc contents, 2023. Available from: https://www.kazusa.or.jp/codon/cgi-bin/showcodon.cgi?species = 4932. |
[59] | The UniProt Consortium, UniProt: the Universal Protein Knowledgebase in 2023, Nucleic Acids Res., 51 (2023), D523–D531. https://doi.org/10.1093/nar/gkac1052 doi: 10.1093/nar/gkac1052 |
[60] | D. Kim, J. Kim, Optimization of designing multiple genes encoding the same protein based on NSGA-II for efficient execution on GPUs, Electron. Res. Arch., 31 (2023), 5313–5339. https://doi.org/10.3934/era.2023270 doi: 10.3934/era.2023270 |
[61] | G. Tzeng, J. Huang, Multiple Attribute Decision Making: Methods and Applications, CRC press, 2011. |
[62] | D. L. Church, L. Cerutti, A. Gürtler, T. Griener, A. Zelazny, S. Emler, Performance and application of 16s rrna gene cycle sequencing for routine identification of bacteria in the clinical microbiology laboratory, Clin. Microbiol. Rev., 33 (2020). https://doi.org/10.1128/CMR.00053-19 |
[63] | B. Xue, M. Zhang, W. N. Browne, X. Yao, A survey on evolutionary computation approaches to feature selection, IEEE Trans. Evol. Comput., 20 (2015), 606–626. https://doi.org/10.1109/TEVC.2015.2504420 doi: 10.1109/TEVC.2015.2504420 |
[64] | M. A. Dulebenets, An adaptive polyploid memetic algorithm for scheduling trucks at a cross-docking terminal, Inf. Sci., 565 (2021), 390–421. https://doi.org/10.1016/j.ins.2021.02.039 doi: 10.1016/j.ins.2021.02.039 |
[65] | B. Song, Z. Wang, L. Zou, An improved pso algorithm for smooth path planning of mobile robots using continuous high-degree bezier curve, Appl. Soft Comput., 100 (2021), 106960. https://doi.org/10.1016/j.asoc.2020.106960 doi: 10.1016/j.asoc.2020.106960 |