In synthetic biology, it is a challenge to increase the production of target proteins by maximizing their expression levels. In order to augment expression levels, we need to focus on both homologous recombination and codon adaptation, which are estimated by three objective functions, namely HD (Hamming distance), LRCS (length of repeated or common substring) and CAI (codon adaptation index). Optimizing these objective functions simultaneously becomes a multi-objective optimization problem. The aim is to find satisfying solutions that have high codon adaptation and a low incidence of homologous recombination. However, obtaining satisfactory solutions requires calculating the objective functions multiple times with many cycles and solutions. In this paper, we propose an approach to accelerate the method of designing a set of CDSs (CoDing sequences) based on NSGA-II (non-dominated sorting genetic algorithm II) on NVIDIA GPUs. The implementation accelerated by GPUs improves overall performance by 187.5$ \times $ using $ 100 $ cycles and $ 128 $ solutions. Our implementation allows us to use larger solutions and more cycles, leading to outstanding solution quality. The improved implementation provides much better solutions in a similar amount of time compared to other available methods by 1.22$ \times $ improvements in hypervolume. Furthermore, our approach on GPUs also suggests how to efficiently utilize the latest computational resources in bioinformatics. Finally, we discuss the impacts of the number of cycles and the number of solutions on designing a set of CDSs.
Citation: Donghyeon Kim, Jinsung Kim. Optimization of designing multiple genes encoding the same protein based on NSGA-II for efficient execution on GPUs[J]. Electronic Research Archive, 2023, 31(9): 5313-5339. doi: 10.3934/era.2023270
In synthetic biology, it is a challenge to increase the production of target proteins by maximizing their expression levels. In order to augment expression levels, we need to focus on both homologous recombination and codon adaptation, which are estimated by three objective functions, namely HD (Hamming distance), LRCS (length of repeated or common substring) and CAI (codon adaptation index). Optimizing these objective functions simultaneously becomes a multi-objective optimization problem. The aim is to find satisfying solutions that have high codon adaptation and a low incidence of homologous recombination. However, obtaining satisfactory solutions requires calculating the objective functions multiple times with many cycles and solutions. In this paper, we propose an approach to accelerate the method of designing a set of CDSs (CoDing sequences) based on NSGA-II (non-dominated sorting genetic algorithm II) on NVIDIA GPUs. The implementation accelerated by GPUs improves overall performance by 187.5$ \times $ using $ 100 $ cycles and $ 128 $ solutions. Our implementation allows us to use larger solutions and more cycles, leading to outstanding solution quality. The improved implementation provides much better solutions in a similar amount of time compared to other available methods by 1.22$ \times $ improvements in hypervolume. Furthermore, our approach on GPUs also suggests how to efficiently utilize the latest computational resources in bioinformatics. Finally, we discuss the impacts of the number of cycles and the number of solutions on designing a set of CDSs.
[1] | S. Fields, O. Song, A novel genetic system to detect protein–protein interactions, Nature, 340 (1989), 245–246. https://doi.org/10.1038/340245a0 doi: 10.1038/340245a0 |
[2] | S. Varambally, S. M. Dhanasekaran, M. Zhou, T. R. Barrette, C. Kumar-Sinha, M. G. Sanda, et al., The polycomb group protein ezh2 is involved in progression of prostate cancer, Nature, 419 (2002), 624–629. https://doi.org/10.1038/nature01075 doi: 10.1038/nature01075 |
[3] | G. Blander, L. Guarente, The sir2 family of protein deacetylases, Annu. Rev. Biochem., 73 (2004), 417–435. https://doi.org/10.1146/annurev.biochem.73.011303.073651 doi: 10.1146/annurev.biochem.73.011303.073651 |
[4] | S. P. Kaur, V. Gupta, Covid-19 vaccine: A comprehensive status report, Virus Res., 288 (2020), 198114. https://doi.org/10.1016/j.virusres.2020.198114 doi: 10.1016/j.virusres.2020.198114 |
[5] | M. Ahmad, M. Hirz, H. Pichler, H. Schwab, Protein expression in pichia pastoris: Recent achievements and perspectives for heterologous protein production, Appl. Microbiol. Biotechnol., 98 (2014), 5301–5317. https://doi.org/10.1007/s00253-014-5732-5 doi: 10.1007/s00253-014-5732-5 |
[6] | D. Fouque, K. Kalantar-Zadeh, J. Kopple, N. Cano, P. Chauveau, L. Cuppari, et al., A proposed nomenclature and diagnostic criteria for protein–energy wasting in acute and chronic kidney disease, Kidney Int., 73 (2008), 391–398. https://doi.org/10.1038/sj.ki.5002585 doi: 10.1038/sj.ki.5002585 |
[7] | A. D. Bandaranayake, S. C. Almo, Recent advances in mammalian protein production, FEBS Lett., 588 (2014), 253–260. |
[8] | J. Dehghani, A. Movafeghi, E. Mathieu-Rivet, N. Mati-Baouche, S. Calbo, P. Lerouge, et al., Microalgae as an efficient vehicle for the production and targeted delivery of therapeutic glycoproteins against sars-cov-2 variants, Marine Drugs, 20 (2022), 657. https://doi.org/10.3390/md20110657 doi: 10.3390/md20110657 |
[9] | S. C. Spohner, H. Müller, H. Quitmann, P. Czermak, Expression of enzymes for the usage in food and feed industry with pichia pastoris, J. Biotechnol., 202 (2015), 118–134. https://doi.org/10.1016/j.jbiotec.2015.01.027 doi: 10.1016/j.jbiotec.2015.01.027 |
[10] | A. Haldimann, B. L. Wanner, Conditional-replication, integration, excision, and retrieval plasmid-host systems for gene structure-function studies of bacteria, J. Bacteriol., 183 (2001), 6384–6393. |
[11] | P. Gu, F. Yang, T. Su, Q. Wang, Q. Liang, Q. Qi, A rapid and reliable strategy for chromosomal integration of gene (s) with multiple copies, Sci. Rep., 5 (2015), 1–9. https://doi.org/10.1038/srep09684 doi: 10.1038/srep09684 |
[12] | C. A. Scorer, J. J. Clare, W. R. McCombie, M. A. Romanos, K. Sreekrishna, Rapid selection using g418 of high copy number transformants of pichia pastoris for high–level foreign gene expression, Nat. Biotechnol., 12 (1994), 181–184. https://doi.org/10.1038/nbt0294-181 doi: 10.1038/nbt0294-181 |
[13] | K. E. Tyo, P. K. Ajikumar, G. Stephanopoulos, Stabilized gene duplication enables long-term selection-free heterologous pathway expression, Nat. Biotechnol., 27 (2009), 760–765. https://doi.org/10.1038/nbt.1555 doi: 10.1038/nbt.1555 |
[14] | G. Terai, S. Kamegai, A. Taneda, K. Asai, Evolutionary design of multiple genes encoding the same protein, Bioinformatics, 33 (2017), 1613–1620. https://doi.org/10.1093/bioinformatics/btx030 doi: 10.1093/bioinformatics/btx030 |
[15] | A. Vassileva, D. A. Chugh, S. Swaminathan, N. Khanna, Expression of hepatitis b surface antigen in the methylotrophic yeast pichia pastoris using the gap promoter, J. Biotechnol., 88 (2001), 21–35. https://doi.org/10.1016/S0168-1656(01)00254-1 doi: 10.1016/S0168-1656(01)00254-1 |
[16] | R. Aw, K. M. Polizzi, Can too many copies spoil the broth?, Microbial cell factories, 12 (2013), 1–9. https://doi.org/10.1186/1475-2859-12-128 doi: 10.1186/1475-2859-12-128 |
[17] | J. M. Buerstedde, N. Lowndes, D. G. Schatz, Induction of homologous recombination between sequence repeats by the activation induced cytidine deaminase (aid) protein, Elife, 3 (2014), e03110. https://doi.org/10.7554/eLife.03110 doi: 10.7554/eLife.03110 |
[18] | J. Jurka, P. Klonowski, V. Dagman, P. Pelton, Censor–a program for identification and elimination of repetitive elements from dna sequences, Comput. Chem., 20 (1996), 119–121. https://doi.org/10.1016/S0097-8485(96)80013-1 doi: 10.1016/S0097-8485(96)80013-1 |
[19] | J. Athey, A. Alexaki, E. Osipova, A. Rostovtsev, L. V. Santana-Quintero, U. Katneni, et al., A new and updated resource for codon usage tables, BMC Bioinf., 18 (2017), 1–10. https://doi.org/10.1186/s12859-017-1793-7 doi: 10.1186/s12859-017-1793-7 |
[20] | J. M. Comeron, M. Aguadé, An evaluation of measures of synonymous codon usage bias, J. Mol. Evol., 47 (1998), 268–274. https://doi.org/10.1007/PL00006384 doi: 10.1007/PL00006384 |
[21] | M. Gouy, C. Gautier, Codon usage in bacteria: correlation with gene expressivity, Nucleic Acids Res., 10 (1982), 7055–7074. https://doi.org/10.1093/nar/10.22.7055 doi: 10.1093/nar/10.22.7055 |
[22] | T. Ikemura, Correlation between the abundance of escherichia coli transfer rnas and the occurrence of the respective codons in its protein genes: a proposal for a synonymous codon choice that is optimal for the E. coli translational system, J. Mol. Biol., 151 (1981), 389–409. https://doi.org/10.1016/0022-2836(81)90003-6 doi: 10.1016/0022-2836(81)90003-6 |
[23] | P. M. Sharp, W. H. Li, The codon adaptation index-a measure of directional synonymous codon usage bias, and its potential applications, Nucleic Acids Res., 15 (1987), 1281–1295. https://doi.org/10.1093/nar/15.3.1281 doi: 10.1093/nar/15.3.1281 |
[24] | K. Deb, A. Pratap, S. Agarwal, T. Meyarivan, A fast and elitist multiobjective genetic algorithm: Nsga-ii, IEEE Trans. Evol. Comput., 6 (2002), 182–197. https://doi.org/10.1109/4235.996017 doi: 10.1109/4235.996017 |
[25] | B. Gonzalez-Sanchez, M. A. Vega-Rodríguez, S. Santander-Jiménez, J. M. Granado-Criado, Multi-objective artificial bee colony for designing multiple genes encoding the same protein, Appl. Soft Comput., 74 (2019), 90–98. https://doi.org/10.1016/j.asoc.2018.10.023 doi: 10.1016/j.asoc.2018.10.023 |
[26] | L. Dagum, R. Menon, Openmp: An industry standard api for shared-memory programming, IEEE Comput. Sci. Eng., 5 (1998), 46–55. https://doi.org/10.1109/99.660313 doi: 10.1109/99.660313 |
[27] | Y. Zhou, Y. Tan, Gpu-based parallel multi-objective particle swarm optimization, Int. J. Artif. Intell., 7 (2011), 125–141. |
[28] | B. Gonzalez-Sanchez, M. A. Vega-Rodríguez, S. Santander-Jiménez, Parallel multi-objective optimization approaches for protein encoding, J. Supercomput., 1–31. https://doi.org/10.1007/s11227-021-04073-z |
[29] | F. C. Holstege, E. G. Jennings, J. J. Wyrick, T. I. Lee, C. J. Hengartner, M. R. Green, et al., Dissecting the regulatory circuitry of a eukaryotic genome, Cell, 95 (1998), 717–728. https://doi.org/10.1016/S0092-8674(00)81641-4 doi: 10.1016/S0092-8674(00)81641-4 |
[30] | Z. Jia, M. Maggioni, B. Staiger, D. P. Scarpazza, Dissecting the nvidia volta gpu architecture via microbenchmarking, preprint arXiv: 1804.06826. |
[31] | T. U. Consortium, UniProt: The universal protein knowledgebase in 2023, Nucleic Acids Res., 51 (2023), D523–D531. https://doi.org/10.1093/nar/gkac1052 doi: 10.1093/nar/gkac1052 |
[32] | J. X. Chin, B. K. S. Chung, D. Y. Lee, Codon optimization online (cool): A web-based multi-objective optimization platform for synthetic gene design, Bioinformatics, 30 (2014), 2210–2212. https://doi.org/10.1093/bioinformatics/btu192 doi: 10.1093/bioinformatics/btu192 |
[33] | J. C. Guimaraes, M. Rocha, A. P. Arkin, G. Cambray, D-tailor: Automated analysis and design of dna sequences, Bioinformatics, 30 (2014), 1087–1094. https://doi.org/10.1093/bioinformatics/btt742 doi: 10.1093/bioinformatics/btt742 |
[34] | P. Puigbo, E. Guzmán, A. Romeu and S. Garcia-Vallve, Optimizer: a web server for optimizing the codon usage of dna sequences, Nucleic Acids Res., 35 (2007), W126–W131. https://doi.org/10.1093/nar/gkm219 doi: 10.1093/nar/gkm219 |
[35] | B. Gonzalez-Sanchez, M. A. Vega-Rodríguez, S. Santander-Jiménez, A multi-objective butterfly optimization algorithm for protein encoding, Appl. Soft Comput., 139 (2023), 110269. https://doi.org/10.1016/j.asoc.2023.110269 doi: 10.1016/j.asoc.2023.110269 |
[36] | M. V. Díaz-Galián, M. A. Vega-Rodríguez, Many-objective approach based on problem-aware mutation operators for protein encoding, Inf. Sci., 613 (2022), 376–400. |
[37] | K. Deb, H. Jain, An evolutionary many-objective optimization algorithm using reference-point-based nondominated sorting approach, part i: solving problems with box constraints, IEEE Trans. Evol. Comput., 18 (2013), 577–601. 10.1109/TEVC.2013.2281535 doi: 10.1109/TEVC.2013.2281535 |
[38] | I. Das, J. E. Dennis, Normal-boundary intersection: A new method for generating the pareto surface in nonlinear multicriteria optimization problems, SIAM J. Optim., 8 (1998), 631–657. |
[39] | M. A. Dulebenets, An adaptive polyploid memetic algorithm for scheduling trucks at a cross-docking terminal, Inf. Sci., 565 (2021), 390–421. https://doi.org/10.1016/j.ins.2021.02.039 doi: 10.1016/j.ins.2021.02.039 |
[40] | J. Pasha, A. L. Nwodu, A. M. Fathollahi-Fard, G. Tian, Z. Li, H. Wang, et al., Exact and metaheuristic algorithms for the vehicle routing problem with a factory-in-a-box in multi-objective settings, Adv. Eng. Inf., 52 (2022), 101623. https://doi.org/10.1016/j.aei.2022.101623 doi: 10.1016/j.aei.2022.101623 |
[41] | H. Gholizadeh, H. Fazlollahtabar, A. M. Fathollahi-Fard, M. A. Dulebenets, Preventive maintenance for the flexible flowshop scheduling under uncertainty: A waste-to-energy system, Environ. Sci. Pollut. Res., 1–20. https://doi.org/10.1007/s11356-021-16234-x |
[42] | M. A. Dulebenets, M. Kavoosi, O. Abioye, J. Pasha, A self-adaptive evolutionary algorithm for the berth scheduling problem: Towards efficient parameter control, Algorithms, 11 (2018), 100. https://doi.org/10.3390/a11070100 doi: 10.3390/a11070100 |
[43] | H. Zhao, C. Zhang, An online-learning-based evolutionary many-objective algorithm, Inf. Sci., 509 (2020), 1–21. https://doi.org/10.1016/j.ins.2019.08.069 doi: 10.1016/j.ins.2019.08.069 |