A multi-objective imperialist competitive algorithm (MOICA) for finding motifs in DNA sequences

Saeed Alirezanejad Gohardani; Mehri Bagherian; Hamidreza Vaziri; Saeed Alirezanejad Gohardani; Mehri Bagherian; Hamidreza Vaziri

doi:10.3934/mbe.2019075

Mathematical Biosciences and Engineering

2019, Volume 16, Issue 3: 1575-1596. doi: 10.3934/mbe.2019075

Previous Article Next Article

Research article

A multi-objective imperialist competitive algorithm (MOICA) for finding motifs in DNA sequences

1.
Department of Applied Mathematics, Faculty of Mathematical Science, University of Guilan, Rasht, Iran
2.
Department of Biology, Faculty of Science, University of Guilan, Rasht, Iran

Received: 22 November 2019 Accepted: 27 January 2019 Published: 26 February 2019

Motif discovery problem (MDP) is one of the well-known problems in biology which tries to find the transcription factor binding site (TFBS) in DNA sequences. In one aspect, there is not enough biological knowledge on motif sites and on the other side, the problem is NP-hard. Thus, there is not an efficient procedure capable of finding motifs in every dataset. Some algorithms use exhaustive search, which is very time-consuming for large-scale datasets. On the other side, metaheuristic procedures seem to be a good selection for finding a motif quickly that at least has some acceptable biological properties. Most of the previous methods model the problem as a single objective optimization problem; however, considering multi-objectives for modeling the problem leads to improvements in the quality of obtained motifs. Some multi-objective optimization models for MDP have tried to maximize three objectives simultaneously: Motif length, support, and similarity. In this study, the multi-objective Imperialist Competition Algorithm (ICA) is adopted for this problem as an approximation algorithm. ICA is able to simulate more exploration along the solution space, so avoids trapping into local optima. So, it promises to obtain good solutions in a reasonable time. Experimental results show that our method produces good solutions compared to well-known algorithms in the literature, according to computational and biological indicators.
- motif discovery,
- Imperialist Competitive Algorithm,
- multi-objective optimization,
- DNA sequences,
- transcription factor binding site
Citation: Saeed Alirezanejad Gohardani, Mehri Bagherian, Hamidreza Vaziri. A multi-objective imperialist competitive algorithm (MOICA) for finding motifs in DNA sequences[J]. Mathematical Biosciences and Engineering, 2019, 16(3): 1575-1596. doi: 10.3934/mbe.2019075

Related Papers:

Abstract

Motif discovery problem (MDP) is one of the well-known problems in biology which tries to find the transcription factor binding site (TFBS) in DNA sequences. In one aspect, there is not enough biological knowledge on motif sites and on the other side, the problem is NP-hard. Thus, there is not an efficient procedure capable of finding motifs in every dataset. Some algorithms use exhaustive search, which is very time-consuming for large-scale datasets. On the other side, metaheuristic procedures seem to be a good selection for finding a motif quickly that at least has some acceptable biological properties. Most of the previous methods model the problem as a single objective optimization problem; however, considering multi-objectives for modeling the problem leads to improvements in the quality of obtained motifs. Some multi-objective optimization models for MDP have tried to maximize three objectives simultaneously: Motif length, support, and similarity. In this study, the multi-objective Imperialist Competition Algorithm (ICA) is adopted for this problem as an approximation algorithm. ICA is able to simulate more exploration along the solution space, so avoids trapping into local optima. So, it promises to obtain good solutions in a reasonable time. Experimental results show that our method produces good solutions compared to well-known algorithms in the literature, according to computational and biological indicators.

References

[1]	F. Zare-Mirakabad, H. Ahrabian and M. Sadeghi, et al., Genetic algorithm for dyad pattern finding in DNA sequences, Genes Genet. Syst., 84 (2009), 81–93.
[2]	M. Li, B. Ma and L. Wang, Finding similar regions in many sequences, J. Comput. Syst. Sci., 65 (2002), 73–96.
[3]	M. F. Sagot, Spelling approximate repeated or common motifs using a suffix tree, Springer, 1998.
[4]	F. W. Glover and G. A. Kochenberger, Handbook of metaheuristics, Springer Science & Business Media, 2006.
[5]	E. Czeizler, T. Hirvola and K. Karhu, A graph-theoretical approach for motif discovery in protein sequences, IEEE/ACM Trans. Comput. Biol. Bioinf., 14 (2017), 121–130.
[6]	M. Kaya, MOGAMOD: Multi-objective genetic algorithm for motif discovery, Expert. Syst. Appl., 36 (2009), 1039–1047.
[7]	D. L. González-Álvarez, M. A. Vega-Rodríguez and Á. Rubio-Largo, Multiobjective optimization algorithms for motif discovery in DNA sequences, Genet. Program. Evolvable Mach., 16 (2015), 167–209.
[8]	C. E. Lawrence and A. A. Reilly, An expectation maximization (EM) algorithm for the identification and characterization of common sites in unaligned biopolymer sequences, Proteins, 7 (1990), 41–51.
[9]	C. E. Lawrence, S. F. Altschul and M. S. Boguski, et al., Detecting subtle sequence signals: A Gibbs sampling strategy for multiple alignment, Science, 262 (1993), 208–214.
[10]	T. L. Bailey and C. Elkan, Fitting a mixture model by expectation maximization to discover motifs in bipolymers, Proc. Int. Conf. Intell. Syst. Mol. Biol., 2 (1994), 28–36..
[11]	F. P. Roth, J. D. Hughes and P. W. Estep, et al., Finding DNA regulatory motifs within unaligned noncoding sequences clustered by whole-genome mRNA quantitation, Nat. Biotechnol., 16 (1998), 939–945.
[12]	K. C. Wong, MotifHyades: Expectation maximization for de novo DNA motif pair discovery on paired sequences, Bioinformatics, 33 (2017), 3028–3035.
[13]	K. C. Wong, DNA Motif Recognition Modeling from Protein Sequences, iScience, 7 (2018), 198–211.
[14]	G. Pavesi, P. Mereghetti and G. Mauri, et al., Weeder Web: Discovery of transcription factor binding sites in a set of sequences from co-regulated genes. Nucleic Acids Res., 32 (2004), W199–W203.
[15]	E. Eskin and P. A. Pevzner, Finding composite regulatory patterns in DNA sequences, Bioinformatics, 18 (2002), S354–S363.
[16]	P. A. Evans and A. D. Smith, Toward optimal motif enumeration, Springer, 2003.
[17]	J. Serra, A. Matic and A. Karatzoglou, et al., A genetic algorithm to discover flexible motifs with support, IEEE, 2016.
[18]	N. Pisanti, A. M. Carvalho and L. Marsan, et al., RISOTTO: Fast extraction of motifs with mismatches, Springer, 2006.
[19]	G. Z. Hertz and G. D. Stormo, Identifying DNA and protein patterns with statistically significant alignments of multiple sequences, Bioinformatics, 15 (1999), 563–577.
[20]	D. L. González-Álvarez, M. A. Vega-Rodríguez and J. A. Gómez-Pulido, et al., Finding Motifs in DNA Sequences Applying a Multiobjective Artificial Bee Colony (MOABC) Algorithm, Springer, 2011.
[21]	D. L. González-Álvarez, M. A. Vega-Rodríguez and Á. Rubio-Largo, Searching for common patterns on protein sequences by means of a parallel hybrid honey-bee mating optimization algorithm, Parallel. Comput., 76 (2018), 1–17.
[22]	E. Zitzler and L. Thiele, Multiobjective evolutionary algorithms: A comparative case study and the strength Pareto approach, IEEE T. Evolut. Comput., 3 (1999), 257–271.
[23]	E. Wingender, P, Dietze and H. Karas, et al., TRANSFAC: A database on transcription factors and their DNA binding sites, Nucleic Acids Res., 24 (1996), 238–241.
[24]	D. L. González-Álvarez, M. A. Vega-Rodríguez and J. A. Gómez-Pulido, et al., Solving the motif discovery problem by using differential evolution with pareto tournaments, IEEE, 2010.
[25]	G. B. Fogel, D. G. Weekes and G. Varga, et al., Discovery of sequence motifs related to coexpression of genes using evolutionary computation, Nucleic Acids Res., 32 (2004), 3826–3835.
[26]	E. Zitzler, K. Deb and L. Thiele, Comparison of multiobjective evolutionary algorithms: Empirical results, Evolut. Comput., 8 (2000), 173–195.
[27]	E. Atashpaz-Gargari and C. Lucas, Imperialist competitive algorithm: An algorithm for optimization inspired by imperialistic competition, IEEE, 2007.
[28]	D. L. Gonzalez-Álvarez, M. A. Vega-Rodriguez and J. A. Gomez-Pulido, et al., Predicting DNA motifs by using evolutionary multiobjective optimization, IEEE. T. Syst. Man. Cy. C., 42 (2012), 913–925.
[29]	X. S. Yang, Firefly algorithms for multimodal optimization, Springer, 2009.
[30]	D. L. González-Álvarez, M. A. Vega-Rodríguez and J, A. Gómez-Pulido, et al., Applying a multiobjective gravitational search algorithm (MO-GSA) to discover motifs, Springer, 2011.
[31]	E. Zitzler, M. Laumanns and L. Thiele, SPEA2: Improving the strength Pareto evolutionary algorithm, 2001.
[32]	K. Deb, A. Pratap and S. Agarwal, et al., A fast and elitist multiobjective genetic algorithm: NSGA-II, IEEE T. Evolut. Comput., 6 (2002), 182–197.
[33]	M. Tompa, N. Li and T. L. Bailey, et al., Assessing computational tools for the discovery of transcription factor binding sites, Nat. Biotechnol., 23 (2005), 137.

Reader Comments

Your name:*

Email:*
© 2019 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)

通讯作者: 陈斌, bchen63@163.com

1.
沈阳化工大学材料科学与工程学院沈阳 110142

Mathematical Biosciences and Engineering

4.4

Metrics

Article views(5783) PDF downloads(562) Cited by(9)

Preview PDF

Download XML

Export Citation

Article outline

Show full outline

Figures and Tables

Figures(1) / Tables(6)

Mathematical Biosciences and Engineering

A multi-objective imperialist competitive algorithm (MOICA) for finding motifs in DNA sequences

Related Papers:

Abstract

References

Reader Comments

通讯作者: 陈斌, bchen63@163.com

Metrics

Figures and Tables

Other Articles By Authors

Catalog

Mathematical Biosciences and Engineering

A multi-objective imperialist competitive algorithm (MOICA) for finding motifs in DNA sequences

Related Papers:

Abstract

References

Reader Comments

通讯作者: 陈斌, bchen63@163.com

Metrics

Figures and Tables

Other Articles By Authors

Related pages

Tools

Export File

Citation

Format

Content

Catalog