Citation: Fotios Kounelis, Christos Makris. Space Efficient Data Structures for N-gram Retrieval[J]. AIMS Medical Science, 2017, 4(4): 426-440. doi: 10.3934/medsci.2017.4.426
[1] | Blandford D, Blelloch G (2002) Index compression through document reordering. In: Storer JA and Cohn M, Eds., Proc. 2002 IEEE Data Compression Conference, April, IEEE Computer Society Press, Los Alamitos, CA. pp. 342-351. |
[2] | Scholer F, Williams HE, Yiannis J, et al. (2002) Compression of inverted indexes for fast query evaluation. In: Beaulieu M, Baeza-Yates R, Myaeng SH and Jarvelin K, Eds., Proc. 25th Annual International ACMSIGIR Conference on Research and Development in Information Retrieval, August, Tampere, Finland, ACM Press, New York, pp. 222-229. |
[3] | Brandon MC, Wallace DC, Baldi P (2009) Data structures and compression algorithms for genomic sequence data. Bioinformatics 25: 1731-1738. doi: 10.1093/bioinformatics/btp319 |
[4] | Gonzalo N, Veli M (2007) Compressed full-text indexes. ACM Computing Surveys 39: 2. doi: 10.1145/1216370.1216372 |
[5] | Ricardo AB, Berthier AR (2011) Modern Information Retrieval-the concepts and technology behind search, Second edition. Pearson Education Ltd., Harlow, England ISBN 978-0-321-41691-9. |
[6] | Ian HW, Alistair M, Timothy CB (1999) Managing Gigabytes: Compressing and Indexing Documents and Images, Second Edition. Morgan Kaufmann ISBN 1-55860-570-3. |
[7] | Weiner P (1973) Linear Pattern Matching Algorithms In 14th Annual IEEE Symposium on Switching and Automata Theory. |
[8] | Ukkonen E (1995) On-Line Construction of Suffix Trees. Algorithmica: 249-260. |
[9] | Iliopoulos C, Makris C, Panagis Y, et al. (2006) The Weighted Suffix Tree: An Efficient Data Structure for Handling Molecular Weighted Sequences and its Applications. Fundament Informa 71: 259-277. |
[10] | Diamanti K, Kanavos A, Makris C, et al. (2014) Handling Weighted Sequences Employing Inverted Files and Suffix Trees. In Web Information Systems and Technologies, pp. 231-238, extended version, private communication with the authors. |
[11] | Kim MS, Whang KY, Lee JG, et al. (2005) N-gram/2l: A space and time efficient two level n-gram inverted index structure. Int Conference Very Large Databases: 325-336. |
[12] | Christodoulakis M, Iliopoulos CS, Mouchard L, et al. (2006) Computation of repetitions and regularities of biologically weighted sequences. J Comput Biol 13: 1214-1231. doi: 10.1089/cmb.2006.13.1214 |
[13] | Zhu H, Kollios G, Athitsos V (2012) A Generic Framework for Efficient and Effective Subsequence Retrieval. Proceed Very Large Data Bases: 1579-1590. |
[14] | Amir A, Iliopoulos CS, Kapah O, et al. (2006). Approximate matching in weighted sequences. Combinator Pattern Match: 365376 |
[15] | Alatabbi A, Crochemore M, Iliopoulos CS, et al. (2012) Overlapping repetitions in weighted sequence. Int Informat Technol Conference: 435-440. |
[16] | Zhang H, Guo Q, Iliopoulos CS (2010) An algorithmic framework for motif discovery problems in weighted sequences. Int Conference Algorithms Complex: 335-346. |
[17] | Zhang H, Guo Q, Iliopoulos CS (2010) Varieties of regularities in weighted sequences. Algorithmic Aspect Informa Manage: 271-280. |
[18] | Li G, Deng D, Feng J (2011) Faerie: Efficient Filtering Algorithm for Approximate Dictionary-based Entity Extraction. Special Interest Group Manage Data: 529-540. |
[19] | Mouza C, Litwin W, Rigaux P, et al. (2009) AS-Index: A Structure for String Search Using n-grams and Algebraic Signatures. Confer Informat Knowledge: 295-304. |
[20] | Marsan L, Sagot MF (2000) Extracting structured motifs using a suffix tree-algorithms and application to promoter consensus identification. Int Conference Res Comput Mol Biol: 210-219. |
[21] | Sun Z, Yang J, Deogun JS (2004) Misae: A new approach for regulatory motif extraction. Computa System Bioinforma Conference: 173-181. |
[22] | Holub J, Smyth WF (2003) Algorithms on indeterminate strings. In Australasian Workshop on Combinatorial Algorithms. |
[23] | Holub J, Smyth WF, Wang S (2008) Fast pattern matching on indeterminate strings. J Discrete Algorithms 6: 37-50. doi: 10.1016/j.jda.2006.10.003 |
[24] | Manning CD, Raghavan P, Schutze H (2008) Introduction to Information Retrieval. Cambridge University Press. |
[25] | Kim MS, Whang KY, Lee JG (2007) N-gram/2l-approximation: a two-level n-gram inverted index structure for approximate string matching. Compu System Sci Engineer 22: 6. |