Definition modeling, the task of generating a definition for a given term, is a relatively new area of research. Automatic generation of dictionary-quality definitions has many applications in natural language processing, such as sentiment analysis, machine translation, and word sense disambiguation. Definition modeling is also helpful for evaluating the quality of word embeddings. As more research is done in this field, the need for a summary of its applications, approaches, and obstacles becomes apparent. This review provides an overview of current research in definition modeling and a list of future directions and trends.
Citation: Noah Gardner, Hafiz Khan, Chih-Cheng Hung. Definition modeling: literature review and dataset analysis[J]. Applied Computing and Intelligence, 2022, 2(1): 83-98. doi: 10.3934/aci.2022005