Research article

Transferring monolingual model to low-resource language: the case of Tigrinya


  • Received: 23 October 2024; Revised: 02 November 2024; Accepted: 08 November 2024; Published: 18 November 2024
  • In recent years, transformer models have achieved great success in natural language processing (NLP) tasks. Most current results are obtained with monolingual transformer models, which are pre-trained on an unlabelled text corpus of a single language and then fine-tuned for a specific downstream task. However, the cost of pre-training a new transformer model is prohibitive for most languages. In this work, we propose a cost-effective transfer learning method that adapts a strong source-language model, pre-trained on a large monolingual corpus, to a low-resource language. Using the XLNet language model, we demonstrate performance competitive with mBERT and a pre-trained target-language model on the cross-lingual sentiment (CLS) dataset and on a new sentiment analysis dataset for the low-resource language Tigrinya. With only 10k examples from the Tigrinya sentiment analysis dataset, English XLNet achieved a 78.88% F1 score, outperforming BERT and mBERT by 10% and 7%, respectively. More interestingly, fine-tuning the (English) XLNet model on the CLS dataset showed promising results compared to mBERT, even outperforming mBERT on one of the Japanese datasets.

    Citation: Abrhalei Tela, Abraham Woubie, Ville Hautamäki. Transferring monolingual model to low-resource language: the case of Tigrinya[J]. Applied Computing and Intelligence, 2024, 4(2): 184-194. doi: 10.3934/aci.2024011
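
    The transfer recipe summarized in the abstract (take a strong English XLNet checkpoint and fine-tune it directly on a small target-language sentiment set) can be sketched with the Hugging Face transformers and PyTorch libraries. The snippet below is a minimal illustration only, not the authors' released code: the checkpoint name xlnet-base-cased, the two toy Tigrinya sentences, and the hyperparameters are assumptions standing in for the roughly 10k-example dataset described above.

    # Minimal sketch: fine-tune an English XLNet checkpoint for binary
    # sentiment classification on target-language (Tigrinya) examples.
    # All data and hyperparameters here are illustrative placeholders.
    import torch
    from torch.utils.data import DataLoader, Dataset
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    class SentimentDataset(Dataset):
        """Tokenizes (text, label) pairs into model-ready tensors."""
        def __init__(self, texts, labels, tokenizer, max_len=128):
            self.enc = tokenizer(texts, truncation=True, padding="max_length",
                                 max_length=max_len, return_tensors="pt")
            self.labels = torch.tensor(labels)

        def __len__(self):
            return len(self.labels)

        def __getitem__(self, i):
            item = {k: v[i] for k, v in self.enc.items()}
            item["labels"] = self.labels[i]
            return item

    tokenizer = AutoTokenizer.from_pretrained("xlnet-base-cased")
    model = AutoModelForSequenceClassification.from_pretrained(
        "xlnet-base-cased", num_labels=2)  # 0 = negative, 1 = positive

    # Two toy sentences (roughly "it is a good/bad film") stand in for
    # the ~10k-example Tigrinya sentiment dataset used in the paper.
    texts = ["ጽቡቕ ፊልም እዩ", "ሕማቕ ፊልም እዩ"]
    labels = [1, 0]
    loader = DataLoader(SentimentDataset(texts, labels, tokenizer),
                        batch_size=2, shuffle=True)

    optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
    model.train()
    for epoch in range(3):                 # a few epochs of fine-tuning
        for batch in loader:
            optimizer.zero_grad()
            loss = model(**batch).loss     # cross-entropy over the two labels
            loss.backward()
            optimizer.step()

    Note that the English checkpoint's SentencePiece vocabulary was built from English text, so its coverage of the Ge'ez script is limited; the sketch only illustrates the fine-tuning loop itself, not any tokenizer or vocabulary adaptation the full method may involve.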



  • © 2024 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)