Multilingual Language Models in Persian NLP Tasks: A Performance Comparison of Fine-Tuning Techniques
Journal of AI and Data Mining
Volume 13, Issue 1, April 2025, Pages 107-117
Article Type: Original/Review Paper
DOI: 10.22044/jadm.2025.15167.2625
Authors
Ali Reza Ghasemi; Javad Salimi Sartakhti*
Artificial Intelligence Group, Faculty of Electrical and Computer Engineering, University of Kashan, Kashan, Iran.
Abstract
This paper evaluates the performance of various fine-tuning methods on Persian natural language processing (NLP) tasks. In low-resource languages such as Persian, which lack rich and sufficient data for training large models, it is crucial to select fine-tuning techniques that mitigate overfitting and prevent the model from learning weak or surface-level patterns. The main goal of this research is to compare the effectiveness of fine-tuning approaches such as full fine-tuning, LoRA, AdaLoRA, and DoRA in terms of model learning and task performance. We apply these techniques to three Persian NLP tasks: sentiment analysis, named entity recognition (NER), and span-based question answering (QA). For this purpose, we conduct experiments on three Transformer-based multilingual models with different architectures and parameter scales: BERT-base multilingual (~168M parameters, encoder-only), mT5-small (~300M parameters, encoder-decoder), and mGPT (~1.4B parameters, decoder-only). Each of these models supports Persian but differs in structure and computational requirements, which influences the effectiveness of the different fine-tuning approaches. Results indicate that fully fine-tuned BERT-base multilingual consistently outperforms the other models across all tasks on the basic evaluation metrics, particularly given the unique challenges of these embedding-based tasks. In addition, lightweight fine-tuning methods such as LoRA and DoRA offer highly competitive performance while significantly reducing computational overhead, and they outperform the other configurations on the Performance-Efficiency Score introduced in this paper. This study contributes to a better understanding of fine-tuning methods, especially for Persian NLP, and offers practical guidance for applying Large Language Models (LLMs) to downstream tasks in low-resource languages.
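As a rough illustration of the parameter-efficient setups compared in this study, the sketch below attaches a LoRA adapter to a multilingual BERT sequence classifier using the Hugging Face transformers and peft libraries. It is a minimal sketch, not the authors' exact configuration: the rank, scaling factor, dropout, and target modules are illustrative assumptions.

```python
# Minimal LoRA fine-tuning setup for Persian sentiment analysis (sketch).
# Assumptions: Hugging Face `transformers` and `peft` are installed; the
# hyperparameters below are illustrative, not the paper's reported settings.
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from peft import LoraConfig, TaskType, get_peft_model

model_name = "bert-base-multilingual-cased"  # encoder-only multilingual BERT
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# LoRA freezes the pretrained weights and injects trainable low-rank matrices
# into the chosen projection layers; rank r and lora_alpha are assumed values.
lora_config = LoraConfig(
    task_type=TaskType.SEQ_CLS,
    r=8,
    lora_alpha=16,
    lora_dropout=0.1,
    target_modules=["query", "value"],  # BERT self-attention projections
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of all parameters
```

In recent versions of peft, DoRA can reportedly be enabled by passing use_dora=True to LoraConfig, and AdaLoRA is configured through a separate AdaLoraConfig class; full fine-tuning corresponds to training the same base model without any adapter wrapping.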
Keywords
Fine-Tuning Techniques; PEFT; Low-Resource Languages; Multilingual Language Models; BERT.