Arabic Text Formality Transfer
Published in Computational Sciences and Arts & Humanities
Large language models (LLMs) have achieved remarkable success in a wide array of natural language processing (NLP) tasks, including text style transfer and machine translation. One particularly important application is text formality transfer, where informal or dialectal language is converted into a formal register, typically Modern Standard Arabic (MSA) in Arabic-language contexts. While considerable research has been devoted to English and other high-resource languages, Arabic remains underexplored, primarily due to its rich morphological structure, dialectal variation, and the scarcity of annotated parallel corpora.
This video blog provides a detailed overview of our research evaluating Arabic-centric LLMs (Jais, AceGPT, and ArabianGPT) alongside the English-centric LLaMA on their ability to translate Arabic dialects (ADs) into MSA. The study is motivated by the fact that most prior evaluations of LLMs have focused either on English or on English-to-MSA translation, leaving a gap in understanding how well these models handle intra-Arabic language variation.
To address this, we conducted a series of experiments using four publicly available datasets with rich dialectal content: MADAR, MDC, PADIC, and BIBLE. These datasets span a variety of dialects, regions, and domains, offering a robust testing ground for evaluating model performance. Our methodology covered zero-shot prompting, few-shot in-context learning, and fine-tuning, simulating practical usage scenarios from low-data setups to more guided translation tasks; a prompting sketch follows below.
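To make the prompting setup concrete, here is a minimal sketch of how a few-shot dialect-to-MSA prompt can be assembled and sent to a HuggingFace causal LM. The model identifier, prompt template, and Arabic example pairs are our own illustrative assumptions, not the exact configuration used in the study.

```python
# A minimal few-shot prompting sketch, assuming a HuggingFace causal LM.
# The checkpoint name, template, and examples are illustrative assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "inceptionai/jais-13b-chat"  # assumed checkpoint; substitute any causal LM

def build_prompt(source: str, demonstrations: list[tuple[str, str]]) -> str:
    """Assemble k dialect->MSA demonstrations followed by the query.

    An empty demonstration list yields the zero-shot variant.
    """
    parts = ["Translate the following Arabic dialect sentence into Modern Standard Arabic (MSA)."]
    for dialect, msa in demonstrations:
        parts.append(f"Dialect: {dialect}\nMSA: {msa}")
    parts.append(f"Dialect: {source}\nMSA:")
    return "\n\n".join(parts)

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")

# In practice, demonstrations would be drawn from a training split of, e.g., MADAR.
demos = [("شلونك اليوم؟", "كيف حالك اليوم؟")]  # illustrative Gulf-dialect pair
prompt = build_prompt("وين رايح؟", demos)      # illustrative query: "Where are you going?"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
# Decode only the newly generated tokens, i.e. the model's MSA translation.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```

Passing an empty demonstration list reduces this to the zero-shot setting, while the fine-tuning paradigm instead updates model weights on dialect-MSA pairs rather than supplying them in the prompt.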
Translation quality was measured with standard metrics: BLEU, COMET, chrF, and BERTScore (a scoring sketch follows below). Our findings reveal that Jais and AceGPT consistently outperform the other models, including the widely used LLaMA, across all metrics and evaluation settings. This performance gap underscores the value of pretraining on Arabic text, from which both Jais and AceGPT benefit. In contrast, LLaMA, which is predominantly trained on English data, struggles to capture the nuanced structures of Arabic dialects.
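For readers who want to reproduce this kind of scoring, the sketch below computes BLEU and chrF with sacrebleu and semantic similarity with BERTScore on a toy hypothesis/reference pair; COMET appears only as a commented outline because it downloads a separate trained model. The sentences are placeholders, not data or results from the paper.

```python
# A minimal scoring sketch, assuming the sacrebleu and bert-score packages.
import sacrebleu
from bert_score import score as bert_score

hyps = ["كيف حالك اليوم؟"]   # system outputs in MSA (toy placeholder)
refs = ["كيف حالك اليوم؟"]   # gold MSA references, parallel to hyps

bleu = sacrebleu.corpus_bleu(hyps, [refs])   # sacrebleu takes a list of reference streams
chrf = sacrebleu.corpus_chrf(hyps, [refs])
P, R, F1 = bert_score(hyps, refs, lang="ar")  # per-sentence precision/recall/F1 tensors

print(f"BLEU {bleu.score:.2f} | chrF {chrf.score:.2f} | BERTScore-F1 {F1.mean().item():.4f}")

# COMET additionally conditions on the dialect source sentence:
# from comet import download_model, load_from_checkpoint
# comet = load_from_checkpoint(download_model("Unbabel/wmt22-comet-da"))
# result = comet.predict([{"src": src, "mt": hyp, "ref": ref}
#                         for src, hyp, ref in zip(srcs, hyps, refs)])
```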
These results not only emphasize the need for LLMs tailored to low-resource languages, but also highlight the linguistic and cultural richness of Arabic as a testbed for NLP research. By focusing on dialect-to-MSA translation — a task with direct implications for social media processing, customer service, digital archiving, and educational tools — this study contributes meaningful insights to both academic and applied research communities.
References & Further Reading
- Abdu, F., Mughaus, R., Abudalfa, S., Ahmed, M., & Abdelali, A. (2025). An empirical evaluation of Arabic text formality transfer: A comparative study. Language Resources and Evaluation. Springer Nature.
- Abudalfa, S., Abdu, F., & Alowaifeer, M. (2024). Arabic text formality modification: A review and future research directions. IEEE Access.
- Kadaoui, K., Magdy, S. M., Waheed, A., Khondaker, M. T. I., El-Shangiti, A. O., Nagoudi, E. M. B., & Abdul-Mageed, M. (2023). Tarjamat: Evaluation of Bard and ChatGPT on machine translation of ten Arabic varieties. arXiv preprint arXiv:2308.03051.
- Zhang, X., Rajabi, N., Duh, K., & Koehn, P. (2023). Machine translation with large language models: Prompting, few-shot learning, and fine-tuning with QLoRA. In Proceedings of the Eighth Conference on Machine Translation (pp. 468–481).
- Derouich, W., Kchaou, S., & Boujelbane, R. (2023). ANLP-RG at NADI 2023 shared task: Machine translation of Arabic dialects—A comparative study of transformer models. In Proceedings of ArabicNLP 2023 (pp. 683–689).
- Slim, A., & Melouah, A. (2024). Low-resource Arabic dialects transformer neural machine translation improvement through incremental transfer of shared linguistic features. Arabian Journal for Science and Engineering, 1–17.