Behind the Paper

Introducing MA’AKS: A Parallel Dataset for Arabic Sentiment Style Transfer

Sentiment swap rewrites a text in the opposite sentiment (for example, from positive to negative) while keeping the original meaning.

Mughaus, R., Abudalfa, S., Luqman, H., Abdu, F., AlAli, M., Al-Dowayan, N., & Abdelali, A. MA’AKS: manually-curated parallel dataset for Arabic text sentiment swap. Language Resources and Evaluation, 2025.

Shadi Abudalfa Aug 19, 2025

Natural Language Processing (NLP) has undergone remarkable progress in the last decade, and one of the most exciting developments has been the rise of style transfer. At its core, style transfer seeks to modify the manner in which a sentence is expressed while preserving its meaning. Among the many dimensions of style that can be altered—such as formality, authorship, or dialect—sentiment has emerged as one of the most studied. Sentiment style transfer enables us to take a sentence with a particular polarity (positive or negative) and rephrase it in the opposite sentiment, without losing the original content.

While much research in this area has been conducted for English, low-resource languages—especially Arabic—have remained underserved. Existing datasets are limited, often non-parallel, and lack the rigorous annotations required for effective training and evaluation of sentiment transfer systems. This has constrained progress, leaving a significant gap in how we can build and test robust models for Arabic sentiment transformation.

To bridge this gap, we introduce MA’AKS, a parallel dataset designed specifically for Arabic sentiment style transfer. MA’AKS is the first of its kind to provide high-quality parallel pairs of sentences in Modern Standard Arabic (MSA) with opposite sentiments. Each sentence in the dataset has been carefully annotated to ensure accuracy, linguistic quality, and consistency. In addition to presenting the dataset, we benchmark three large language models (LLMs), AceGPT, JAIS, and Llama-3, on sentiment transfer under three learning settings: zero-shot, few-shot, and fine-tuning.

By making MA’AKS publicly available, along with annotation guidelines and experimental code, our work aims to push forward Arabic NLP, provide researchers with a valuable resource, and encourage future innovations in sentiment style transfer for low-resource languages.

Why Sentiment Style Transfer Matters

Sentiment style transfer has both academic and practical significance. In the academic sense, it provides a testbed for exploring controlled text generation, disentanglement of content from style, and the ability of models to generalize linguistic properties. In practical terms, sentiment transfer powers a wide range of real-world applications:

Content Moderation & Reframing – Automated rewriting tools can rephrase toxic or overly negative content into more positive or neutral forms, improving the quality of online discussions.
Creative Writing & Storytelling – Authors and content creators can generate multiple versions of text that reflect different emotional tones.
Personalized Communication – Chatbots and customer service systems can adjust their tone based on user sentiment, ensuring more empathetic and tailored interactions.
Educational Tools – Sentiment manipulation can help language learners practice how tone and affect influence meaning.

Despite its potential, sentiment style transfer in Arabic has remained underexplored. This is not due to lack of interest, but rather due to the absence of suitable resources. Without parallel datasets, it becomes difficult to train supervised models or to fairly evaluate performance across different architectures.

The Gap in Arabic Sentiment Datasets

English NLP research benefits from a variety of datasets for sentiment style transfer, ranging from product reviews to Twitter sentiment corpora. Many of these datasets are non-parallel, meaning they contain positive and negative sentences independently, without paired counterparts. Such data is useful for classification and for unsupervised transfer methods, but it cannot support supervised training or reference-based evaluation, both of which require aligned pairs.

In Arabic, the challenges are amplified:

Scarcity of parallel data – Most available sentiment corpora focus on classification (labeling sentences as positive/negative) but do not provide paired sentences with opposite polarities.
Dialectal complexity – Arabic has multiple dialects, and while some resources exist for specific varieties, there is little work in Modern Standard Arabic (MSA) with controlled sentiment pairing.
Annotation difficulties – Sentiment can be subtle and context-dependent, requiring careful annotation to avoid ambiguity.
Limited evaluation benchmarks – Without standardized datasets, comparing models becomes inconsistent and less meaningful.

These gaps motivated us to build MA’AKS, a dataset that addresses these shortcomings by providing manually curated parallel sentence pairs for Arabic sentiment transfer.

What is MA’AKS?

MA’AKS (short for Modern Arabic Annotated for Sentiment style transfer) is a parallel corpus of 5,000 sentences in Modern Standard Arabic. The dataset covers both positive and negative sentiments, with each sentence carefully rewritten to express the opposite sentiment while preserving semantic content.

For example:

Positive → Negative:
- Original (positive): "الخدمة في المطعم ممتازة." (The service in the restaurant is excellent.)
- Transferred (negative): "الخدمة في المطعم سيئة." (The service in the restaurant is poor.)
Negative → Positive:
- Original (negative): "الهاتف بطيء للغاية." (The phone is extremely slow.)
- Transferred (positive): "الهاتف سريع وسلس." (The phone is fast and smooth.)
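
To make the pairing concrete, here is a minimal Python sketch of how such parallel pairs could be represented and expanded into directed examples, one per transfer direction. The `SentimentPair` class, the field names, and the `direction` labels are assumptions made for illustration; they are not the released file format of MA’AKS.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class SentimentPair:
    """One parallel pair: the same content in both polarities (hypothetical schema)."""
    positive: str
    negative: str


def to_training_examples(pairs: List[SentimentPair]) -> List[Dict[str, str]]:
    """Expand each pair into two directed examples, one per transfer direction."""
    examples: List[Dict[str, str]] = []
    for pair in pairs:
        examples.append({
            "source": pair.positive,
            "target": pair.negative,
            "direction": "positive_to_negative",
        })
        examples.append({
            "source": pair.negative,
            "target": pair.positive,
            "direction": "negative_to_positive",
        })
    return examples


# The two example pairs shown above.
pairs = [
    SentimentPair(positive="الخدمة في المطعم ممتازة.", negative="الخدمة في المطعم سيئة."),
    SentimentPair(positive="الهاتف سريع وسلس.", negative="الهاتف بطيء للغاية."),
]
examples = to_training_examples(pairs)
```

Because every pair yields one example in each direction, a set built this way is balanced across the two transfer directions by construction.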

Key Features of MA’AKS:

Parallel Sentiment Pairs – Each sentence has a direct counterpart with opposite sentiment.
High-Quality Annotations – Sentences were annotated and validated by expert annotators, ensuring linguistic accuracy.
Balanced Sentiment Distribution – Equal representation of positive and negative examples for fair model training.
Modern Standard Arabic (MSA) – Focused on MSA to provide a standardized foundation, avoiding dialectal inconsistencies.
Support for Multiple Learning Paradigms – Usable for both supervised learning (parallel pairs) and unsupervised or semi-supervised approaches.

Experimental Benchmarking

To validate MA’AKS and demonstrate its usefulness, we benchmarked several state-of-the-art large language models (LLMs) on sentiment style transfer:

AceGPT – An Arabic-capable model fine-tuned for multiple NLP tasks.
JAIS – A bilingual Arabic-English LLM trained with large-scale Arabic data.
Llama-3 – A recent generation of Meta's Llama models, with multilingual capabilities.

We evaluated these models under three distinct learning settings:

Zero-shot – The model is directly prompted to perform sentiment transfer without additional training.
Few-shot – A small number of in-context examples are provided to guide the model.
Fine-tuning – The model is explicitly trained on MA’AKS for sentiment transfer.
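
The difference between the zero-shot and few-shot settings can be sketched as a prompt builder. This is only an illustration: the instruction wording, the `Input:`/`Output:` layout, and the `build_prompt` helper are assumptions, not the prompts used in the paper.

```python
from typing import List, Optional, Tuple

# Hypothetical instruction text; the wording used in the paper may differ.
INSTRUCTIONS = {
    "positive_to_negative": (
        "Rewrite the following Arabic sentence with negative sentiment, "
        "keeping its content unchanged."
    ),
    "negative_to_positive": (
        "Rewrite the following Arabic sentence with positive sentiment, "
        "keeping its content unchanged."
    ),
}


def build_prompt(
    sentence: str,
    direction: str,
    demos: Optional[List[Tuple[str, str]]] = None,
) -> str:
    """Build a zero-shot prompt (no demos) or a few-shot prompt (with demos)."""
    lines = [INSTRUCTIONS[direction], ""]
    # Few-shot: prepend (source, target) demonstrations before the query.
    for source, target in demos or []:
        lines += [f"Input: {source}", f"Output: {target}", ""]
    # The query itself, with the output left open for the model to complete.
    lines += [f"Input: {sentence}", "Output:"]
    return "\n".join(lines)


zero_shot = build_prompt("الهاتف بطيء للغاية.", "negative_to_positive")
few_shot = build_prompt(
    "الهاتف بطيء للغاية.",
    "negative_to_positive",
    demos=[("الخدمة في المطعم سيئة.", "الخدمة في المطعم ممتازة.")],
)
```

In a fine-tuning setup, the same kind of prompt could serve as the model input, with the reference rewrite from the parallel pair as the training target.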

This comprehensive evaluation setup allows researchers to compare model performance across different paradigms, reflecting real-world usage scenarios.

Results and Insights

The experiments provided several key insights:

Zero-shot performance was limited, showing that even powerful LLMs struggle with sentiment transfer in Arabic without targeted data.
Few-shot prompting improved results, highlighting the usefulness of in-context learning for sentiment transfer.
Fine-tuning on MA’AKS yielded the best performance across all metrics, demonstrating the dataset’s value in training robust sentiment models.

Interestingly, different models exhibited varying strengths: while JAIS performed relatively better in few-shot settings due to its Arabic-specific training, Llama-3 excelled after fine-tuning, showing strong generalization once exposed to MA’AKS. AceGPT showed balanced performance across all setups, but particularly benefited from supervised fine-tuning.

Contributions of Our Work

The introduction of MA’AKS makes several contributions to the field of NLP:

First Parallel Arabic Dataset for Sentiment Style Transfer – A much-needed resource that directly addresses the lack of parallel corpora in Arabic NLP.
Comprehensive Annotation Guidelines – By publishing our annotation process, we provide transparency and enable reproducibility for future datasets.
Benchmarking with LLMs – Offering baseline results across different models and learning settings to establish reference points for future research.
Open Access – MA’AKS, along with guidelines and experimental code, is publicly released to support the community and foster collaboration.

Broader Impact

The release of MA’AKS goes beyond academic contributions. By enabling effective sentiment style transfer in Arabic, this dataset can:

Support Arabic content moderation tools to improve online safety.
Empower cross-cultural applications, where sentiment-aware translation and communication are crucial.
Enhance customer engagement tools for Arabic-speaking users in business and government sectors.
Facilitate comparative linguistic studies, enriching our understanding of sentiment expression in Arabic versus other languages.

Moreover, MA’AKS sets a precedent for creating similar resources in other underrepresented languages, inspiring future initiatives in low-resource NLP.

Future Directions

While MA’AKS represents an important step forward, there remain several open challenges and directions for future research:

Expanding Dataset Size – Extending beyond 5,000 sentences to cover more domains and contexts.
Incorporating Dialects – Adding parallel sentiment pairs for dialectal Arabic (Egyptian, Levantine, Gulf, etc.) to reflect real-world usage.
Exploring Multilingual Transfer – Investigating how sentiment transfer models trained on MA’AKS interact with datasets in other languages.
Human-Centered Evaluation – Incorporating human judgments of fluency, sentiment accuracy, and content preservation in addition to automatic metrics.
Integrating with Multimodal Tasks – Extending sentiment style transfer to multimodal contexts (e.g., image captions with emotional framing).

Conclusion

Sentiment style transfer is an exciting frontier in NLP, blending linguistic creativity with computational precision. For Arabic, however, the lack of parallel datasets has been a significant barrier. MA’AKS directly addresses this gap, offering the first high-quality parallel corpus for Arabic sentiment style transfer, annotated with care and released openly for the community.

Through benchmarking on state-of-the-art LLMs—AceGPT, JAIS, and Llama-3—we demonstrate the utility of MA’AKS across zero-shot, few-shot, and fine-tuning scenarios. Our results show that while LLMs hold promise, targeted datasets like MA’AKS are essential to unlock their full potential in low-resource languages.

We hope that MA’AKS will serve as a catalyst for new research, enabling both theoretical explorations of sentiment manipulation and practical applications that make technology more inclusive for Arabic speakers. By releasing not just the dataset but also the annotation guidelines and code, we aim to empower the community to build upon this work and take Arabic NLP to the next level.