A Transformer-Based Neural Network Machine Translation Model for the Kurdish Sorani Dialect


  • Soran Badawi, Language Center, Charmo Center for Scientific Research & Consulting, Charmo University, Chamchamal, Sulaimani, KRG, Iraq




Machine Translation, Transformers, Dialect, Kurdish Language, Bilingual evaluation understudy


The transformer is one of the most recently developed models for machine translation. It relies on the attention mechanism and outperforms earlier sequence-to-sequence models. It has performed well on high-resource languages such as English, French, and German. Using this architecture, we investigate training a modified version of the model on a low-resource language, Kurdish. This paper presents the first transformer-based neural machine translation model for the Kurdish language, using vocabulary dictionary units that are shared across the dataset. To this end, we combine all existing Kurdish-English parallel corpora into one large corpus and train the proposed transformer model on it. The results indicate that the proposed model works well on Kurdish texts, achieving a bilingual evaluation understudy (BLEU) score of 0.45. By common BLEU interpretation guidelines, a score in this range indicates a high-quality translation.
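To make the reported metric concrete, the following is a minimal sketch of how a BLEU score on the 0-1 scale can be computed for a single sentence pair. It is an illustration only, not the evaluation code used in this work: the function names `ngrams` and `sentence_bleu` are hypothetical, a single reference and whitespace tokenization are assumed, and no smoothing is applied.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate, reference, max_n=4):
    """Single-reference BLEU on a 0-1 scale, without smoothing (illustrative)."""
    log_precisions = []
    for n in range(1, max_n + 1):
        cand = ngrams(candidate, n)
        ref = ngrams(reference, n)
        # Clip each candidate n-gram count by its count in the reference.
        overlap = sum(min(count, ref[gram]) for gram, count in cand.items())
        total = sum(cand.values())
        if overlap == 0 or total == 0:
            return 0.0  # a zero precision makes the geometric mean zero
        log_precisions.append(math.log(overlap / total))
    # Brevity penalty: penalize candidates shorter than the reference.
    if len(candidate) >= len(reference):
        bp = 1.0
    else:
        bp = math.exp(1 - len(reference) / len(candidate))
    # Geometric mean of the n-gram precisions, scaled by the brevity penalty.
    return bp * math.exp(sum(log_precisions) / max_n)


reference = "the cat sat on the mat".split()
candidate = "the cat sat on a mat".split()
score = sentence_bleu(candidate, reference)
```

Scores reported for a whole test set are normally corpus-level: n-gram counts and lengths are summed over all sentences before the precisions are computed, rather than averaging per-sentence scores. A value of 0.45 on this 0-1 scale corresponds to 45 on the 0-100 scale that many toolkits report.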

