Kurdish Speech to Text Recognition System Based on Deep Convolutional-recurrent Neural Networks

Authors

  • Lana Sardar Hussein, Department of Computer Science, College of Science, University Sulaimanyah, Sulaimanyah, Kurdistan Region, Iraq
  • Sozan Abdulla Mahmood, Department of Computer Science, College of Science, University Sulaimanyah, Sulaimanyah, Kurdistan Region, Iraq

DOI:

https://doi.org/10.21928/uhdjst.v6n2y2022.pp117-125

Keywords:

Deep Learning, Gated Recurrent Units, Kurdish Speech Recognition, Convolutional Neural Network

Abstract

In recent years, deep learning has had enormous success in speech recognition and natural language processing. Recent progress in speech recognition for other languages has been quite promising, but the Kurdish language has not seen comparable development, and very few research papers address Kurdish speech recognition. In this paper, we investigate Gated Recurrent Units (GRUs), one of the popular recurrent neural network (RNN) models, for recognizing individual Kurdish words, and we propose a very simplified deep-learning architecture aimed at a more efficient and more accurate model. The proposed model combines convolutional neural network (CNN) and GRU layers. The Kurdish Sorani Speech (KSS) dataset was created for the speech recognition system; it contains 18,799 sound files covering 500 formal Kurdish words. The proposed model was trained on the collected data and yielded over 96% accuracy. The combination of CNN and RNN (GRU) layers for speech recognition achieved superior performance compared to other feed-forward deep neural network models and other statistical methods.
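The abstract names the architecture (CNN layers followed by GRU layers, ending in a word classifier) but gives no layer details. The following is only a minimal sketch of that general pattern in plain Python, not the authors' implementation. Everything concrete in it is an assumption made for illustration: the 13-dimensional input frames, the single convolution layer, the filter count, the hidden size, the 5-class output (the paper's dataset has 500 words), and the random untrained weights. The GRU update follows the convention h_t = (1 - z_t) * h_{t-1} + z_t * h_candidate; some formulations swap the roles of z_t and 1 - z_t.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def rand_matrix(rows, cols, rng, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def conv1d_relu(frames, kernels, bias):
    """Valid 1D convolution over time, then ReLU.

    frames: T x C_in, kernels: C_out x K x C_in, result: (T - K + 1) x C_out.
    """
    k = len(kernels[0])
    out = []
    for t in range(len(frames) - k + 1):
        row = []
        for kern, b in zip(kernels, bias):
            s = b
            for j in range(k):
                s += sum(w * x for w, x in zip(kern[j], frames[t + j]))
            row.append(max(0.0, s))
        out.append(row)
    return out


def gru_step(p, x, h_prev):
    """One GRU time step: update gate z, reset gate r, candidate state."""
    def affine(gate, h_in):
        wx = matvec(p["W" + gate], x)
        uh = matvec(p["U" + gate], h_in)
        return [a + b + c for a, b, c in zip(wx, uh, p["b" + gate])]

    z = [sigmoid(v) for v in affine("z", h_prev)]
    r = [sigmoid(v) for v in affine("r", h_prev)]
    reset_h = [ri * hi for ri, hi in zip(r, h_prev)]
    h_cand = [math.tanh(v) for v in affine("h", reset_h)]
    return [(1 - zi) * hp + zi * hc for zi, hp, hc in zip(z, h_prev, h_cand)]


def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def build_model(n_features, n_filters, kernel_size, hidden, n_classes, seed=0):
    """Hypothetical layer sizes; weights are random, i.e. untrained."""
    rng = random.Random(seed)
    gru = {}
    for gate in ("z", "r", "h"):
        gru["W" + gate] = rand_matrix(hidden, n_filters, rng)
        gru["U" + gate] = rand_matrix(hidden, hidden, rng)
        gru["b" + gate] = [0.0] * hidden
    return {
        "kernels": [rand_matrix(kernel_size, n_features, rng) for _ in range(n_filters)],
        "conv_bias": [0.0] * n_filters,
        "gru": gru,
        "hidden": hidden,
        "W_out": rand_matrix(n_classes, hidden, rng),
        "b_out": [0.0] * n_classes,
    }


def classify(model, frames):
    """CNN front end -> GRU over time -> softmax over word classes."""
    feats = conv1d_relu(frames, model["kernels"], model["conv_bias"])
    h = [0.0] * model["hidden"]
    for x in feats:
        h = gru_step(model["gru"], x, h)
    logits = [a + b for a, b in zip(matvec(model["W_out"], h), model["b_out"])]
    return softmax(logits)


# Toy input: 20 frames of 13 made-up features standing in for one spoken word.
rng = random.Random(1)
frames = [[rng.uniform(-1, 1) for _ in range(13)] for _ in range(20)]
model = build_model(n_features=13, n_filters=8, kernel_size=3, hidden=16, n_classes=5)
conv_out = conv1d_relu(frames, model["kernels"], model["conv_bias"])
probs = classify(model, frames)
predicted = max(range(len(probs)), key=probs.__getitem__)
```

Because the weights are untrained, `predicted` carries no meaning here; the sketch only shows how convolutional features feed a recurrent layer whose final hidden state is mapped to one probability per word class. A practical system would use a deep-learning framework and train on labeled recordings.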

Published

2022-11-18

How to Cite

Hussein, L. S., & Mahmood, S. A. (2022). Kurdish Speech to Text Recognition System Based on Deep Convolutional-recurrent Neural Networks. UHD Journal of Science and Technology, 6(2), 117–125. https://doi.org/10.21928/uhdjst.v6n2y2022.pp117-125

Section

Articles