Arabic dialect identification based on motif discovery using GMM-UBM with different motif lengths

Abstract

Arabic Dialect Identification (DID) has attracted the attention of many researchers. In this paper, we introduce a new technique for extracting the characteristics of different Arabic dialects directly from the speech signal by discovering the repeated sequences (motifs) that characterize each dialect. For motif extraction, we adopted an extremely fast, parameter-free self-join motif discovery algorithm called Scalable Time series Ordered-search Matrix Profile (STOMP). We extracted 12 Mel Frequency Cepstral Coefficients (MFCCs) from each motif and used them to train a Gaussian Mixture Model-Universal Background Model (GMM-UBM) classifier. We applied the new approach with three different motif lengths: 500 ms, 1000 ms, and 1500 ms. The data set was downloaded from the Qatar Computing Research Institute domain. We carried out our experiments on the two most common Arabic dialects, Egyptian (EGY) and Levantine (LEV). Despite the simple implementation used, the results were very competitive with those of traditional, more sophisticated techniques, and they open a new window in the language identification (LID) and DID arena.
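To make the notion of a motif concrete, the following is a minimal sketch of self-join motif discovery through a matrix profile. It is a brute-force illustration, not STOMP itself: STOMP computes the same profile far more efficiently. The function names (`znorm`, `matrix_profile`, `top_motif`), the exclusion zone of half the window length, and the toy series are assumptions made for illustration and do not come from the paper.

```python
import math
import random


def znorm(window):
    """Z-normalize a window; a constant window maps to all zeros."""
    n = len(window)
    mean = sum(window) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in window) / n)
    if std < 1e-12:
        return [0.0] * n
    return [(x - mean) / std for x in window]


def matrix_profile(series, m):
    """Brute-force self-join matrix profile (O(n^2 * m)), for illustration only.

    profile[i] is the z-normalized Euclidean distance from subsequence i
    to its nearest non-trivial neighbor; index[i] is that neighbor's start.
    """
    n_sub = len(series) - m + 1
    subs = [znorm(series[i:i + m]) for i in range(n_sub)]
    excl = max(1, m // 2)  # assumed exclusion zone to skip trivial matches
    profile = [math.inf] * n_sub
    index = [-1] * n_sub
    for i in range(n_sub):
        for j in range(n_sub):
            if abs(i - j) < excl:
                continue
            d = math.sqrt(sum((a - b) ** 2 for a, b in zip(subs[i], subs[j])))
            if d < profile[i]:
                profile[i] = d
                index[i] = j
    return profile, index


def top_motif(series, m):
    """Return (start, neighbor_start, distance) of the best motif pair."""
    profile, index = matrix_profile(series, m)
    i = min(range(len(profile)), key=profile.__getitem__)
    return i, index[i], profile[i]


# Toy series: uniform noise with the same pattern planted at offsets 10 and 60.
rng = random.Random(0)
series = [rng.uniform(-0.5, 0.5) for _ in range(100)]
pattern = [0.0, 1.0, 3.0, 1.0, 0.0, -1.0, -3.0, -1.0]
for start in (10, 60):
    series[start:start + len(pattern)] = pattern

motif_start, motif_match, motif_dist = top_motif(series, len(pattern))
print(motif_start, motif_match, motif_dist)
```

In the setting described above, the window length `m` would correspond to the motif length (500 ms, 1000 ms, or 1500 ms) expressed in samples, and the discovered motifs would then be passed to MFCC extraction.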
