Publication | Open Access
An Efficient Language Model Using Double-Array Structures
Citations: 11
References: 15
Year: 2013
Unknown Venue
Engineering, Multilingual Pretraining, Large Language Model, Corpus Linguistics, Text Mining, Speech Recognition, Natural Language Processing, Large Language Models, Word Embeddings, Syntax, Information Retrieval, Data Science, Computational Linguistics, Language Engineering, Language Studies, Ngram Language Models, Language Models, Machine Translation, Language Technology, Ngram Models, Computer Science, Linguistics, POS Tagging
Ngram language models tend to grow as the training corpus grows, and they consume considerable resources. In this paper, we propose an efficient method for implementing ngram models based on double-array structures. First, we propose a method for representing backwards suffix trees using double-array structures and demonstrate its efficiency. Next, we propose two optimization methods that improve the efficiency of data representation in the double-array structures. Embedding probabilities into unused spaces in the double-array structures reduces the model size. Moreover, tuning the word IDs in the language model makes the model smaller and faster. We also show that our method can be used to build large language models using the division method. Lastly, we show that, when both optimization methods are used, our method outperforms methods from recent related work in terms of model size and query speed.
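To make the data structure concrete, here is a minimal sketch of a double-array trie over integer word IDs, with contexts stored newest word first in the spirit of a backwards suffix tree. It is an illustration under stated assumptions, not the authors' implementation: the names `build_double_array` and `longest_context_match`, the first-fit base search, and the toy vocabulary are all hypothetical, and the sketch stores no probabilities and applies neither of the two optimizations.

```python
from typing import Dict, List, Sequence, Tuple

FREE = -1  # marker for an unused CHECK slot (assumption of this sketch)


def build_double_array(sequences: Sequence[Sequence[int]]) -> Tuple[List[int], List[int]]:
    """Build BASE/CHECK arrays from sequences of word IDs (all IDs must be >= 1)."""
    # Step 1: build an ordinary nested-dict trie.
    trie: Dict[int, dict] = {}
    for seq in sequences:
        node = trie
        for w in seq:
            node = node.setdefault(w, {})

    # Step 2: place every node into the two arrays. The root lives in slot 0.
    base: List[int] = [0]
    check: List[int] = [0]
    pending = [(0, trie)]
    while pending:
        s, children = pending.pop()
        if not children:
            continue
        # First-fit search: smallest offset b whose child slots are all free.
        b = 0
        while True:
            slots = [b + w for w in children]
            if all(t >= len(check) or check[t] == FREE for t in slots):
                break
            b += 1
        # Grow the arrays if the chosen slots fall past the current end.
        grow = max(slots) + 1 - len(check)
        if grow > 0:
            base.extend([0] * grow)
            check.extend([FREE] * grow)
        base[s] = b
        for w, sub in children.items():
            t = b + w
            check[t] = s  # CHECK records the parent, validating the transition
            pending.append((t, sub))
    return base, check


def longest_context_match(base: List[int], check: List[int],
                          newest_first: Sequence[int]) -> Tuple[int, int]:
    """Walk from the most recent word backwards; return (state, matched depth)."""
    s, depth = 0, 0
    for w in newest_first:
        t = base[s] + w
        if t >= len(check) or check[t] != s:
            break  # no such child: a model would back off to this shorter context
        s, depth = t, depth + 1
    return s, depth


# Toy vocabulary (hypothetical): the=1, cat=2, sat=3.
# The ngram "the cat sat" is stored reversed, newest word first.
base, check = build_double_array([[3, 2, 1]])
```

Each transition costs one addition and one comparison (`t = base[s] + w`, then `check[t] == s`), which is why lookups are fast. Slots left as `FREE` are the kind of unused space the abstract refers to; how probabilities are embedded there is not specified in this excerpt, so the sketch leaves them empty.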