Publication | Closed Access
Low-resource keyword search strategies for Tamil
Citations: 38
References: 28
Year: 2015
Unknown Venue
Engineering, Speech Corpus, Spoken Language Processing, Corpus Linguistics, Text Mining, Speech Recognition, Natural Language Processing, Keyword-aware Language Modeling, Information Retrieval, Data Science, Acoustic Diversity, Computational Linguistics, Language Studies, Machine Translation, Search Technology, Keyword Search, SINGA Team, Keyword Extraction, Language Recognition, Speech Processing, Search Technique, Linguistics
We propose strategies for a state-of-the-art keyword search (KWS) system developed by the SINGA team for the 2014 NIST Open Keyword Search Evaluation (OpenKWS14), using conversational Tamil data provided by the IARPA Babel program. To tackle the low-resource challenges and the rich morphology of Tamil, we present highlights of our current KWS system, including: (1) data selection by submodular optimization, which maximizes acoustic diversity through Gaussian-component-indexed N-grams; (2) keyword-aware language modeling; (3) subword modeling of morphemes and homophones.
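As a rough illustration of item (1), submodular data selection is commonly approximated with a greedy loop that repeatedly adds the utterance with the largest marginal gain. The sketch below is a minimal example under stated assumptions, not the SINGA system's implementation: the abstract does not give the objective, so the square-root coverage function, the bigram order, the utterance-count budget, and the names `ngrams` and `greedy_select` are all hypothetical. Each utterance is assumed to be already represented as a sequence of Gaussian component indices.

```python
import math
from collections import Counter


def ngrams(seq, n=2):
    """Return the n-grams of a sequence of Gaussian component indices."""
    return [tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)]


def greedy_select(utterances, budget, n=2):
    """Greedily pick up to `budget` utterance ids.

    Assumed objective (not from the paper): f(S) = sum over n-grams g of
    sqrt(count of g in S). The concave square root gives diminishing
    returns, so n-grams that are already well covered contribute less.
    """
    feats = {uid: Counter(ngrams(seq, n)) for uid, seq in utterances.items()}
    covered = Counter()          # n-gram counts over the selected set
    selected = []
    remaining = set(feats)

    def gain(uid):
        # Marginal increase of f when adding utterance `uid`.
        return sum(
            math.sqrt(covered[g] + c) - math.sqrt(covered[g])
            for g, c in feats[uid].items()
        )

    while remaining and len(selected) < budget:
        # Sorting makes tie-breaking deterministic.
        best = max(sorted(remaining), key=gain)
        selected.append(best)
        covered.update(feats[best])
        remaining.remove(best)
    return selected


# Toy data: "a" and "b" are acoustically identical, "c" is different.
toy = {
    "a": [1, 2, 1, 2, 1, 2],
    "b": [1, 2, 1, 2, 1, 2],
    "c": [3, 4, 5, 6],
}
print(greedy_select(toy, budget=2))
```

On the toy data the second pick is the acoustically different utterance rather than the duplicate, which is the diversity effect the abstract describes. A real system would more likely budget by hours of audio than by utterance count.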