Publication | Open Access
Using the Web to Obtain Frequencies for Unseen Bigrams
Citations: 356 | References: 46 | Year: 2003
Corpus Frequencies, Engineering, Intelligent Information Retrieval, Spectrum Estimation, Semantic Web, Semantics, Corpus Linguistics, Text Mining, Natural Language Processing, Information Retrieval, Data Science, Unseen Bigrams, Computational Linguistics, Search Engine, Language Studies, Terminology Extraction, Distributional Semantics, Signal Processing, Web Frequencies, Keyword Extraction, Spectral Analysis, Language Corpus, Spectral Searching, Linguistics, Big Data
This article shows that the Web can be employed to obtain frequencies for bigrams that are unseen in a given corpus. We describe a method for retrieving counts for adjective-noun, noun-noun, and verb-object bigrams from the Web by querying a search engine. We evaluate this method by demonstrating: (a) a high correlation between Web frequencies and corpus frequencies; (b) a reliable correlation between Web frequencies and plausibility judgments; (c) a reliable correlation between Web frequencies and frequencies recreated using class-based smoothing; (d) good performance of Web frequencies in a pseudo-disambiguation task.
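As a rough illustration of evaluation points (a) and (d), the sketch below correlates two sets of bigram counts and uses counts to choose between two candidate nouns. It is not the authors' implementation: the function names, the log-transformation of counts, the skipping of zero counts, and all numbers in the toy dictionaries are assumptions made for the example, and no real search engine is queried.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def log_count_correlation(web_counts, corpus_counts):
    """Correlate log10 counts over bigrams present in both sources.

    Assumption: counts are log-transformed before correlating, and
    bigrams with a zero count in either source are skipped because
    the logarithm is undefined there.
    """
    shared = sorted(
        b for b in set(web_counts) & set(corpus_counts)
        if web_counts[b] > 0 and corpus_counts[b] > 0
    )
    xs = [math.log10(web_counts[b]) for b in shared]
    ys = [math.log10(corpus_counts[b]) for b in shared]
    return pearson(xs, ys)


def pick_more_frequent(head, candidate_a, candidate_b, counts):
    """Pseudo-disambiguation sketch: return the candidate whose bigram
    with `head` has the higher count (ties go to candidate_a)."""
    count_a = counts.get((head, candidate_a), 0)
    count_b = counts.get((head, candidate_b), 0)
    return candidate_a if count_a >= count_b else candidate_b


# Toy counts, invented purely for illustration (not results from the article).
web_counts = {
    ("hungry", "dog"): 12000,
    ("hungry", "idea"): 40,
    ("eat", "apple"): 56000,
    ("eat", "table"): 900,
}
corpus_counts = {
    ("hungry", "dog"): 15,
    ("hungry", "idea"): 1,
    ("eat", "apple"): 48,
    ("eat", "table"): 2,
}

r = log_count_correlation(web_counts, corpus_counts)
choice = pick_more_frequent("eat", "apple", "table", web_counts)
print(f"correlation of log counts: {r:.3f}")
print(f"preferred object for 'eat': {choice}")
```

In practice the `web_counts` dictionary would be filled from search engine hit counts for each bigram query; how those queries are formed and counted is not specified in the abstract, so that step is left out here.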