Publication | Open Access
Counting occurrences for a finite set of words
Year: 2012 | Citations: 13 | References: 25
Engineering, Inclusion-exclusion Principle, Corpus Linguistics, Finite Set, Text Mining, Applied Linguistics, Natural Language Processing, Combinatorics On Word, Computational Linguistics, Maple Package, Language Studies, Symbolic Method (Combinatorics), Computational Lexicology, Knowledge Discovery, Terminology Extraction, Enumerative Combinatorics, Probability Theory, Distributional Semantics, Lexical Complexity Prediction, Linguistics
In this article, we provide the multivariate generating function counting texts according to their length and to the number of occurrences of words from a finite set. The result is derived from the application of the inclusion-exclusion principle to word counting due to Goulden and Jackson [1979, 1983]. Unlike some other techniques, which assume that the set of words is reduced (i.e., that no word is a factor of another), the finite set can be chosen arbitrarily. Noonan and Zeilberger [1999] already provided a Maple package treating the nonreduced case, but without giving an expression for the generating function or a detailed proof. We provide a complete proof validating the use of the inclusion-exclusion principle. As an application of our methodology, we give formulæ for the expected value, variance, and covariance of the number of occurrences when considering two arbitrary finite sets of words.
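To make the counted quantity concrete, the following minimal sketch tabulates, by exhaustive enumeration, how many texts of a given length realise each vector of occurrence counts; these numbers are the coefficients that the multivariate generating function encodes. It is a brute-force illustration only, not the inclusion-exclusion method of the article, and the function names `count_occurrences` and `occurrence_distribution` are hypothetical. The example set is deliberately nonreduced, since "a" is a factor of "aa".

```python
from collections import Counter
from itertools import product


def count_occurrences(text, word):
    """Count (possibly overlapping) occurrences of `word` as a factor of `text`."""
    return sum(
        1 for i in range(len(text) - len(word) + 1) if text.startswith(word, i)
    )


def occurrence_distribution(alphabet, words, n):
    """Map each occurrence vector (one count per word, in the order given)
    to the number of texts of length n over `alphabet` that realise it.

    Brute force over all len(alphabet)**n texts: only suitable for small n.
    """
    dist = Counter()
    for letters in product(alphabet, repeat=n):
        text = "".join(letters)
        dist[tuple(count_occurrences(text, w) for w in words)] += 1
    return dist


# Nonreduced example set: "a" is a factor of "aa".
dist = occurrence_distribution("ab", ["a", "aa"], 3)
for vector, count in sorted(dist.items()):
    print(vector, count)
```

Each key of `dist` is a pair (occurrences of "a", occurrences of "aa"), and the values sum to the total number of texts of that length. Moments such as the expected number of occurrences under the uniform model can be read off such a table for small lengths, which gives a simple way to sanity-check closed-form formulæ.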