Publication | Open Access
Multi-dimensional Boltzmann Sampling of Languages
37
Citations
12
References
2010
Year
EngineeringComputational ComplexityMulti-dimensional Boltzmann SamplingCorpus LinguisticsWord EmbeddingsNatural Language ProcessingUniform StatisticsRandom GraphData ScienceComputational LinguisticsRandom MappingStochastic GeometryLanguage StudiesProbabilistic Graph TheoryMachine TranslationBoltzmann SamplersComputer ScienceProbability TheoryUniform Random GenerationEntropyRandomized AlgorithmLinguistics
We address the uniform random generation of words from a context-free language (over an alphabet of size $k$), while constraining every letter to a targeted frequency of occurrence. Our approach consists in a multidimensional extension of Boltzmann samplers. We show that, under mostly $\textit{strong-connectivity}$ hypotheses, our samplers return a word of size in $[(1- \epsilon)n, (1+ \epsilon)n]$ and exact frequency in $\mathcal{O}(n^{1+k/2})$ expected time. Moreover, if we accept tolerance intervals of width in $\Omega (\sqrt{n})$ for the number of occurrences of each letters, our samplers perform an approximate-size generation of words in expected $\mathcal{O}(n)$ time. We illustrate our approach on the generation of Tetris tessellations with uniform statistics in the different types of tetraminoes.
| Year | Citations | |
|---|---|---|
Page 1
Page 1