Publication | Closed Access
On Set Size Distribution Estimation and the Characterization of Large Networks via Sampling
28
Citations
9
References
2013
Year
EngineeringNetwork AnalysisComputational ComplexityScale-free NetworkSharp ThresholdRandom GraphData ScienceCombinatorial OptimizationProbabilistic Graph TheoryStatisticsSocial Network AnalysisKnowledge DiscoveryNon-overlapping SetsProbability TheoryComputer ScienceNetwork TheoryNetwork ScienceGraph TheoryLarge-scale NetworkBusinessRandomized AlgorithmLarge Networks
In this work we study the set size distribution estimation problem, where elements are randomly sampled from a collection of non-overlapping sets and we seek to recover the original set size distribution from the samples. This problem has applications to capacity planning and network theory. Examples of real-world applications include characterizing in-degree distributions in large graphs and uncovering TCP/IP flow size distributions on the Internet. We demonstrate that it is difficult to estimate the original set size distribution. The recoverability of original set size distributions presents a sharp threshold with respect to the fraction of elements that remain in the sets. If this fraction lies below the threshold, typically half of the elements in power-law and heavier-than-exponential-tailed distributions, then the original set size distribution is unrecoverable. We also discuss practical implications of our findings.
| Year | Citations | |
|---|---|---|
Page 1
Page 1