Publication | Closed Access

Efficient Distributed Topic Modeling with Provable Guarantees

Citations: 16
References: 19
Year: 2014

Abstract

Topic modeling for large-scale distributed web collections requires distributed techniques that account for both computational and communication costs. We consider topic modeling under the separability assumption and develop novel, computationally efficient methods that provably achieve the statistical performance of state-of-the-art centralized approaches while requiring insignificant communication between the distributed document collections. We achieve trade-offs between communication and computation without actually transmitting the documents. Our scheme is based on exploiting the geometry of the normalized word-word co-occurrence matrix and viewing each row of this matrix as a vector in a high-dimensional space. We relate the solid angle subtended by extreme points of the convex hull of these vectors to topic identities and construct distributed schemes to identify topics.
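The geometric idea in the abstract can be illustrated with a small sketch. This is not the authors' algorithm. It assumes that each node shares only a local word-word co-occurrence count matrix (the abstract does not specify the protocol), and it swaps in a simple Monte Carlo estimate of the solid angle: the fraction of isotropic random directions along which a row has the largest projection. That fraction is zero for rows that are not extreme points of the convex hull. The names merge_counts, row_normalize, estimate_solid_angles, and toy_rows are hypothetical and chosen for illustration only.

```python
import random


def merge_counts(local_counts):
    """Element-wise sum of per-node co-occurrence count matrices.

    Assumption: nodes exchange these summary matrices instead of documents.
    """
    size = len(local_counts[0])
    return [
        [sum(matrix[i][j] for matrix in local_counts) for j in range(size)]
        for i in range(size)
    ]


def row_normalize(counts):
    """Scale each row to sum to 1; all-zero rows are left as zeros."""
    normalized = []
    for row in counts:
        total = sum(row)
        if total:
            normalized.append([value / total for value in row])
        else:
            normalized.append([0.0] * len(row))
    return normalized


def estimate_solid_angles(rows, num_directions=2000, seed=0):
    """Monte Carlo estimate of each row's normalized solid angle.

    For every random Gaussian direction, the row with the largest
    projection scores one win. Only extreme points of the convex hull
    can win with nonzero probability.
    """
    rng = random.Random(seed)
    dim = len(rows[0])
    wins = [0] * len(rows)
    for _ in range(num_directions):
        direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        best = max(
            range(len(rows)),
            key=lambda i: sum(d * x for d, x in zip(direction, rows[i])),
        )
        wins[best] += 1
    return [w / num_directions for w in wins]


# Toy rows (hypothetical data): three extreme points, the centroid,
# and the midpoint of one edge.
toy_rows = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [1 / 3, 1 / 3, 1 / 3],
    [0.5, 0.5, 0.0],
]
angles = estimate_solid_angles(toy_rows)
```

In this toy setup the first three rows split the estimated solid angle roughly evenly, while the centroid and the edge midpoint receive essentially none, which is the signal one would use to flag candidate extreme rows.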
