Publication | Closed Access
Fast and Scalable Distributed Boolean Tensor Factorization
19
Citations
29
References
2017
Year
Unknown Venue
EngineeringMachine LearningSuch Boolean TensorsBinary TensorsInformation RetrievalData ScienceData MiningMultilinear Subspace LearningParallel ComputingLow-rank ApproximationVery Large DatabaseKnowledge DiscoveryComputer ScienceData-intensive ComputingQuery OptimizationComputational ScienceMatrix FactorizationParallel ProgrammingMassive Data ProcessingBoolean TensorsBig Data
How can we analyze tensors that are composed of 0's and 1's? How can we efficiently analyze such Boolean tensors with millions or even billions of entries? Boolean tensors often represent relationship, membership, or occurrences of events such as subject-relation-object tuples in knowledge base data (e.g., 'Seoul'-'is the capital of'-'South Korea'). Boolean tensor factorization (BTF) is a useful tool for analyzing binary tensors to discover latent factors from them. Furthermore, BTF is known to produce more interpretable and sparser results than normal factorization methods. Although several BTF algorithms exist, they do not scale up for large-scale Boolean tensors. In this paper, we propose DBTF, a distributed algorithm for Boolean tensor factorization running on the Spark framework. By caching computation results, exploiting the characteristics of Boolean operations, and with careful partitioning, DBTF successfully tackles the high computational costs and minimizes the intermediate data. Experimental results show that DBTF decomposes up to 163-323 larger tensors than existing methods in 68-382 less time, and exhibits near-linear scalability in terms of tensor dimensionality, density, rank, and machines.
| Year | Citations | |
|---|---|---|
Page 1
Page 1