"Short-Dot": Computing Large Linear Transforms Distributedly Using Coded\n Short Dot Products

Abstract

Faced with saturation of Moore's law and increasing dimension of data, system\ndesigners have increasingly resorted to parallel and distributed computing.\nHowever, distributed computing is often bottle necked by a small fraction of\nslow processors called "stragglers" that reduce the speed of computation\nbecause the fusion node has to wait for all processors to finish. To combat the\neffect of stragglers, recent literature introduces redundancy in computations\nacross processors, e.g.,~using repetition-based strategies or erasure codes.\nThe fusion node can exploit this redundancy by completing the computation using\noutputs from only a subset of the processors, ignoring the stragglers. In this\npaper, we propose a novel technique -- that we call "Short-Dot" -- to introduce\nredundant computations in a coding theory inspired fashion, for computing\nlinear transforms of long vectors. Instead of computing long dot products as\nrequired in the original linear transform, we construct a larger number of\nredundant and short dot products that can be computed faster and more\nefficiently at individual processors. In reference to comparable schemes that\nintroduce redundancy to tackle stragglers, Short-Dot reduces the cost of\ncomputation, storage and communication since shorter portions are stored and\ncomputed at each processor, and also shorter portions of the input is\ncommunicated to each processor. We demonstrate through probabilistic analysis\nas well as experiments that Short-Dot offers significant speed-up compared to\nexisting techniques. We also derive trade-offs between the length of the\ndot-products and the resilience to stragglers (number of processors to wait\nfor), for any such strategy and compare it to that achieved by our strategy.\n