Concepedia

Publication | Closed Access

How good is random linear coding based distributed networked storage

200

Citations

10

References

2005

Year

TLDR

Storing large files across a network requires distributing data to many storage nodes with limited capacity. The goal is to enable a downloader to reconstruct the entire file by contacting as few storage nodes as possible. The authors compare three schemes—uncoded, traditional erasure coding, and random linear coding—where each node stores a coded fragment chosen independently of others. Random linear coding achieves near‑optimal reconstruction with minimal node contacts and no extra centralized storage, whereas erasure coding requires additional server space and uncoded storage performs poorly.

Abstract

We consider the problem of storing a large file or multiple large files in a distributed manner over a network. In the framework we consider, there are multiple storage locations, each of which only have very limited storage space for each file. Each storage location chooses a part (or a coded version of the parts) of the file without the knowledge of what is stored in the other locations. We want a file-downloader to connect to as few storage locations as possible and retrieve the entire file. We compare the performance of three strategies: uncoded storage, traditional erasure coding based storage, random linear coding based storage motivated by network coding. We demonstrate that, in principle, a traditional erasure coding based storage (eg: Reed-Solomon Codes) strategy can almost do as well as one can ask for with appropriate choice of parameters. However, the cost is a large amount of additional storage space required at the centralized server before distribution among multiple locations. The random linear coding based strategy performs as well without suffering from any such disadvantage. Further, with a probability close to one, the minimum number of storage location a downloader needs to connect to (for reconstructing the entire file), can be very close to the case where there is complete coordination between the storage locations and the downloader. We also argue that an uncoded strategy performs poorly.

References

YearCitations

Page 1