Publication | Closed Access
How much of the web is archived?
Citations: 98
References: 12
Year: 2011
Venue: Unknown
Personal Digital Archiving, Archival Science, Engineering, Information Retrieval, Data Science, Archiving, Webometrics, Archive Access Additions, Memento Project, Research Data Archiving, Web Time Travel, Data Management, Digital Archive
The Memento Project's HTTP archive-access extensions have enabled new web-archive user interfaces, prompting the question of how much of the Web is archived. The study estimates the extent of Web archiving by sampling URIs from DMOZ, Delicious, Bitly, and search engine indexes and counting their archived copies in public web archives. Of the sampled URIs, 35–90% have at least one archived copy, 17–49% have two to five copies, 1–8% have six to ten copies, and 8–63% have at least ten copies; only 14.6–31.3% are archived more than once per month.
The Memento Project's archive access additions to HTTP have enabled the development of new web archive access user interfaces. After experiencing this web time travel, the inevitable question that comes to mind is "How much of the Web is archived?" This question is studied by approximating the Web via sampling URIs from DMOZ, Delicious, Bitly, and search engine indexes and measuring the number of archived copies available in various public web archives. The results indicate that 35%-90% of URIs have at least one archived copy, 17%-49% have two to five copies, 1%-8% have six to ten copies, and 8%-63% have at least ten copies. The number of URI copies varies as a function of time, but only 14.6%-31.3% of URIs are archived more than once per month.
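The tallying step described in the abstract, counting archived copies per sampled URI and reporting the share of URIs in each copy-count range, can be illustrated with a minimal Python sketch. This is not the authors' code. The function names `bucket` and `coverage_summary`, the `example.com` URIs, and the copy counts are all hypothetical toy values. The top bin is treated as "more than ten copies" so that the bins do not overlap, which is an assumption, since the abstract words it as "at least ten".

```python
from collections import Counter

# Bin labels mirror the ranges quoted in the abstract. Treating the top bin
# as "more than ten" (so bins do not overlap) is an assumption.
BIN_LABELS = ("0", "1", "2-5", "6-10", ">10")


def bucket(copies: int) -> str:
    """Map a URI's archived-copy count to a bin label."""
    if copies < 0:
        raise ValueError("copy count cannot be negative")
    if copies == 0:
        return "0"
    if copies == 1:
        return "1"
    if copies <= 5:
        return "2-5"
    if copies <= 10:
        return "6-10"
    return ">10"


def coverage_summary(copy_counts: dict[str, int]) -> dict:
    """Return the fraction of sampled URIs archived at least once,
    plus the fraction of URIs falling in each bin."""
    total = len(copy_counts)
    if total == 0:
        raise ValueError("sample is empty")
    bins = Counter(bucket(n) for n in copy_counts.values())
    return {
        "at_least_one": (total - bins["0"]) / total,
        "bins": {label: bins[label] / total for label in BIN_LABELS},
    }


# Hypothetical toy sample: URI -> number of archived copies found.
sample = {
    "http://example.com/a": 0,
    "http://example.com/b": 1,
    "http://example.com/c": 3,
    "http://example.com/d": 7,
    "http://example.com/e": 25,
}

summary = coverage_summary(sample)
print(summary)
```

In a real measurement the copy counts would come from querying public web archives for each sampled URI. Here they are hard-coded so that the sketch stays self-contained.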