Publication | Closed Access

“Catch me if you can”: Visual Analysis of Coherence Defects in Web Archiving

Citations: 21

References: 16

Year: 2009

Abstract

The World Wide Web is a continuously evolving network of contents (e.g. Web pages, images, and sound files) and an interconnecting link structure. Hence, an archivist can never be sure whether the contents collected so far are still consistent with those she needs to retrieve next. This raises the questions of how to detect, measure, and ultimately understand coherence defects. To this end, visualization strategies are presented that may be applied at different levels of granularity: working with (in the ideal case) properly set last-modified timestamps, with metadata extracted from the crawler in accelerated crawl-revisit pairs, or with the Internet Archive’s WARC files. To help the archivist understand the nature of these defects, this paper investigates means of visualizing change behavior and archive coherence.
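
As a rough illustration of the timestamp-based case mentioned in the abstract, the sketch below flags pages whose reported last-modified time falls between a first visit and a later revisit. It is only a minimal sketch under stated assumptions, not the paper's method: the record layout (`CrawlRecord`), the helper names (`changed_between_visits`, `defect_ratio`), and the rule "modified between visit and revisit implies a possible coherence defect" are all hypothetical, and the sketch assumes servers set last-modified timestamps properly.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class CrawlRecord:
    """One crawl-revisit pair for a single URL (hypothetical layout)."""
    url: str
    visited: datetime        # time of the first download
    revisited: datetime      # time of the second download
    last_modified: datetime  # Last-Modified value reported at the revisit


def changed_between_visits(rec: CrawlRecord) -> bool:
    """Return True if the page reports a change after the first visit
    and no later than the revisit (assumed indicator of a defect)."""
    return rec.visited < rec.last_modified <= rec.revisited


def defect_ratio(records: list[CrawlRecord]) -> float:
    """Fraction of records flagged as changed; 0.0 for an empty crawl."""
    if not records:
        return 0.0
    flagged = sum(1 for rec in records if changed_between_visits(rec))
    return flagged / len(records)
```

A ratio like this could serve as one input to a coherence visualization, for example as a per-site value; how the paper actually measures and displays coherence is not stated in the abstract.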
