Publication | Closed Access
Characterizing reference locality in the WWW
Citations: 466 | References: 31 | Year: 2002
Engineering, Semantic Web, Web Analytics, Text Mining, Reference Locality, Information Retrieval, Data Science, Data Mining, Reference Data, Management, Internet Modeling, Link Analysis, Web Cache, Knowledge Discovery, Webometrics, Caching, Spatial Locality, Computer Science, Web Performance, Reference Stream, Temporal Locality
Self-similarity captures long-range correlations in Web request data, a property that previous researchers have found hard to incorporate into synthetic reference strings. The study proposes models for both temporal and spatial locality of reference in Web server request streams. The authors represent reference streams as stack distance traces, model temporal locality through the marginal distribution of the trace and spatial locality through self-similarity, and discuss methods for generating synthetic traces that preserve these properties. They find that models based on document popularity alone fail to capture either kind of locality, that temporal locality can be characterized by the stack distance distribution (they propose models for typical distributions and compare the resulting cache performance against the original traces), and that stack distance traces appear to be strongly self-similar, with the degree of self-similarity measured in the traces.
The authors propose models for both temporal and spatial locality of reference in streams of requests arriving at Web servers. They show that simple models based on document popularity alone are insufficient for capturing either temporal or spatial locality. Instead, they rely on an equivalent, but numerical, representation of a reference stream: a stack distance trace. They show that temporal locality can be characterized by the marginal distribution of the stack distance trace, and propose models for typical distributions and compare their cache performance to the traces. They also show that spatial locality in a reference stream can be characterized using the notion of self-similarity. Self-similarity describes long-range correlations in the data set, which is a property that previous researchers have found hard to incorporate into synthetic reference strings. They show that stack distance strings appear to be strongly self-similar, and provide measurements of the degree of self-similarity in the traces. Finally, they discuss methods for generating synthetic Web traces that exhibit the properties of temporal and spatial locality measured in the data.
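To make the stack distance representation concrete, here is a minimal Python sketch, not the authors' implementation. It assumes an LRU stack with 1-based depths and records `None` for the first reference to a document; the abstract does not state the paper's exact conventions. The function names `stack_distances` and `lru_hit_ratio` are hypothetical. The second function relies on the standard property that an LRU cache holding `cache_size` documents hits exactly when the stack distance is at most `cache_size`, which is why the marginal distribution of the trace relates to cache performance.

```python
def stack_distances(requests):
    """Convert a reference stream into an LRU stack distance trace.

    Assumed convention: depth is 1-based, and the first reference to a
    document is recorded as None (effectively an infinite distance).
    """
    stack = []       # most recently referenced document sits at index 0
    distances = []
    for doc in requests:
        if doc in stack:
            depth = stack.index(doc) + 1   # position in the stack before the move
            stack.remove(doc)
            distances.append(depth)
        else:
            distances.append(None)         # first reference to this document
        stack.insert(0, doc)               # move (or push) to the top of the stack
    return distances


def lru_hit_ratio(distances, cache_size):
    """Hit ratio of an LRU cache of cache_size documents, read off the trace."""
    hits = sum(1 for d in distances if d is not None and d <= cache_size)
    return hits / len(distances)


# Toy reference stream (made-up data for illustration only).
requests = ["a", "b", "a", "c", "b", "a"]
trace = stack_distances(requests)
print(trace)                    # [None, None, 2, None, 3, 3]
print(lru_hit_ratio(trace, 3))  # 0.5
```

The list-based stack costs time proportional to the number of distinct documents per request, which is fine for illustration but slow for real server logs; a balanced tree or similar structure would be the usual replacement.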