Concepedia

TLDR

Web-usage mining is increasingly important for personalized services, adaptive sites, and customer profiling, yet its reliability hinges on accurate session reconstruction: reconstruction errors and incomplete tracing of user activity can produce invalid patterns and misleading conclusions. This study evaluates heuristics that reconstruct sessions from server logs and introduces performance measures that are sensitive to two types of reconstruction errors and suited to different KDD applications. The heuristics partition activities first by user and then by visit, whether or not user identification mechanisms such as cookies are available. Two experiments on the server data of a frame-based Web site are reported: one targets a specific KDD application, and the other compares the heuristics across different performance measures. The first showed that the heuristics are sensitive to particularities of the site's structure and traffic. Overall, no single heuristic is best, but the proposed measures help analysts select the heuristic best suited to a given application.

Abstract

Web-usage mining has become the subject of intensive research, as its potential for personalized services, adaptive Web sites and customer profiling is recognized. However, the reliability of Web-usage mining results depends heavily on the proper preparation of the input datasets. In particular, errors in the reconstruction of sessions and incomplete tracing of users’ activities in a site can easily result in invalid patterns and wrong conclusions. In this study, we evaluate the performance of heuristics employed to reconstruct sessions from the server log data. Such heuristics are called upon to partition activities first by user and then by visit of the user in the site, where user identification mechanisms, such as cookies, may or may not be available. We propose a set of performance measures that are sensitive to two types of reconstruction errors and appropriate for different knowledge discovery (KDD) applications. We have tested our framework on the Web server data of a frame-based Web site. The first experiment concerned a specific KDD application and has shown the sensitivity of the heuristics to particularities of the site's structure and traffic. The second experiment is not bound to a specific application but rather compares the performance of the heuristics for different measures and thus for different application types. Our results show that there is no single best heuristic, but our measures help the analyst in the selection of the heuristic best suited for the application at hand.
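
To make the two-step partitioning concrete, the following is a minimal sketch of one common kind of session-reconstruction heuristic: requests are grouped by a user key, then each user's requests are split into visits wherever the gap between consecutive requests exceeds a threshold. This is an illustration, not the authors' method. The abstract does not name the heuristics it evaluates, so the 30-minute threshold, the function name `reconstruct_sessions`, the record layout, and the sample log are all assumptions.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Assumption: a 30-minute inactivity threshold. The abstract gives no value.
MAX_GAP = timedelta(minutes=30)


def reconstruct_sessions(requests, max_gap=MAX_GAP):
    """Partition (user_key, timestamp, url) records by user, then by visit.

    user_key is whatever identifies a user: a cookie ID when available,
    otherwise a fallback such as IP address plus user agent (hypothetical).
    """
    # Step 1: partition activities by user.
    by_user = defaultdict(list)
    for user_key, timestamp, url in requests:
        by_user[user_key].append((timestamp, url))

    # Step 2: partition each user's activities by visit.
    sessions = []
    for user_key, hits in by_user.items():
        hits.sort(key=lambda hit: hit[0])
        current = [hits[0]]
        for previous, hit in zip(hits, hits[1:]):
            if hit[0] - previous[0] > max_gap:
                # Gap too long: close the current visit and start a new one.
                sessions.append((user_key, current))
                current = [hit]
            else:
                current.append(hit)
        sessions.append((user_key, current))
    return sessions


# Made-up log records, for illustration only.
log = [
    ("cookie-a", datetime(2024, 1, 1, 10, 0), "/index.html"),
    ("cookie-a", datetime(2024, 1, 1, 10, 5), "/products.html"),
    ("cookie-a", datetime(2024, 1, 1, 11, 0), "/index.html"),
    ("cookie-b", datetime(2024, 1, 1, 10, 2), "/index.html"),
]
sessions = reconstruct_sessions(log)
for user_key, hits in sessions:
    print(user_key, [url for _, url in hits])
```

A heuristic of this kind can err in two directions: it may split one real visit into several reconstructed sessions, or merge several real visits into one. Which direction is worse depends on the application, which is why the choice of threshold, and of heuristic, is application-specific.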
