Publication | Closed Access

Grouping Web page references into transactions for mining World Wide Web browsing patterns

Citations: 380

References: 6

Year: 2002

TLDR

Web-based organizations generate large volumes of data, and mining meaningful relationships from unstructured server logs is challenging because, unlike traditional data mining domains, Web page references have no naturally defined transactions. The study develops a browsing behavior model that distinguishes navigation-related page references from content-related ones. Using this model, the authors design a transaction identification method, test it against other methods such as the maximal forward reference algorithm, and apply it to real-world logs with the WEBMINER system to extract association rules. In the authors' tests, the method compares favorably with the maximal forward reference algorithm at identifying transactions suitable for association rule mining.

Abstract

Web-based organizations often generate and collect large volumes of data in their daily operations. Analyzing such data involves discovering meaningful relationships in a large collection of primarily unstructured data, often stored in Web server access logs. While traditional domains for data mining, such as point-of-sale databases, have naturally defined transactions, there is no convenient method of clustering Web page references into transactions. This paper identifies a model of user browsing behavior that separates Web page references into those made for navigation purposes and those made for information content purposes. A transaction identification method based on the browsing model is defined and successfully tested against other methods, such as the maximal forward reference algorithm proposed by Chen et al. (1996). Transactions identified by the proposed method are used to discover association rules from real-world data using the WEBMINER system.
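For readers unfamiliar with the baseline named in the abstract, the following is a minimal sketch of the maximal forward reference idea: a user's ordered page references are split into paths, and a path is emitted each time the user steps back to a page already on the current path. The function name `maximal_forward_references`, the single-letter page labels, and the sample click sequence are illustrative assumptions, not taken from this record; the sketch is not the paper's own transaction identification method and is not WEBMINER code.

```python
def maximal_forward_references(session):
    """Split one user's ordered page references into maximal forward paths.

    `session` is a list of page identifiers in visit order (hypothetical
    input format). Returns a list of paths, each a list of pages.
    """
    path = []               # current forward path from the session start
    transactions = []       # completed maximal forward references
    moving_forward = False  # True if the previous reference extended the path

    for page in session:
        if page in path:
            # Backward reference: the user returned to an earlier page.
            # Emit the path only if it just ended a run of forward moves.
            if moving_forward:
                transactions.append(list(path))
            # Truncate the path back to the revisited page.
            path = path[: path.index(page) + 1]
            moving_forward = False
        else:
            # Forward reference: extend the current path.
            path.append(page)
            moving_forward = True

    # A session that ends on a forward move leaves one last path to emit.
    if moving_forward and path:
        transactions.append(list(path))
    return transactions


# Illustrative click sequence (assumed data, not from a real log).
clicks = list("ABCDCBEGHGWAOUOV")
paths = ["".join(p) for p in maximal_forward_references(clicks)]
print(paths)
```

Each emitted path could then be treated as one transaction for association rule mining. Note that this sketch ignores timestamps entirely, so it cannot tell a page viewed for its content from one used only as a navigation step.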
