FindWDO: A k-Nearest Neighbors Approach for Detecting Web Document Outliers

A. Tanira (Palestine), A. Rafea, and H. Hassan (Egypt)


Web document outliers, Web mining, N-grams.


Web content outliers are Web documents whose contents differ markedly from those of other Web documents in the same category. Mining Web content outliers can be applied to identifying competitors, spotting emerging business patterns in e-commerce, and cleaning the corpora used in Web document classification. This paper proposes a k-nearest neighbors approach (FindWDO) for detecting Web document outliers. Experimental results show that FindWDO outperforms a similar algorithm in the same domain.
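
To illustrate the general idea of k-nearest-neighbor outlier detection over N-gram profiles, the following is a minimal sketch and not the FindWDO algorithm itself, whose details are not given here. It assumes character trigrams, a weighted Jaccard dissimilarity, and an outlier score equal to the mean dissimilarity to the k nearest documents in the same category; the function names `char_ngrams`, `dissimilarity`, and `knn_outlier_scores` and the defaults `k=2`, `n=3` are hypothetical choices made for illustration.

```python
from collections import Counter


def char_ngrams(text, n=3):
    """Count overlapping character n-grams of the lowercased, whitespace-normalized text."""
    text = " ".join(text.lower().split())
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def dissimilarity(a, b):
    """Return 1 minus the weighted Jaccard similarity of two n-gram profiles."""
    keys = set(a) | set(b)
    inter = sum(min(a[key], b[key]) for key in keys)  # Counter gives 0 for missing keys
    union = sum(max(a[key], b[key]) for key in keys)
    return 1.0 - inter / union if union else 0.0


def knn_outlier_scores(docs, k=2, n=3):
    """Score each document by its mean dissimilarity to its k nearest neighbors.

    Assumption for this sketch: a higher score means the document is more
    likely to be an outlier within its category.
    """
    profiles = [char_ngrams(doc, n) for doc in docs]
    scores = []
    for i, profile in enumerate(profiles):
        dists = sorted(
            dissimilarity(profile, other)
            for j, other in enumerate(profiles)
            if j != i
        )
        nearest = dists[:k]
        scores.append(sum(nearest) / len(nearest) if nearest else 0.0)
    return scores


if __name__ == "__main__":
    # Toy category of travel pages with one unrelated document mixed in.
    docs = [
        "cheap flights and hotel deals for your holiday",
        "hotel deals and cheap flights for your next holiday",
        "book cheap holiday flights and hotel deals",
        "quantum chromodynamics lattice gauge simulation",
    ]
    for doc, score in zip(docs, knn_outlier_scores(docs)):
        print(f"{score:.3f}  {doc}")
```

In this sketch a document is flagged by ranking the scores and inspecting the highest ones; choosing a cutoff, the value of k, and the N-gram length would all need tuning on real data.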
