A Minimal Perfect Hashing Approach for Mining Association Rules from Very Large Databases

G.-J. Hwang, W.-F. Tsa, and J.C.R. Tseng (Taiwan)


data mining, association rules, database systems, very large databases


Data-mining techniques have attracted the attention of researchers from various areas. One of the most important issues in data mining is the mining of association rules from very large databases. It has been shown that initial candidate set generation, especially for the large 2-itemsets, is the key to improving the performance of data mining. In this paper, a data-mining algorithm, MPH, based upon a minimal perfect hashing scheme is proposed. MPH directly generates large 2-itemsets without an extra database scan or L1*L1 concatenation, and hence improves performance significantly. As the hashing space of MPH depends only on the number of distinct data items in the database, MPH is especially suitable for handling very large databases with a huge number of transactions. Experiments have been performed on databases with the number of transactions ranging from 10,000 to 1,000,000. As the experimental results show, MPH performs significantly better than previously proposed methods.
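To illustrate the general idea, the sketch below counts 2-itemsets in a single pass using a minimal perfect hash over item pairs. It is not the MPH algorithm itself, whose hash function the abstract does not specify. It assumes items are encoded as integers 0..n-1 and uses the standard triangular-index mapping, which sends each pair (i, j) with i < j to a unique slot in a table of exactly n(n-1)/2 entries. The names `pair_index` and `large_2_itemsets` are hypothetical.

```python
from itertools import combinations


def pair_index(i, j, n):
    """Map a pair (i, j), 0 <= i < j < n, to a unique slot in [0, n*(n-1)/2).

    Assumption: a triangular-index mapping, used here only as an example of a
    minimal perfect hash; it is not taken from the paper.
    """
    if not 0 <= i < j < n:
        raise ValueError("expected 0 <= i < j < n")
    # Pairs whose first item is smaller than i occupy the first
    # i*(2n - i - 1)/2 slots; (j - i - 1) is the offset within row i.
    return i * (2 * n - i - 1) // 2 + (j - i - 1)


def large_2_itemsets(transactions, n, min_support):
    """Count every 2-itemset in one pass and return those meeting min_support."""
    # Table size depends only on the number of distinct items, not on the
    # number of transactions.
    counts = [0] * (n * (n - 1) // 2)
    for transaction in transactions:
        items = sorted(set(transaction))  # deduplicate and order so that i < j
        for i, j in combinations(items, 2):
            counts[pair_index(i, j, n)] += 1
    return {
        (i, j): counts[pair_index(i, j, n)]
        for i in range(n)
        for j in range(i + 1, n)
        if counts[pair_index(i, j, n)] >= min_support
    }


if __name__ == "__main__":
    demo = [[0, 1, 2], [0, 1], [1, 2, 3], [0, 1, 3]]
    print(large_2_itemsets(demo, n=4, min_support=2))
```

Because every pair has its own slot, the table has no collisions and no unused entries, so no candidate-generation step or second scan is needed to resolve counts. The cost is a table that grows quadratically with the number of distinct items.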
