Selection of Clusters Number and Features Subset During a Two-Levels Clustering Task

S. Guérif and Y. Bennani (France)


Clustering, feature selection, self-organizing maps, model selection


Simultaneously selecting the number of clusters and a relevant subset of features is one of the challenges of data mining. A new approach is proposed to address this difficult problem. It combines the benefits of two-level clustering approaches and wrapper feature selection algorithms. On the one hand, the former enhances robustness to outliers and reduces the running time of the algorithm. On the other hand, wrapper feature selection (FS) approaches are known to give better results than filter FS methods because they take into account the algorithm that uses the data. First, a Self-Organizing Map (SOM), trained on the original data set, is clustered using k-means, and the Davies-Bouldin index is used to determine the best number of clusters. Then, an individual pertinence measure guides the backward elimination procedure, and the mutual pertinence of features is measured using a collective pertinence based on the quality of the clustering.
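The first level of the approach clusters a trained SOM with k-means and keeps the number of clusters that minimizes the Davies-Bouldin (DB) index. It can be sketched in plain Python. This is a minimal sketch under several assumptions. SOM training is omitted, so the functions simply receive the map's prototype vectors. k-means uses deterministic farthest-first seeding. The DB index uses Euclidean distance and its usual definition (average within-cluster distance to the centroid, compared against centroid separation). The function names `assign`, `farthest_first`, `kmeans`, `davies_bouldin` and `select_k` are hypothetical and are not the authors' code.

```python
import math

# Minimal sketch (not the authors' code): the inputs stand in for the
# prototype vectors of an already-trained SOM; SOM training is omitted.
# All function names here are hypothetical.


def assign(points, centroids):
    """Label each point with the index of its nearest centroid (ties -> lowest index)."""
    return [min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points]


def farthest_first(points, k):
    """Deterministic seeding: start from the first point, then repeatedly add
    the point farthest from its nearest already-chosen centre."""
    centres = [list(points[0])]
    while len(centres) < k:
        nxt = max(points, key=lambda p: min(math.dist(p, c) for c in centres))
        centres.append(list(nxt))
    return centres


def kmeans(points, k, n_iter=100):
    """Lloyd's k-means with farthest-first seeding; returns (centroids, labels)."""
    centroids = farthest_first(points, k)
    for _ in range(n_iter):
        labels = assign(points, centroids)
        new = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # An empty cluster keeps its previous centroid.
            new.append([sum(col) / len(members) for col in zip(*members)]
                       if members else centroids[j])
        if new == centroids:  # converged: assignments no longer move centroids
            break
        centroids = new
    return centroids, assign(points, centroids)


def davies_bouldin(points, centroids, labels):
    """Davies-Bouldin index over the non-empty clusters (lower is better)."""
    used = sorted(set(labels))
    scatter = {}
    for j in used:
        members = [p for p, lab in zip(points, labels) if lab == j]
        scatter[j] = sum(math.dist(p, centroids[j]) for p in members) / len(members)
    # For each cluster, its worst (largest) scatter-to-separation ratio.
    worst = [max((scatter[i] + scatter[j]) / math.dist(centroids[i], centroids[j])
                 for j in used if j != i)
             for i in used]
    return sum(worst) / len(used)


def select_k(prototypes, k_min=2, k_max=10):
    """Cluster the prototypes for each k and keep the k with the lowest DB index."""
    best = None
    for k in range(k_min, min(k_max, len(prototypes) - 1) + 1):
        centroids, labels = kmeans(prototypes, k)
        score = davies_bouldin(prototypes, centroids, labels)
        if best is None or score < best[1]:
            best = (k, score, labels)
    return best  # (k, DB score, labels)
```

Consider three well-separated groups of four toy prototypes each. Here `select_k(prototypes, 2, 5)` returns k = 3. Merging groups inflates the within-cluster scatter. Splitting a compact group places two centroids close together. Both raise the DB ratio.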

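The second level is a backward elimination. An individual pertinence measure guides it, and a collective pertinence based on clustering quality judges the feature subsets it visits. The abstract defines neither measure, so this sketch substitutes two loudly labeled assumptions:

- The individual pertinence of a feature is its correlation ratio (between-cluster variance over total variance) with respect to a fixed partition.
- The collective pertinence of a subset is the Davies-Bouldin index of that partition computed on the subset alone.

Both measures are hypothetical stand-ins, as are the function names, and the authors' actual measures may differ.

```python
import math

# Minimal sketch (not the authors' code). Both pertinence measures below are
# hypothetical stand-ins, and the partition `labels` is assumed fixed
# (e.g. taken from the first-level clustering) instead of being recomputed.


def centroid(rows):
    """Component-wise mean of a list of equal-length rows."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def individual_pertinence(X, labels, f):
    """Correlation ratio of feature f w.r.t. the partition:
    between-cluster variance / total variance (higher = more relevant)."""
    values = [row[f] for row in X]
    mean = sum(values) / len(values)
    total = sum((v - mean) ** 2 for v in values)
    if total == 0:
        return 0.0
    between = 0.0
    for c in set(labels):
        vs = [v for v, lab in zip(values, labels) if lab == c]
        between += len(vs) * (sum(vs) / len(vs) - mean) ** 2
    return between / total


def collective_pertinence(X, labels, features):
    """Davies-Bouldin index of the fixed partition, measured on the given
    feature subset only (lower = better clustering quality)."""
    clusters = sorted(set(labels))
    proj = {c: [[row[f] for f in features] for row, lab in zip(X, labels) if lab == c]
            for c in clusters}
    cent = {c: centroid(rows) for c, rows in proj.items()}
    scat = {c: sum(math.dist(r, cent[c]) for r in rows) / len(rows)
            for c, rows in proj.items()}
    worst = [max((scat[i] + scat[j]) / math.dist(cent[i], cent[j])
                 for j in clusters if j != i)
             for i in clusters]
    return sum(worst) / len(clusters)


def backward_elimination(X, labels):
    """Repeatedly drop the least individually pertinent feature, score each
    visited subset collectively, and return (best subset, path)."""
    features = list(range(len(X[0])))
    path = [(tuple(features), collective_pertinence(X, labels, features))]
    while len(features) > 1:
        # Ranking is recomputed each step so a subset-dependent measure could be swapped in.
        worst = min(features, key=lambda f: individual_pertinence(X, labels, f))
        features.remove(worst)
        path.append((tuple(features), collective_pertinence(X, labels, features)))
    best_subset, _ = min(path, key=lambda item: item[1])
    return best_subset, path
```

Take a toy set where only feature 0 separates two clusters, and features 1 and 2 are noise with identical per-cluster means. The noise features are removed first, and the single-feature subset `(0,)` receives the lowest (best) DB score. Keeping the whole elimination path lets the best subset be chosen after the fact, which is the usual way a wrapper-style backward search picks its stopping point.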