Feature Selection and Conversion Methods in KDD Cup 99 Dataset: A Comparison of Performance

V. Bolón-Canedo; N. Sánchez-Maroño; A. Alonso-Betanzos; E. Hernández-Pereira

doi:10.2316/P.2010.674-059

V. Bolón-Canedo, N. Sánchez-Maroño, A. Alonso-Betanzos, and E. Hernández-Pereira (Spain)

Keywords

Classification, Feature selection, Conversion methods, KDD Cup 99

Abstract

In this work, the KDD Cup 99 dataset, a benchmark in the intrusion detection field, is used to perform a comparative study of Feature Selection (FS) methods, symbolic-numeric conversion methods, and classifiers. FS may enhance the generalization capabilities of the classifiers while discarding the irrelevant features present in the KDD Cup 99 dataset. Among the different FS methods, the large number of samples in the KDD Cup 99 dataset makes filters the most suitable choice. The size of the dataset also requires classifiers that can handle it, in this case: C4.5, naive Bayes, One-Layer Feedforward Neural Network, Proximal Support Vector Machine, and Multilayer Feedforward Neural Network. As some of these methods cannot be applied to symbolic features, four different symbolic-numeric techniques are employed to convert them. The results of a broad study covering two filters, four conversion methods, and five classifiers, in addition to other techniques such as clustering or discretization, are then presented. The results achieved surpass those of the KDD contest winner while using only 15% of the original features, with the added advantages of simplicity and reduced time and memory requirements.
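
The abstract does not name the two filters or the four conversion techniques, so the following is only a minimal sketch of the general idea, under stated assumptions: one-hot encoding stands in for a symbolic-numeric conversion, and ranking by information gain stands in for a filter. Neither is claimed to be a method used in the paper. The helper names (`one_hot`, `information_gain`, `filter_select`) and the toy records are hypothetical and are not taken from the KDD Cup 99 data.

```python
from collections import Counter
from math import log2


def one_hot(values):
    """Convert a symbolic column into binary indicator vectors (illustrative conversion)."""
    categories = sorted(set(values))
    index = {c: i for i, c in enumerate(categories)}
    vectors = [[1 if index[v] == i else 0 for i in range(len(categories))]
               for v in values]
    return categories, vectors


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(feature, labels):
    """Reduction in label entropy after grouping samples by a discrete feature."""
    n = len(labels)
    groups = {}
    for f, y in zip(feature, labels):
        groups.setdefault(f, []).append(y)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def filter_select(columns, labels, k):
    """Filter-style selection: rank columns by information gain, keep the top k names."""
    ranked = sorted(columns,
                    key=lambda name: information_gain(columns[name], labels),
                    reverse=True)
    return ranked[:k]


# Toy records, made up for illustration only.
protocol = ["tcp", "udp", "tcp", "icmp", "tcp", "udp"]
flag = ["SF", "SF", "S0", "SF", "S0", "SF"]
labels = ["normal", "normal", "attack", "normal", "attack", "normal"]

categories, encoded = one_hot(protocol)
selected = filter_select({"protocol": protocol, "flag": flag}, labels, k=1)
```

A filter of this kind scores each feature independently of any classifier, which is why filters scale to datasets with many samples: the scoring pass is a single sweep over the data, and the classifier is then trained only on the retained columns.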

From Proceeding (674) Artificial Intelligence and Applications - 2010
