Using C4.5 as Variable Selection Criterion in Classification Tasks

J. Martínez and O. Fuentes (Mexico)


Machine learning, Supervised learning, C4.5 decision trees, Variable selection.


Variable selection is a difficult and important problem in machine learning. For classification tasks, it can lead to increased accuracy or to reduced computational costs. In this paper we present an experimental study that shows how a very simple heuristic, namely using C4.5 for variable selection, can maintain classification accuracy in many benchmark problems while significantly reducing running times. In addition, we construct an ensemble that combines classifiers using the variables selected by C4.5 with classifiers that use the full variable set. Experimental results show that by using the selected variable set with C4.5, the classification accuracy is similar to that obtained by using the full variable set. This suggests that using C4.5 is a good approach for variable selection in classification tasks.
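As an illustration of the heuristic, the following minimal sketch grows a small decision tree and keeps only the variables that appear in its split tests. It is not the implementation used in the study: the tree is a simplified stand-in for C4.5 (binary threshold splits on numeric variables chosen by gain ratio, with no pruning and no handling of nominal attributes or missing values), and the function names `best_split`, `selected_variables`, and `project`, as well as the toy data, are assumptions made for this example.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_split(X, y):
    """Return (gain_ratio, variable_index, threshold) for the best binary
    split, or None if no split yields a positive information gain."""
    base, n, best = entropy(y), len(y), None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        # Candidate thresholds are observed values, excluding the largest,
        # so both sides of every candidate split are non-empty.
        for t in values[:-1]:
            left = [lab for row, lab in zip(X, y) if row[j] <= t]
            right = [lab for row, lab in zip(X, y) if row[j] > t]
            gain = (base
                    - (len(left) / n) * entropy(left)
                    - (len(right) / n) * entropy(right))
            if gain <= 1e-12:
                continue
            # Split information: entropy of the partition sizes themselves.
            split_info = entropy([0] * len(left) + [1] * len(right))
            ratio = gain / split_info
            if best is None or ratio > best[0]:
                best = (ratio, j, t)
    return best

def selected_variables(X, y, min_samples=2):
    """Grow an unpruned tree recursively and return the set of variable
    indices used in at least one split test."""
    if len(set(y)) <= 1 or len(y) < min_samples:
        return set()
    split = best_split(X, y)
    if split is None:
        return set()
    _, j, t = split
    used = {j}
    for keep_left in (True, False):
        part = [(row, lab) for row, lab in zip(X, y)
                if (row[j] <= t) == keep_left]
        rows, labs = zip(*part)
        used |= selected_variables(list(rows), list(labs), min_samples)
    return used

def project(X, variables):
    """Restrict every example to the selected variables (in index order)."""
    keep = sorted(variables)
    return [[row[j] for j in keep] for row in X]

# Hypothetical toy data: variable 0 separates the classes,
# variable 1 is uninformative noise, variable 2 is constant.
X = [[1.0, 5.0, 0.0], [2.0, 3.0, 0.0], [3.0, 4.0, 0.0], [4.0, 1.0, 0.0],
     [6.0, 5.0, 0.0], [7.0, 2.0, 0.0], [8.0, 4.0, 0.0], [9.0, 1.0, 0.0]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

chosen = selected_variables(X, y)
X_reduced = project(X, chosen)
print(sorted(chosen), X_reduced[0])
```

The reduced data set `X_reduced` can then be passed to any classifier in place of `X`; the reduction in running time comes from the downstream learner seeing fewer variables.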
