Random Forest Analysis on Diabetes Complication Data

Punnee Sittidech and Nongyao Nai-arun


Diabetes complications, Classification, Decision tree, Bagging, Random forest, Feature selection


This paper discusses how Random Forests, ensembles of weak decision trees, can be improved by excluding less important features from the model. Gain Ratio Feature Selection was used as the basis for tuning the algorithm parameters. The key methodology of this experiment was backward elimination of features to obtain the minimum subset with the highest accuracy. The proposed model gave better results in terms of both accuracy and the number of features used. The objective of this paper was to create a baseline that will be useful for classification of diabetes complication data. We recommend using Random Forest with Feature Selection for other types of classification problems. Future work includes an extended study of different types of learning settings to improve the feature construction process.
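
The procedure described above (rank features by gain ratio, then eliminate backward to the smallest subset with the highest accuracy) could be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: it assumes discrete feature values, assumes features are dropped in order of lowest gain ratio, and uses a hypothetical `evaluate` callback that stands in for whatever accuracy estimate is used (for example, cross-validated Random Forest accuracy on the selected features). All function and parameter names are invented for this example.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def gain_ratio(column, labels):
    """Information gain of a discrete feature divided by its split information."""
    n = len(labels)
    groups = {}
    for value, label in zip(column, labels):
        groups.setdefault(value, []).append(label)
    # Expected label entropy after splitting on the feature.
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    split_info = entropy(column)
    if split_info == 0:  # a constant feature cannot split the data
        return 0.0
    return (entropy(labels) - conditional) / split_info


def backward_elimination(rows, labels, evaluate):
    """Drop the lowest-ranked feature one at a time and keep the smallest
    subset that reaches the best score.

    rows     -- list of dicts mapping feature name to a discrete value
    labels   -- list of class labels, one per row
    evaluate -- hypothetical callback: list of feature names -> accuracy
    """
    ranked = sorted(
        rows[0],
        key=lambda f: gain_ratio([r[f] for r in rows], labels),
        reverse=True,
    )
    best_subset, best_score = list(ranked), evaluate(ranked)
    current = list(ranked)
    while len(current) > 1:
        current = current[:-1]  # remove the feature with the lowest gain ratio
        score = evaluate(current)
        if score >= best_score:  # on ties, prefer the smaller subset
            best_subset, best_score = list(current), score
    return best_subset, best_score
```

In practice, `evaluate` would train and score a Random Forest restricted to the given features; it is left as a callback here so the sketch stays independent of any particular library.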
