Improving Disease Detection with Machine Learning Strategies: Methods to Minimize False Negatives through Cost-Sensitive Training

W.H. Horsthemke and K. Dahbur (USA)


Keywords: Machine Learning, Neural Networks, Decision Trees, Classification.


Classification by machine learning can add valuable information to the medical diagnosis of disease. When cancer diagnosis is modeled as a two-class problem, detecting malignancy (presence of disease) is typically considered more important than ruling out disease in the benign (absence of disease) class. This inequality between classes poses a problem for most machine learning algorithms, which attempt to perform equally well on all classes: their internal performance measures assign the same cost to every misclassification regardless of class and try to minimize the overall sum of misclassification costs. This paper investigates two cost-sensitive training strategies that can be applied to machine learning algorithms and evaluates how effectively each one improves the classification performance of a cancer diagnosis system, specifically its detection of malignancy. The proportional training strategy varies the proportion of the malignant class in the training dataset and observes the attendant improvement in malignancy detection. As the training data moved from a bias against the malignant class (10% malignant) to a neutral balance (50% malignant and 50% benign), sensitivity (the rate of true malignancy detection) increased for the machine learning algorithms studied: feed-forward neural networks, vector quantization networks and decision trees. The penalty enforcement strategy applies a penalty to the cost of misclassification in decision trees, assigning a larger weight to false-negative misclassifications to reflect their greater importance relative to false positives. This strategy produced mixed results: some penalty values improved malignancy detection, while higher penalties often performed worse than no added penalty at all. In general, the improvement in malignancy detection came at the expense of a partial reduction in benign detection.
However, the gain in malignancy detection exceeded the loss in benign detection, showing that the proportional training strategy can improve the ability of machine learning algorithms to classify cancer patients.
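Proportional training can be illustrated with a short resampling sketch. The abstract does not describe how the 10% to 50% malignant training sets were built, so the approach below is an assumption: indices are drawn with replacement within each class until the requested malignant share is reached. The names `proportional_sample`, `sensitivity_specificity`, `MALIGNANT` and `BENIGN` are hypothetical and used only for illustration. Sensitivity is computed as the fraction of truly malignant cases classified as malignant, matching the definition given above; specificity (benign detection) is included to show the trade-off the abstract reports.

```python
import random
from typing import List, Sequence, Tuple

# Hypothetical label encoding (assumption, not from the paper).
MALIGNANT, BENIGN = 1, 0


def proportional_sample(
    labels: Sequence[int],
    malignant_fraction: float,
    size: int,
    seed: int = 0,
) -> List[int]:
    """Return training-set indices whose malignant share equals malignant_fraction.

    Indices are drawn with replacement within each class (an assumed design
    choice), so any target proportion is reachable even if one class is rare.
    """
    if not 0.0 <= malignant_fraction <= 1.0:
        raise ValueError("malignant_fraction must be in [0, 1]")
    rng = random.Random(seed)
    malignant_idx = [i for i, y in enumerate(labels) if y == MALIGNANT]
    benign_idx = [i for i, y in enumerate(labels) if y == BENIGN]
    n_malignant = round(malignant_fraction * size)
    n_benign = size - n_malignant
    chosen = rng.choices(malignant_idx, k=n_malignant) + rng.choices(benign_idx, k=n_benign)
    rng.shuffle(chosen)  # mix the classes so training order carries no bias
    return chosen


def sensitivity_specificity(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Tuple[float, float]:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == MALIGNANT and p == MALIGNANT)
    fn = sum(1 for t, p in pairs if t == MALIGNANT and p == BENIGN)
    tn = sum(1 for t, p in pairs if t == BENIGN and p == BENIGN)
    fp = sum(1 for t, p in pairs if t == BENIGN and p == MALIGNANT)
    return tp / (tp + fn), tn / (tn + fp)


if __name__ == "__main__":
    labels = [MALIGNANT] * 20 + [BENIGN] * 180
    for frac in (0.1, 0.3, 0.5):
        idx = proportional_sample(labels, frac, size=100)
        print(frac, sum(labels[i] for i in idx))  # malignant count in the training set
```

Sampling with replacement keeps the training-set size fixed across proportions; undersampling the majority class would be an equally plausible reading of "proportional training" and would change only the two `rng.choices` calls.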
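Penalty enforcement in a decision tree can be sketched as a cost-weighted misclassification rule. The abstract does not specify how the penalty enters the tree learner, so this sketch assumes one common form: a false negative costs `fn_penalty`, a false positive costs `fp_cost` (1 by default), each leaf predicts the label with the lower total cost, and a candidate split is scored by the summed minimum cost of its children. All function and parameter names here are hypothetical.

```python
from typing import Sequence, Tuple

# Hypothetical label encoding (assumption, not from the paper).
MALIGNANT, BENIGN = 1, 0


def leaf_cost(n_malignant: int, n_benign: int, predict: int,
              fn_penalty: float, fp_cost: float = 1.0) -> float:
    """Total misclassification cost if a leaf predicts `predict`."""
    if predict == BENIGN:
        # Every malignant case in the leaf becomes a penalized false negative.
        return fn_penalty * n_malignant
    # Every benign case in the leaf becomes a false positive.
    return fp_cost * n_benign


def leaf_label(n_malignant: int, n_benign: int,
               fn_penalty: float, fp_cost: float = 1.0) -> int:
    """Pick the cheaper label; ties go to MALIGNANT to favor detection."""
    cost_mal = leaf_cost(n_malignant, n_benign, MALIGNANT, fn_penalty, fp_cost)
    cost_ben = leaf_cost(n_malignant, n_benign, BENIGN, fn_penalty, fp_cost)
    return MALIGNANT if cost_mal <= cost_ben else BENIGN


def split_cost(children: Sequence[Tuple[int, int]],
               fn_penalty: float, fp_cost: float = 1.0) -> float:
    """Score a split: sum over (n_malignant, n_benign) children of the cheaper leaf cost."""
    return sum(
        min(leaf_cost(m, b, MALIGNANT, fn_penalty, fp_cost),
            leaf_cost(m, b, BENIGN, fn_penalty, fp_cost))
        for m, b in children
    )


if __name__ == "__main__":
    # A leaf with 3 malignant and 7 benign cases.
    for penalty in (1, 2, 3):
        print(penalty, leaf_label(3, 7, fn_penalty=penalty))
```

With no added penalty (`fn_penalty=1`) the example leaf is labeled benign, and raising the penalty to 3 flips it to malignant. With a large enough penalty, every leaf containing any malignant case predicts malignant, which illustrates how heavy penalties can trade away benign detection.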
