Unsupervised Learning from Incomplete Categorical Data

F.-X. Jollois and M. Nadif (France)


missing data, clustering, mixture model, EM algorithm


Consider a data set defined by a set of objects described by a set of variables. Most current methods for clustering objects are not suited to incomplete data sets. In general, these methods are applied either by ignoring the objects with missing values or by replacing missing values with plausible values according to the nature of the data. In this paper, we focus on unsupervised learning from incomplete categorical data and study this problem under the mixture approach. First, we review a particular mixture model for complete categorical data and describe the well-known EM algorithm. We then extend it to the case where the data are incomplete. Simulation studies and numerical experiments on real data give encouraging results.
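
As an illustration of the general idea, the following is a minimal sketch of EM for a categorical mixture with missing cells. It is not the authors' method: the abstract does not specify their particular mixture model, so the sketch assumes the generic latent class model (variables independent within each cluster, one multinomial per variable and cluster) and assumes values are missing at random. The function name `em_latent_class`, its parameters, and the use of `None` to mark a missing cell are all hypothetical choices made for this example.

```python
import math
import random


def em_latent_class(data, n_levels, n_clusters, n_iter=50, seed=0):
    """EM for a mixture of independent multinomials (assumed model).

    data: list of rows; each cell is a category index or None if missing.
    n_levels: number of categories of each variable.
    """
    rng = random.Random(seed)
    n = len(data)
    n_vars = len(n_levels)

    # Mixing proportions start uniform.
    pi = [1.0 / n_clusters] * n_clusters
    # alpha[k][j][c]: P(variable j = c | cluster k), random positive start.
    alpha = []
    for _ in range(n_clusters):
        per_var = []
        for m in n_levels:
            w = [rng.random() + 0.5 for _ in range(m)]
            s = sum(w)
            per_var.append([x / s for x in w])
        alpha.append(per_var)

    loglik_trace = []
    post = []
    for _ in range(n_iter):
        # E-step: cluster posteriors computed from the observed cells only.
        post = []
        loglik = 0.0
        for row in data:
            joint = []
            for k in range(n_clusters):
                p = pi[k]
                for j, x in enumerate(row):
                    if x is not None:
                        p *= alpha[k][j][x]
                joint.append(p)
            total = sum(joint)
            loglik += math.log(total)
            post.append([p / total for p in joint])
        loglik_trace.append(loglik)

        # M-step: a missing cell contributes its expected count under the
        # current alpha; an observed cell contributes its posterior weight.
        for k in range(n_clusters):
            nk = sum(t[k] for t in post)
            pi[k] = nk / n
            for j in range(n_vars):
                counts = [0.0] * n_levels[j]
                for row, t in zip(data, post):
                    x = row[j]
                    if x is None:
                        for c in range(n_levels[j]):
                            counts[c] += t[k] * alpha[k][j][c]
                    else:
                        counts[x] += t[k]
                alpha[k][j] = [v / nk for v in counts]

    # post holds the posteriors from the last E-step, i.e. computed with the
    # parameters in force before the final M-step.
    return pi, alpha, post, loglik_trace


if __name__ == "__main__":
    toy = [[0, 0, None], [0, 0, 0], [0, None, 0],
           [1, 1, 1], [1, 1, None], [None, 1, 1]]
    pi, alpha, post, trace = em_latent_class(toy, [2, 2, 2], n_clusters=2)
    print("mixing proportions:", pi)
    print("observed-data log-likelihood per iteration:", trace[:3], "...")
```

In this sketch no object is discarded and no value is imputed beforehand: incomplete rows take part in the E-step through their observed cells, which is the contrast with the deletion and substitution strategies mentioned above. The observed-data log-likelihood recorded in `loglik_trace` should not decrease from one iteration to the next, which gives a simple sanity check.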
