Fit Data Selection based on Project Features for Software Effort Estimation Models

K. Toda, A. Monden, and K.-i. Matsumoto (Japan)


Effort estimation, Multivariate regression, Fit data selection.


To construct a better multivariate regression model for software effort estimation, this paper proposes a method that automatically selects projects as fit data (the data set used for model construction) from a given project data set, based on the features of the estimation target. In an experimental evaluation using the ISBSG data set, one of the most commonly used project data sets in effort estimation studies, the proposed method showed better estimation performance than the conventional method of constructing a model from all project data: the median MRE (Magnitude of Relative Error) improved from 0.552 to 0.383, and the median MER (Magnitude of Error Relative to the estimate) improved from 0.457 to 0.381. Although regression models are often constructed using all available project data, these results indicate the need for fit data selection and show that the proposed method is an effective and systematic way to perform it.
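
For readers unfamiliar with the two accuracy measures, the sketch below shows how median MRE and median MER are commonly computed, using the standard definitions MRE = |actual - estimate| / actual and MER = |actual - estimate| / estimate. The function names and the sample effort values are hypothetical and chosen only for illustration; they are not taken from the paper or from the ISBSG data set.

```python
from statistics import median


def mre(actual: float, estimated: float) -> float:
    """Magnitude of Relative Error: absolute error relative to the actual effort."""
    return abs(actual - estimated) / actual


def mer(actual: float, estimated: float) -> float:
    """Magnitude of Error Relative to the estimate: absolute error relative to the estimate."""
    return abs(actual - estimated) / estimated


def median_errors(actuals, estimates):
    """Return (median MRE, median MER) over paired actual and estimated efforts."""
    mres = [mre(a, e) for a, e in zip(actuals, estimates)]
    mers = [mer(a, e) for a, e in zip(actuals, estimates)]
    return median(mres), median(mers)


# Illustrative effort values (e.g. person-hours); these are made-up numbers.
actuals = [100.0, 200.0, 400.0]
estimates = [80.0, 250.0, 400.0]

median_mre, median_mer = median_errors(actuals, estimates)
print(f"median MRE = {median_mre:.3f}, median MER = {median_mer:.3f}")
```

Lower values are better for both measures. MRE penalizes overestimates more heavily and MER penalizes underestimates more heavily, which is one reason for reporting the two together.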
