Dong Ren, Chang Zhang, Shun Ren, Zhong Zhang, Ji-hua Wang, and An-xiang Lu


NIR, spectra variable selection, ECARS, West Lake Longjing tea, identification


This paper uses near-infrared (NIR) spectroscopy to distinguish West Lake Longjing tea from general Longjing tea. Noise and other redundant information in the full spectrum degrade model accuracy, so models built on characteristic wavelength variables are more effective than full-spectrum models. Competitive adaptive reweighted sampling (CARS) is one of the most common and effective methods for selecting characteristic wavelength variables. However, because the samples used to build the model are selected at random in CARS, the regression coefficients of the variables change from one selection to the next. The absolute value of a regression coefficient therefore does not always fully reflect the importance of a variable. To make up for this shortfall, this paper introduces the notion of variable effectiveness and proposes a wavelength selection approach called effectiveness competitive adaptive reweighted sampling (ECARS). The study classifies 110 samples of West Lake Longjing tea and general Longjing tea; the training set consists of 72 samples and the prediction set of 38 samples. After second-derivative preprocessing, variables are selected with CARS, uninformative variable elimination, backward interval partial least squares, and the proposed ECARS algorithm. The selected variable subsets and the full spectrum are then each used to build support vector machine (SVM) and linear discriminant analysis models for identifying West Lake Longjing tea and general Longjing tea.
The experimental results show that: (1) models built on variables chosen by the variable selection methods are more accurate than the full-spectrum models; (2) the ECARS-SVM model is the most accurate, with accuracies of 100% on the training set and 98.4% on the prediction set; (3) the proposed ECARS algorithm efficiently reduces the number of variables, simplifies the models, and improves their accuracy and stability.
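
The sample dependence of regression coefficients that motivates ECARS can be illustrated with a minimal sketch. It is not the CARS or ECARS implementation: synthetic data stand in for the tea spectra, a univariate least-squares slope stands in for the partial least squares regression coefficient, and the 80% sampling ratio, the 50 runs, and all function names are assumptions made only for illustration.

```python
import random


def slope(x, y):
    """Univariate least-squares slope of y on x (stand-in for a PLS coefficient)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sxx


def make_data(n_samples, n_vars, rng):
    """Synthetic data: variable 0 drives y, the remaining variables are noise."""
    X = [[rng.gauss(0, 1) for _ in range(n_vars)] for _ in range(n_samples)]
    y = [row[0] + rng.gauss(0, 1) for row in X]
    return X, y


def subsample_coefficients(X, y, n_runs, ratio, rng):
    """For each run, draw a random subset of samples and return |slope| per variable."""
    n, p = len(X), len(X[0])
    k = int(ratio * n)
    runs = []
    for _ in range(n_runs):
        idx = rng.sample(range(n), k)
        ys = [y[i] for i in idx]
        runs.append([abs(slope([X[i][j] for i in idx], ys)) for j in range(p)])
    return runs


def coefficient_spread(runs):
    """Minimum and maximum |coefficient| per variable across all runs."""
    p = len(runs[0])
    return [(min(r[j] for r in runs), max(r[j] for r in runs)) for j in range(p)]


rng = random.Random(0)
X, y = make_data(72, 5, rng)  # 72 mirrors the training-set size; 5 variables is arbitrary
runs = subsample_coefficients(X, y, n_runs=50, ratio=0.8, rng=rng)
for j, (lo, hi) in enumerate(coefficient_spread(runs)):
    print(f"variable {j}: |coef| ranges from {lo:.3f} to {hi:.3f}")
```

Each variable's absolute coefficient takes a range of values across the random subsets, so a ranking based on a single run can differ from the ranking obtained in another run.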
