On Retrieval Performance of Malay Textual Documents

M.P. Hamzah and T.M.T. Sembok (Malaysia)


Information retrieval; Malay language; Vector space; Similarity measure; Retrieval performance


This paper analyzes two factors that affect the retrieval performance of Malay textual documents: similarity measures and word conflation. Three similarity measures, namely the inner product with unweighted query terms, the inner product with weighted query terms, and the cosine of the angle between the query and document vectors, were studied and tested on a Malay test collection. The results show that the cosine measure significantly outperforms the other two. To further enhance performance, the data were conflated using Malay stemming algorithms. Combining the conflated data with the cosine measure as the basis for calculating similarity in the vector space yields a significant improvement in precision.
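The three similarity measures named above can be illustrated with a minimal sketch. The abstract does not give the exact formulas or the term-weighting scheme, so the definitions below are the textbook vector space forms and should be read as assumptions: vectors are represented as dictionaries mapping terms to weights, and the function names, the toy Malay terms, and the weights are all hypothetical.

```python
import math

def inner_product_unweighted(query, doc):
    # Assumption: query terms are binary, so each term present in the
    # query contributes the document's weight for that term.
    return sum(doc.get(term, 0.0) for term in query)

def inner_product_weighted(query, doc):
    # Assumption: each query term carries its own weight, multiplied
    # by the document's weight for the same term.
    return sum(w * doc.get(term, 0.0) for term, w in query.items())

def cosine(query, doc):
    # Weighted inner product normalized by the lengths of both vectors.
    dot = inner_product_weighted(query, doc)
    q_norm = math.sqrt(sum(w * w for w in query.values()))
    d_norm = math.sqrt(sum(w * w for w in doc.values()))
    if q_norm == 0.0 or d_norm == 0.0:
        return 0.0  # an empty vector has no defined angle
    return dot / (q_norm * d_norm)

# Toy data (hypothetical weights, not taken from the test collection).
query = {"makan": 2.0, "nasi": 1.0}
short_doc = {"makan": 1.0, "nasi": 1.0}
long_doc = {"makan": 3.0, "nasi": 1.0, "rumah": 4.0, "sekolah": 4.0}

for name, doc in (("short_doc", short_doc), ("long_doc", long_doc)):
    print(name,
          inner_product_unweighted(query, doc),
          inner_product_weighted(query, doc),
          round(cosine(query, doc), 3))
```

In this toy case both inner products rank the longer document first, while the cosine ranks the shorter, more focused document first because it normalizes for vector length. This only illustrates how the measures can disagree; it is not a reproduction of the experiments reported in the paper.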

