Package "TopKLists" for Rank-based Genomic Data Integration

Michael G. Schimek, Eva Budinská, Karl Kugler, and Shili Lin


Bioinformatics, data integration, rank aggregation, top-k ranked list


In comparative genomics, a central interest is to find a highly conforming subset of objects (e.g., expressed genes) resulting from two or more biological experiments that address the same or a similar research question. The original metric measurements can usually be transformed into rank data, a step that is required when the measurements from the studies involved cannot be compared directly. We have developed the R package TopKLists, which provides essential tools for the meta-analysis of genomic study findings as well as for other comparative tasks. It consists of three modules: (1) TopKInference offers exploratory nonparametric inference for estimating the top-k list length of paired rankings. (2) TopKSpace provides several rank aggregation techniques that can combine studies whose lists differ in length (space) as well as lists with missing rank information. (3) TopKGraphics comprises a collection of graphical tools for exploring the data and for visualizing the aggregation results. In this paper, we give an overview of the implemented methods, provide basic program information, and finally show how TopKLists can be applied to control gene identification in multiple microarray data sets.

