Automatic Extraction of Gene-Disease Association from Bio-Literature using Labeled PPI Data

Hongtao Zhang, Minlie Huang, and Xiaoyan Zhu


Keywords: Gene-disease association, Gene-disease association extraction, Corpus weighting


An enormous number of gene-disease associations (GDAs) are buried in the millions of research articles published over the years, and the number keeps growing. Extracting them automatically is a challenging bioinformatics task. Although previous work has shown that supervised learning methods perform best on this task, their performance still depends on manually labeled training data. In this paper, we propose to learn from the abundant labeled protein-protein interaction (PPI) data and to use the learned knowledge to help GDA extraction. In particular, we apply a support vector machine modified for corpus weighting (SVM-CW) to weight the labeled PPI data, so that knowledge can be transferred effectively from the PPI domain to the GDA domain. The experimental results show that our solution makes full use of labeled PPI data and improves the performance of GDA extraction.
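
The general idea of corpus weighting can be illustrated with a minimal sketch. It is not the SVM-CW formulation used in the paper, whose details are not given here. It assumes a plain linear SVM trained by subgradient descent on a hinge loss in which every training instance carries a weight set by its corpus. The two-dimensional feature vectors, the PPI weight of 0.3, and the function names are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def train_weighted_linear_svm(
    xs: Sequence[Vector],
    ys: Sequence[int],
    cs: Sequence[float],
    epochs: int = 200,
    lr: float = 0.1,
    reg: float = 0.01,
) -> Tuple[List[float], float]:
    """Minimise reg/2 * ||w||^2 + sum_i c_i * max(0, 1 - y_i * (w.x_i + b)).

    cs holds one weight per instance; a lower weight makes violations of
    that instance's margin cheaper, so it influences the model less.
    """
    dim = len(xs[0])
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        # Start from the gradient of the regulariser.
        grad_w = [reg * wj for wj in w]
        grad_b = 0.0
        for x, y, c in zip(xs, ys, cs):
            margin = y * (sum(wj * xj for wj, xj in zip(w, x)) + b)
            if margin < 1.0:
                # Hinge subgradient, scaled by the instance's corpus weight.
                for j in range(dim):
                    grad_w[j] -= c * y * x[j]
                grad_b -= c * y
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w: Sequence[float], b: float, x: Vector) -> int:
    """Return +1 (association) or -1 (no association)."""
    score = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if score >= 0 else -1


# Toy, made-up feature vectors: target-domain (GDA) and source-domain (PPI).
gda_x = [(2.0, 1.0), (-2.0, -1.0)]
gda_y = [1, -1]
ppi_x = [(1.0, -1.0), (-1.0, 1.0)]
ppi_y = [1, -1]

# Assumed weights: full weight for GDA instances, reduced weight for PPI.
GDA_WEIGHT, PPI_WEIGHT = 1.0, 0.3

xs = gda_x + ppi_x
ys = gda_y + ppi_y
cs = [GDA_WEIGHT] * len(gda_x) + [PPI_WEIGHT] * len(ppi_x)

w, b = train_weighted_linear_svm(xs, ys, cs)
predictions = [predict(w, b, x) for x in xs]
```

In this sketch the PPI weight controls how strongly the source corpus pulls on the decision boundary: setting it to 0 ignores the PPI data, while setting it to 1 treats both corpora alike.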
