Creating New Sentences to Summarize Documents

B. Choi and X. Huang (USA)


Document summarization, knowledge discovery, natural language processing, data mining, knowledge base, Web mining.


This paper describes the first summarization system that is able to create new sentences to summarize documents. Creating new sentences to summarize documents is a challenging research problem that no prior work has addressed. Most prior work is extraction-based: it analyzes writing styles and document structures to find keywords or key sentences in documents and uses those words or sentences as the summary. In this paper, we propose a new method that generates new simple sentences based on the main concepts contained in the documents. Our system works with simple sentences that consist of a subject, a predicate, and an object. It first simplifies each sentence of a document to the subject-predicate-object format, when possible. Then, it clusters the simplified sentences into compatible classes that have similar concepts. Finally, it creates one sentence for each of several of the largest compatible classes. Those created sentences serve as the summary of the document. The underlying assumption is that the central ideas of a document are those with many supporting concepts. However, this approach does not yet capture the temporal and causal relations between sentences. The system has been implemented and tested. Test results show that our approach is viable for future research and applicable to knowledge discovery and the Semantic Web.
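
The pipeline outlined above (simplify, cluster, generate) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how sentences are simplified or how compatible classes are formed, so the sketch assumes sentences have already been reduced to (subject, predicate, object) triples and uses a shared subject as a crude stand-in for "similar concepts". All names (`cluster_by_subject`, `summarize`, `TRIPLES`) are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Assumed input: sentences already simplified to triples (illustrative data only).
TRIPLES: List[Triple] = [
    ("system", "creates", "sentences"),
    ("system", "clusters", "sentences"),
    ("system", "creates", "summaries"),
    ("summary", "contains", "concepts"),
    ("document", "contains", "concepts"),
    ("document", "has", "ideas"),
]


def cluster_by_subject(triples: List[Triple]) -> Dict[str, List[Triple]]:
    """Group triples by subject; a stand-in for the paper's compatible classes."""
    classes: Dict[str, List[Triple]] = defaultdict(list)
    for subj, pred, obj in triples:
        classes[subj.lower()].append((subj.lower(), pred.lower(), obj.lower()))
    return classes


def most_frequent(items: List[str]) -> str:
    """Return the most frequent item, breaking ties alphabetically."""
    counts = Counter(items)
    return min(counts, key=lambda item: (-counts[item], item))


def summarize(triples: List[Triple], k: int = 2) -> List[str]:
    """Create one new sentence for each of the k largest classes."""
    classes = cluster_by_subject(triples)
    # Largest classes first: more supporting triples means a more central idea.
    ranked = sorted(classes.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    summary = []
    for subject, members in ranked[:k]:
        predicate = most_frequent([p for _, p, _ in members])
        obj = most_frequent([o for _, p, o in members if p == predicate])
        sentence = f"{subject} {predicate} {obj}."
        summary.append(sentence[0].upper() + sentence[1:])
    return summary


if __name__ == "__main__":
    for line in summarize(TRIPLES, k=2):
        print(line)
```

With the sample triples, the "system" class has three supporting triples and the "document" class has two, so those two classes produce the summary sentences. A fuller system would need a parser for the simplification step and a concept-similarity measure in place of exact subject matching.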
