Detecting Two Sorts of Correspondences between HTML Documents for Extracting Temporal Differences

M. Nakamura, K. Iwanuma, and H. Nabeshima (Japan)


HTML, temporal difference, alignment, detection, semantics


WEB servers on the Internet provide a huge amount of information in the form of HTML documents. HTML documents on WEB servers change frequently and without notice, so a user must continuously watch these documents to detect content changes. Such watching is tedious and time-consuming. If changes to a target HTML page could be detected automatically, a user would no longer need to monitor the page and could obtain newly updated information immediately after the page is modified. In this paper, we present a new method for mechanically detecting the temporal differences between two successive HTML documents, based on computing two sorts of correct correspondences between them. The proposed method uses an extended alignment technique and is able to understand textual contexts as well as HTML tag structures. It treats, in a uniform way, two important sorts of correspondence relations between the target HTML documents: the semantic correspondence and the location correspondence.
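
As a rough illustration of the alignment idea only, the sketch below splits two versions of a page into a flat sequence of tag and word tokens and aligns them with a plain longest-common-subsequence (LCS) dynamic program; tokens present only in the newer version are reported as the temporal difference. The tokenizer, the function names (`tokenize`, `align`, `temporal_difference`) and the sample pages are hypothetical. This is a textbook sequence alignment, not the authors' extended alignment, and it does not distinguish semantic correspondence from location correspondence.

```python
import re
from typing import List, Tuple

# Hypothetical tokenizer: a tag such as "<li>" is one token, and each
# whitespace-separated run of text outside tags is one token.
TOKEN_RE = re.compile(r"<[^>]+>|[^<\s]+")


def tokenize(html: str) -> List[str]:
    return TOKEN_RE.findall(html)


def align(old: List[str], new: List[str]) -> List[Tuple[str, str]]:
    """Plain LCS alignment. Returns (op, token) pairs where op is
    '=' (kept), '-' (only in old) or '+' (only in new)."""
    n, m = len(old), len(new)
    # lcs[i][j] = length of the LCS of old[i:] and new[j:]
    lcs = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if old[i] == new[j]:
                lcs[i][j] = lcs[i + 1][j + 1] + 1
            else:
                lcs[i][j] = max(lcs[i + 1][j], lcs[i][j + 1])
    # Walk the table from the front to recover one optimal alignment.
    ops: List[Tuple[str, str]] = []
    i = j = 0
    while i < n and j < m:
        if old[i] == new[j]:
            ops.append(("=", old[i]))
            i += 1
            j += 1
        elif lcs[i + 1][j] >= lcs[i][j + 1]:
            ops.append(("-", old[i]))
            i += 1
        else:
            ops.append(("+", new[j]))
            j += 1
    ops.extend(("-", t) for t in old[i:])
    ops.extend(("+", t) for t in new[j:])
    return ops


def temporal_difference(old_html: str, new_html: str) -> List[str]:
    """Tokens that appear only in the newer version of the page."""
    return [tok for op, tok in align(tokenize(old_html), tokenize(new_html))
            if op == "+"]


if __name__ == "__main__":
    old = "<ul><li>Meeting on May 1</li></ul>"
    new = "<ul><li>Meeting on May 8</li><li>New paper released</li></ul>"
    print(temporal_difference(old, new))
```

The table costs O(nm) time and memory for token sequences of lengths n and m, which is acceptable for a single page. Because tags and words share one sequence, a purely positional match like this can pair an unchanged tag with the wrong occurrence; handling such cases is the kind of problem that motivates separating semantic and location correspondences.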

