Extracting academic information from conference Web pages

Peng Wang, Yue You, Baowen Xu, Jianyu Zhao

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

4 Scopus citations

Abstract

Conference Web pages are the main platforms for sharing conference information and organizing conference events. To discover academic knowledge from such pages for building academic ontologies or social networks, academic information must first be extracted from them. This paper proposes an approach to that extraction. First, Web pages are segmented into text blocks by analyzing their visual features and DOM structure. Then a Bayes network is used to classify these text blocks into predefined categories, and the quality of the initial classification results is improved by post-processing. Finally, the academic information is extracted from the classified text blocks. Experimental results on real-world datasets show that the proposed method is effective and efficient for extracting academic information from conference Web pages, achieving 90% precision and 89% recall on average.
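The four stages named in the abstract (segmentation, classification, post-processing, extraction) can be illustrated with a minimal sketch. It is not the authors' implementation, and every name in it is hypothetical. It assumes block-level HTML tags as a stand-in for the visual and DOM analysis, a naive Bayes classifier (the simplest special case of a Bayes network) in place of the paper's network, a single date-pattern rule as the post-processing step, and a tiny invented training set.

```python
import math
import re
from collections import Counter, defaultdict
from html.parser import HTMLParser

# Assumption: block-level tags approximate the paper's visual/DOM segmentation.
BLOCK_TAGS = {"p", "div", "li", "h1", "h2", "h3", "td", "tr", "ul", "ol", "table"}
MONTHS = ("January|February|March|April|May|June|July|August|"
          "September|October|November|December")
DATE_RE = re.compile(r"\b\d{1,2}\s+(?:%s)\s+\d{4}\b" % MONTHS)

class BlockSegmenter(HTMLParser):
    """Stage 1: split a page into text blocks at block-level tag boundaries."""
    def __init__(self):
        super().__init__()
        self.blocks, self._buf = [], []

    def flush(self):
        text = " ".join("".join(self._buf).split())  # normalize whitespace
        if text:
            self.blocks.append(text)
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self.flush()

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self.flush()

    def handle_data(self, data):
        self._buf.append(data)

def segment(html):
    parser = BlockSegmenter()
    parser.feed(html)
    parser.close()
    parser.flush()
    return parser.blocks

def tokenize(text):
    return re.findall(r"[a-z]+|\d+", text.lower())

class NaiveBayes:
    """Stage 2: multinomial naive Bayes with add-one smoothing."""
    def __init__(self, samples):
        self.words, self.labels, self.vocab = defaultdict(Counter), Counter(), set()
        for text, label in samples:
            tokens = tokenize(text)
            self.labels[label] += 1
            self.words[label].update(tokens)
            self.vocab.update(tokens)

    def predict(self, text):
        total = sum(self.labels.values())
        scores = {}
        for label, n in self.labels.items():
            denom = sum(self.words[label].values()) + len(self.vocab)
            scores[label] = math.log(n / total) + sum(
                math.log((self.words[label][t] + 1) / denom) for t in tokenize(text))
        return max(scores, key=scores.get)

# Hypothetical training blocks; a real system would use labeled conference pages.
TRAINING = [
    ("Paper submission deadline: 15 March 2011", "date"),
    ("Notification of acceptance: 1 June 2011", "date"),
    ("Camera-ready deadline 20 July 2011", "date"),
    ("Topics of interest include machine learning and data mining", "topic"),
    ("Topics include knowledge representation and ontologies", "topic"),
    ("Program committee chair: Jane Doe, Example University", "person"),
    ("General chair: John Smith, Example Institute", "person"),
]

def run(html):
    """Return (label, block, extracted_date_or_None) for every text block."""
    model = NaiveBayes(TRAINING)
    results = []
    for block in segment(html):
        label = model.predict(block)
        match = DATE_RE.search(block)
        # Stage 3 (assumed rule): a "date" block without a full date is relabeled.
        if label == "date" and match is None:
            label = "other"
        # Stage 4: pull the date string out of blocks classified as dates.
        results.append((label, block, match.group(0) if label == "date" else None))
    return results

PAGE = """<html><body>
<h1>Example Conf 2011</h1>
<div><p>Submission deadline: 10 April 2011</p>
<p>Topics of interest include information extraction</p></div>
<ul><li>Program chair: Alice Example, Sample University</li></ul>
</body></html>"""

if __name__ == "__main__":
    for label, block, value in run(PAGE):
        print(label, "|", block, "|", value)
```

On the sample page, the classifier alone labels the heading "Example Conf 2011" as a date because of the year token. The post-processing rule relabels it, which shows in miniature why a correction pass after the initial classification can help.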

Original language: English
Title of host publication: Proceedings - 2011 23rd IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2011
Pages: 952-959
Number of pages: 8
State: Published - 2011
Externally published: Yes
Event: 23rd IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2011 - Boca Raton, FL, United States
Duration: 7 Nov 2011 → 9 Nov 2011

Publication series

Name: Proceedings - International Conference on Tools with Artificial Intelligence, ICTAI
ISSN (Print): 1082-3409

Conference

Conference: 23rd IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2011
Country/Territory: United States
City: Boca Raton, FL
Period: 7/11/11 → 9/11/11

Keywords

  • Bayes network
  • DOM structure
  • Visual feature
  • Web information extraction
