A new vision-based method for extracting academic information from conference web pages

Peng Wang, Mingqi Zhou, Yue You, Xiang Zhang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

1 Scopus citation

Abstract

This paper proposes a new vision-based method for extracting academic information from conference Web pages. The main contributions are as follows. (1) A new vision-based page segmentation algorithm is proposed to improve on the results of the classical VIPS algorithm; it divides pages into text blocks. (2) All text blocks are classified into 10 categories according to visual features, keyword features and text-content features. This initial classification achieves 75% precision and 67% recall. (3) The context information of the text blocks is used to repair and refine the initial classification results, raising precision to 96% and recall to 98%. Finally, academic information is extracted from the classified text blocks. Our experimental results on real-world datasets show that the proposed method is effective and efficient for extracting academic information from conference Web pages.
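The abstract describes classifying text blocks from their features and then repairing the labels using the context of surrounding blocks, and the keywords name a Bayesian network classifier. The sketch below is not the authors' implementation, whose details are not given here. It assumes naive Bayes (the simplest Bayesian network classifier, in which each feature depends only on the category) over hypothetical discrete features (`font`, `keyword`, `digits`) and hypothetical categories (`conf_name`, `date`, `location`). The function `repair_with_context` is likewise a hypothetical neighbor-agreement rule standing in for the paper's context-based refinement.

```python
from __future__ import annotations

import math
from collections import Counter, defaultdict


class NaiveBayesBlockClassifier:
    """Naive Bayes over discrete block features: every feature depends only
    on the block's category (the simplest Bayesian network classifier)."""

    def __init__(self, alpha: float = 1.0) -> None:
        self.alpha = alpha                                      # Laplace smoothing constant
        self.class_counts: Counter = Counter()                  # label -> number of blocks
        self.value_counts: defaultdict = defaultdict(Counter)   # (label, feature) -> value counts
        self.seen_values: defaultdict = defaultdict(set)        # feature -> values seen in training

    def fit(self, blocks: list[tuple[dict[str, str], str]]) -> NaiveBayesBlockClassifier:
        for features, label in blocks:
            self.class_counts[label] += 1
            for name, value in features.items():
                self.value_counts[(label, name)][value] += 1
                self.seen_values[name].add(value)
        return self

    def predict(self, features: dict[str, str]) -> str:
        total = sum(self.class_counts.values())
        best_label, best_score = "", -math.inf
        for label, count in self.class_counts.items():
            score = math.log(count / total)             # log prior P(label)
            for name, value in features.items():
                k = len(self.seen_values[name]) + 1     # +1 reserves mass for unseen values
                hits = self.value_counts[(label, name)][value]
                score += math.log((hits + self.alpha) / (count + self.alpha * k))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def repair_with_context(labels: list[str]) -> list[str]:
    """Hypothetical refinement rule: if both neighbors of a block share a label
    that differs from the block's own, adopt the neighbors' label.
    Neighbors are read from the original labels, not the repaired ones."""
    repaired = list(labels)
    for i in range(1, len(labels) - 1):
        if labels[i - 1] == labels[i + 1] != labels[i]:
            repaired[i] = labels[i - 1]
    return repaired


# Hypothetical training blocks: (features, category).
TRAINING = [
    ({"font": "large", "keyword": "none", "digits": "no"}, "conf_name"),
    ({"font": "large", "keyword": "none", "digits": "yes"}, "conf_name"),
    ({"font": "normal", "keyword": "deadline", "digits": "yes"}, "date"),
    ({"font": "normal", "keyword": "none", "digits": "yes"}, "date"),
    ({"font": "normal", "keyword": "venue", "digits": "no"}, "location"),
    ({"font": "normal", "keyword": "none", "digits": "no"}, "location"),
]

if __name__ == "__main__":
    clf = NaiveBayesBlockClassifier().fit(TRAINING)
    print(clf.predict({"font": "large", "keyword": "none", "digits": "no"}))  # conf_name
    print(repair_with_context(["date", "location", "date"]))  # ['date', 'date', 'date']
```

Because each feature is conditioned only on the category, training reduces to counting; a richer Bayesian network classifier would add dependencies between features at the cost of more parameters to estimate.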

Original language: English
Title of host publication: Proceedings - 2012 IEEE 24th International Conference on Tools with Artificial Intelligence, ICTAI 2012
Pages: 976-981
Number of pages: 6
State: Published - 2012
Externally published: Yes
Event: 2012 IEEE 24th International Conference on Tools with Artificial Intelligence, ICTAI 2012 - Athens, Greece
Duration: 7 Nov 2012 - 9 Nov 2012

Publication series

Name: Proceedings - International Conference on Tools with Artificial Intelligence, ICTAI
Volume: 1
ISSN (Print): 1082-3409

Conference

Conference: 2012 IEEE 24th International Conference on Tools with Artificial Intelligence, ICTAI 2012
Country/Territory: Greece
City: Athens
Period: 7/11/12 - 9/11/12

Keywords

  • Web information extraction
  • Web page segmentation
  • bayesian network classifier
