TY - GEN
T1 - Extracting academic information from conference Web pages
AU - Wang, Peng
AU - You, Yue
AU - Xu, Baowen
AU - Zhao, Jianyu
PY - 2011
Y1 - 2011
N2 - Conference Web pages are the main platforms for sharing conference information and organizing conference events. To discover academic knowledge from such Web pages for building academic ontologies or social networks, it is necessary to extract academic information from them. This paper proposes an approach to extracting academic information from conference Web pages. First, Web pages are segmented into text blocks by analyzing their visual features and DOM structure. Then a Bayes network is used to classify these text blocks into predefined categories, and the quality of the initial classification results is improved by post-processing. Finally, the academic information is extracted from the classified text blocks. Experimental results on real-world datasets show that the proposed method is highly effective and efficient for extracting academic information from conference Web pages, achieving on average 90% precision and 89% recall.
AB - Conference Web pages are the main platforms for sharing conference information and organizing conference events. To discover academic knowledge from such Web pages for building academic ontologies or social networks, it is necessary to extract academic information from them. This paper proposes an approach to extracting academic information from conference Web pages. First, Web pages are segmented into text blocks by analyzing their visual features and DOM structure. Then a Bayes network is used to classify these text blocks into predefined categories, and the quality of the initial classification results is improved by post-processing. Finally, the academic information is extracted from the classified text blocks. Experimental results on real-world datasets show that the proposed method is highly effective and efficient for extracting academic information from conference Web pages, achieving on average 90% precision and 89% recall.
KW - Bayes network
KW - DOM structure
KW - Visual feature
KW - Web information extraction
UR - http://www.scopus.com/inward/record.url?scp=84862959367&partnerID=8YFLogxK
U2 - 10.1109/ICTAI.2011.164
DO - 10.1109/ICTAI.2011.164
M3 - Conference contribution
AN - SCOPUS:84862959367
SN - 9780769545967
T3 - Proceedings - International Conference on Tools with Artificial Intelligence, ICTAI
SP - 952
EP - 959
BT - Proceedings - 2011 23rd IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2011
T2 - 23rd IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2011
Y2 - 7 November 2011 through 9 November 2011
ER -