TY - GEN
T1 - A new vision-based method for extracting academic information from conference web pages
AU - Wang, Peng
AU - Zhou, Mingqi
AU - You, Yue
AU - Zhang, Xiang
PY - 2012
Y1 - 2012
AB - This paper proposes a new vision-based method for extracting academic information from conference Web pages. The main contributions are: (1) A new vision-based page segmentation algorithm is proposed that improves on the results of the classical VIPS algorithm by dividing pages into text blocks. (2) All text blocks are classified into 10 categories according to visual features, keyword features and text content features. The initial classification achieves 75% precision and 67% recall. (3) The context information of text blocks is employed to repair and refine the initial classification results, raising precision to 96% and recall to 98%. Finally, academic information is extracted from the classified text blocks. Our experimental results on real-world datasets show that the proposed method is effective and efficient for extracting academic information from conference Web pages.
KW - Web information extraction
KW - Web page segmentation
KW - Bayesian network classifier
UR - http://www.scopus.com/inward/record.url?scp=84876856003&partnerID=8YFLogxK
U2 - 10.1109/ICTAI.2012.138
DO - 10.1109/ICTAI.2012.138
M3 - Conference contribution
AN - SCOPUS:84876856003
SN - 9780769549156
T3 - Proceedings - International Conference on Tools with Artificial Intelligence, ICTAI
SP - 976
EP - 981
BT - Proceedings - 2012 IEEE 24th International Conference on Tools with Artificial Intelligence, ICTAI 2012
T2 - 2012 IEEE 24th International Conference on Tools with Artificial Intelligence, ICTAI 2012
Y2 - 7 November 2012 through 9 November 2012
ER -