International Journal of Advances in Computer Science and Its Applications

A Text Mining Approach for Automatic Classification Of Web Pages

Author(s) : BHUMIKA GUPTA, SURABHI LINGWAL

Abstract

Today the web contains a huge amount of information provided as HTML and XML pages, and the number of pages is growing rapidly as the web expands. In web text mining, extracting text and filtering the extracted content form the foundation of the process. Automatic text classification is a semi-supervised machine learning task that assigns a given document to categories from a pre-defined set based on its features and text content. This paper explains a generic strategy for automatic classification of web pages that handles unstructured and semi-structured text. The datasets were classified into labeled classes using kNN and Naïve Bayesian classification techniques. The experimental evaluation found that kNN achieves better accuracy, precision, and recall than Naïve Bayesian classification. The paper presents a unified approach that provides robust classification and validation of web pages across categories.
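The two classifiers named in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's feature extraction, datasets, or parameter settings, so everything below is an assumption made for illustration: the toy pages in `TRAIN`, the word-count features, cosine similarity as the kNN distance, `k=3`, and Laplace smoothing in the Naïve Bayes model. It uses only the Python standard library and is not the authors' implementation.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical toy corpus of (page text, label) pairs; not the paper's data.
TRAIN = [
    ("football match score goal league", "sports"),
    ("tennis player wins tournament final", "sports"),
    ("team coach match season goal", "sports"),
    ("new smartphone processor chip release", "technology"),
    ("software update fixes security bug", "technology"),
    ("laptop chip benchmark software review", "technology"),
]

def tokenize(text):
    """Lowercase a string and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def cosine(a, b):
    """Cosine similarity between two term-count Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def knn_predict(train, text, k=3):
    """Majority vote among the k training pages most similar to `text`."""
    query = Counter(tokenize(text))
    ranked = sorted(
        train,
        key=lambda pair: cosine(Counter(tokenize(pair[0])), query),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

def nb_train(train):
    """Collect per-class document counts, per-class term counts, and the vocabulary."""
    class_docs = Counter()
    term_counts = defaultdict(Counter)
    vocab = set()
    for text, label in train:
        tokens = tokenize(text)
        class_docs[label] += 1
        term_counts[label].update(tokens)
        vocab.update(tokens)
    return class_docs, term_counts, vocab

def nb_predict(model, text):
    """Multinomial Naive Bayes with Laplace (add-one) smoothing, in log space."""
    class_docs, term_counts, vocab = model
    total_docs = sum(class_docs.values())
    best_label, best_score = None, -math.inf
    for label in class_docs:
        score = math.log(class_docs[label] / total_docs)  # log prior
        total_terms = sum(term_counts[label].values())
        for token in tokenize(text):
            if token in vocab:  # words never seen in training are skipped
                score += math.log(
                    (term_counts[label][token] + 1) / (total_terms + len(vocab))
                )
        if score > best_score:
            best_label, best_score = label, score
    return best_label

if __name__ == "__main__":
    model = nb_train(TRAIN)
    print(knn_predict(TRAIN, "goal scored in the league match"))  # sports
    print(nb_predict(model, "software chip review"))              # technology
```

On this toy corpus both classifiers agree, so the sketch says nothing about which method is more accurate; a comparison like the one reported in the abstract would need labeled web pages and held-out evaluation with accuracy, precision, and recall.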

No. of Author(s) : 2
Page(s) : 77 - 81
Electronic ISSN : 2250 - 3765
Volume 3 : Issue 3