
International Journal of Advances in Computer Science and Its Applications

Semi-Supervised HSK-means Algorithm for Contextual Document Classification



Supervised learning needs a large amount of labeled data to generate a hypothesis function and classify test documents efficiently. In real-world situations, much of the available data is unlabeled and cannot be used in supervised learning. Hence, we propose a novel semi-supervised learning scheme called the Half Supervised K-means (HSK-means) algorithm. In this scheme, the inputs are category keyword lists generated by an ontology containing lexical relations, some labeled data, and the test documents to be classified. We use a modified tf-idf to compute the weights of keywords, labeled documents, and unlabeled documents. We supply these weights to HSK-means to assign documents to their respective categories. In HSK-means, the number of centroids depends on the number of categories chosen by the user, and the labeled documents help to assign a category to each cluster. All documents in the same cluster are then automatically assigned to that cluster's category.
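The clustering and labeling step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the modified tf-idf or how centroids are initialized, so the sketch assumes the weight vectors are already computed and that one seed centroid per category is supplied (for example, from the category keyword weights). The names `hsk_means_sketch`, `seed_centroids`, and the toy data are hypothetical.

```python
from collections import Counter


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def hsk_means_sketch(vectors, labels, seed_centroids, max_iter=100):
    """Cluster documents, then name each cluster by its labeled members.

    vectors:        one weight vector per document (labeled and unlabeled)
    labels:         dict mapping document index -> category (labeled docs only)
    seed_centroids: one starting centroid per category (an assumption here)
    """
    centroids = [list(c) for c in seed_centroids]
    assignment = None
    for _ in range(max_iter):
        # Assign every document to its nearest centroid.
        new_assignment = [
            min(range(len(centroids)), key=lambda k: sq_dist(v, centroids[k]))
            for v in vectors
        ]
        if new_assignment == assignment:
            break  # converged: no document changed cluster
        assignment = new_assignment
        # Recompute each centroid as the mean of its members.
        for k in range(len(centroids)):
            members = [v for v, a in zip(vectors, assignment) if a == k]
            if members:
                centroids[k] = mean(members)

    # Labeled documents vote on the category of the cluster they fall in.
    cluster_category = {}
    for k in range(len(centroids)):
        votes = Counter(
            labels[i] for i, a in enumerate(assignment) if a == k and i in labels
        )
        cluster_category[k] = votes.most_common(1)[0][0] if votes else None

    # Every document inherits the category of its cluster.
    return [cluster_category[a] for a in assignment]


# Toy data: six documents in a two-keyword space, two of them labeled.
vectors = [[1.0, 0.0], [0.9, 0.1], [0.8, 0.2],
           [0.0, 1.0], [0.1, 0.9], [0.2, 0.8]]
labels = {0: "sports", 3: "politics"}
seed_centroids = [[1.0, 0.0], [0.0, 1.0]]

predicted = hsk_means_sketch(vectors, labels, seed_centroids)
```

Majority voting is one plausible way to let labeled documents name a cluster; a cluster with no labeled members is left as `None` in this sketch rather than guessed.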

No. of Author(s) : 4
Page(s) : 144 - 147
Electronic ISSN : 2250 - 3765
Volume 2 : Issue 1