International Journal of Advances in Computer Science and Its Applications
Author(s): M. J. Yeola
Document clustering is the unsupervised grouping of text documents into meaningful groups, which usually correspond to topics in the document collection. It is one way to organize information without prior knowledge of how the documents should be classified. The well-known K-means clustering algorithm requires users to specify the number of clusters in advance. However, if the prespecified number of clusters is changed, the precision of the result also changes. To address this problem, this paper proposes a new clustering algorithm based on the Kea keyphrase extraction algorithm. As in K-means, documents are grouped into several clusters, but here the number of clusters is determined automatically from the similarities between documents and their extracted keyphrases. The algorithm also computes the F-measure from precision and recall to assess the resulting clusters, with a higher F-measure indicating better clusters.
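The abstract evaluates clusters with an F-measure built from precision and recall but does not state which variant is used. The sketch below assumes the clustering F-measure commonly used in document-clustering evaluation: for each reference class i and cluster j, precision is n_ij / n_j and recall is n_ij / n_i; each class is matched to its best-scoring cluster, and the per-class scores are weighted by class size. The function names and the example labels are hypothetical and only illustrate the calculation; this is not the author's implementation.

```python
from collections import Counter


def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (F1); defined as 0 when both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def clustering_f_measure(labels_true, labels_pred) -> float:
    """Overall F-measure of a clustering against reference classes (assumed variant).

    For each class i and cluster j:
      precision(i, j) = n_ij / n_j  (share of cluster j that belongs to class i)
      recall(i, j)    = n_ij / n_i  (share of class i that was placed in cluster j)
    Each class keeps its best F over all clusters, weighted by n_i / n.
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("label lists must have the same length")
    n = len(labels_true)
    class_sizes = Counter(labels_true)
    cluster_sizes = Counter(labels_pred)
    joint = Counter(zip(labels_true, labels_pred))  # n_ij counts

    total = 0.0
    for cls, n_i in class_sizes.items():
        best = 0.0
        for clu, n_j in cluster_sizes.items():
            n_ij = joint.get((cls, clu), 0)
            if n_ij == 0:
                continue  # no overlap, so F(i, j) = 0
            best = max(best, f_measure(n_ij / n_j, n_ij / n_i))
        total += (n_i / n) * best
    return total


# Hypothetical example: six documents with known topics and predicted cluster ids.
topics = ["sports", "sports", "sports", "politics", "politics", "tech"]
clusters = [0, 0, 1, 1, 1, 2]
print(clustering_f_measure(topics, clusters))  # about 0.8333 (= 5/6)
```

In this example the "sports" class is split across clusters 0 and 1, so its best match (cluster 0) scores F = 0.8 rather than 1.0, which pulls the overall score below a perfect 1.0.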