International Journal of Advances in Computer Science and Its Applications
Author(s) : RUPA G. MEHTA, VAISHALI R. PATEL
Unsupervised learning organizes data into meaningful groups based on similarity. Cluster analysis is the study of clustering techniques and algorithms that discover important patterns in the underlying data, without knowledge of category labels, for further analysis. The k-Means algorithm is one of the most popular partition-based clustering algorithms for partitioning a dataset into meaningful patterns. However, k-Means requires the number of clusters to be specified in advance and often converges to a local minimum, so the resulting clusters depend heavily on the initial centroids. Various methods have been proposed for automatic detection of initial centroids to improve the performance and efficiency of the k-Means algorithm. This paper presents an overview of clustering and of clustering techniques and algorithms, addresses the problems of the k-Means algorithm, compares different methods for automatic detection of initial centroids, and proposes a new hierarchical method (hk-Means) for automatically detecting initial centroids in k-Means. It also describes an implementation of traditional k-Means with automatic pre-processing of the dataset to remove noise.
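The dependence of k-Means on its initial centroids can be illustrated with a small sketch. The code below is a minimal, plain-Python version of the standard (Lloyd-style) k-Means iteration, not the hk-Means method proposed in the paper. The function names `sq_dist` and `kmeans`, the four-point dataset, and both sets of starting centroids are assumptions chosen only for illustration.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points: Sequence[Point], centroids: Sequence[Point]) -> List[int]:
    """Index of the nearest centroid for every point."""
    return [
        min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
        for p in points
    ]


def kmeans(points: Sequence[Point], init: Sequence[Point], max_iter: int = 100):
    """Standard k-Means starting from the given initial centroids.

    Returns (centroids, labels, sse), where sse is the sum of squared
    distances from each point to its assigned centroid.
    """
    centroids = [tuple(float(v) for v in c) for c in init]
    for _ in range(max_iter):
        labels = assign(points, centroids)
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:
                # Move the centroid to the mean of its members.
                mean = tuple(sum(c) / len(members) for c in zip(*members))
                new_centroids.append(mean)
            else:
                # Keep an empty cluster's centroid where it was.
                new_centroids.append(old)
        if new_centroids == centroids:
            break  # no centroid moved: converged
        centroids = new_centroids
    labels = assign(points, centroids)
    sse = sum(sq_dist(p, centroids[l]) for p, l in zip(points, labels))
    return centroids, labels, sse


# Illustrative data: the corners of a 4 x 1 rectangle, with k = 2.
points = [(0.0, 0.0), (0.0, 1.0), (4.0, 0.0), (4.0, 1.0)]

# Start A splits the rectangle into left and right halves.
_, _, sse_good = kmeans(points, [(0.0, 0.5), (4.0, 0.5)])

# Start B splits it into bottom and top halves, and never recovers.
_, _, sse_bad = kmeans(points, [(2.0, 0.0), (2.0, 1.0)])

print(sse_good, sse_bad)  # 1.0 16.0
```

Both runs converge, but the second stops at a local minimum whose sum of squared errors is sixteen times larger than that of the first. Methods that detect initial centroids automatically aim to avoid this kind of poor starting point.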