Clustering is an unsupervised learning task. It is useful in exploratory pattern analysis, document grouping, image segmentation, and decision-making in data mining. Combinatorially, data clustering is a difficult problem. Moreover, traditional methods such as k-means and its variants use an iterative procedure and converge only to a local minimum. The k-means algorithm is also very sensitive to noise and outliers, which are frequently encountered in data mining. Although the k-medoid algorithm handles outliers and other extreme values better than k-means, it can still become trapped in one of numerous local minima. To overcome these shortcomings, this article presents a genetic k-medoid data clustering algorithm, which offers several improvements over the k-medoid algorithm. The experimental results confirm the superiority of this algorithm over k-medoid algorithms.
Traditional manual methods of data
analysis, such as spreadsheets and ad hoc
queries, are overwhelmed when used to
analyze the tremendous volume and
diversity of real-world data embedded in
huge databases. There is an urgent need
for a new generation of techniques and
tools that can intelligently and
automatically analyze huge volumes of
stored data for nuggets of useful
knowledge.
Clustering is a common data-mining
task and refers to the discovery of
interesting data distributions in the
underlying data space. Given a large data
set consisting of multidimensional data
points or patterns, the data space is
usually not uniformly occupied. The aim of
clustering procedures is to partition a
heterogeneous multidimensional data set
effectively into groups having more
homogeneous characteristics
[19]. The formation of clusters is based on
the principle of maximizing similarity
between patterns belonging to the same
cluster. Similarity or proximity is usually
defined through a distance function on
pairs of patterns, based on the values of
the features of these patterns [3].
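To make the notion of distance-based similarity concrete, the following is a minimal sketch, assuming Euclidean distance and a small hypothetical two-dimensional data set (neither is specified in the text above). It contrasts the mean, the cluster representative used by k-means, with the medoid, the representative used by k-medoid methods, when one outlier is present. The function names are illustrative only and do not come from the article.

```python
import math


def euclidean(p, q):
    """Distance between two patterns given as equal-length feature tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def mean_point(cluster):
    """Component-wise mean of a cluster (the representative used by k-means)."""
    n = len(cluster)
    dims = len(cluster[0])
    return tuple(sum(p[i] for p in cluster) / n for i in range(dims))


def medoid(cluster):
    """Pattern of the cluster with the smallest total distance to all members."""
    return min(cluster, key=lambda c: sum(euclidean(c, p) for p in cluster))


# Hypothetical data: four nearby patterns plus one outlier at (50, 50).
cluster = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (2.0, 2.0), (50.0, 50.0)]

print("mean  :", mean_point(cluster))  # pulled toward the outlier: (11.2, 11.2)
print("medoid:", medoid(cluster))      # stays on an actual pattern: (2.0, 2.0)
```

In this toy case the mean is dragged far from the dense group, whereas the medoid remains one of the original patterns. This is the intuition behind the greater robustness of k-medoid to outliers; it says nothing about the local-minimum problem that both methods share.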