Genetic K-medoid Clustering

The IUP Journal of Information Technology

Article Details

Pub. Date	:	March, 2006
Product Name	:	The IUP Journal of INFORMATION TECHNOLOGY
Product Type	:	Article
Product Code	:	IJIT20603
Author Name	:	Satchidananda Dehuri, Ashish Ghosh and Rajib Mall
Availability	:	YES
Subject/Domain	:	Science and Technology
Download Format	:	PDF Format
No. of Pages	:	12

Price

For delivery in electronic format: Rs. 50; For delivery through courier (within India): Rs. 50 + Rs. 25 for Shipping & Handling Charges

Description

Clustering is an unsupervised task that is useful in exploratory pattern analysis, document grouping, image segmentation and decision-making in data mining. Combinatorially, data clustering is a difficult problem. Traditional methods such as k-means and its variants use an iterative procedure and yield only a local minimum. The k-means algorithm is also very sensitive to noise and outlier data points, which are frequently encountered in data mining. Although the k-medoid algorithm has been found to handle outliers and other extreme values better than k-means, it may still become trapped in one of numerous local minima. To overcome this shortcoming, this article presents a genetic k-medoid data clustering algorithm, which brings several improvements over the k-medoid algorithm. The experimental results confirm the superiority of this algorithm over k-medoid algorithms.
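The abstract does not state which encoding or genetic operators the article uses, so the following is only a generic sketch of how a genetic search over medoid sets could look, not the authors' algorithm. Everything in it is an assumption made for illustration: the function names `total_cost` and `genetic_k_medoid`, the chromosome encoding (a sorted tuple of k point indices), tournament selection, crossover by sampling from the union of the parents' medoids, single-gene mutation, elitism, and all parameter defaults.

```python
import math
import random


def total_cost(points, medoids):
    """Sum of distances from every point to its nearest medoid (lower is better)."""
    return sum(min(math.dist(p, points[m]) for m in medoids) for p in points)


def genetic_k_medoid(points, k, pop_size=20, generations=40,
                     mutation_rate=0.2, seed=0):
    """Illustrative genetic search for k medoids (assumed operators, see lead-in)."""
    rng = random.Random(seed)
    n = len(points)

    def fitness(chrom):
        return total_cost(points, chrom)

    def tournament(pop):
        # Pick two random individuals and keep the cheaper one.
        a, b = rng.sample(pop, 2)
        return a if fitness(a) <= fitness(b) else b

    def crossover(a, b):
        # The child draws its k medoids from the union of both parents' medoids.
        pool = sorted(set(a) | set(b))
        return tuple(sorted(rng.sample(pool, k)))

    def mutate(chrom):
        # With probability mutation_rate, swap one medoid for a non-medoid point.
        if rng.random() >= mutation_rate:
            return chrom
        candidates = [i for i in range(n) if i not in chrom]
        if not candidates:
            return chrom
        genes = list(chrom)
        genes[rng.randrange(k)] = rng.choice(candidates)
        return tuple(sorted(genes))

    population = [tuple(sorted(rng.sample(range(n), k))) for _ in range(pop_size)]
    history = []  # best cost seen at the start of each generation
    for _ in range(generations):
        best = min(population, key=fitness)
        history.append(fitness(best))
        children = [best]  # elitism: the best individual always survives
        while len(children) < pop_size:
            child = crossover(tournament(population), tournament(population))
            children.append(mutate(child))
        population = children

    best = min(population, key=fitness)
    history.append(fitness(best))
    return best, fitness(best), history


if __name__ == "__main__":
    # Two loose groups plus one extreme value (made-up data).
    data = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0),
            (8.0, 8.0), (8.5, 9.0), (9.0, 8.0),
            (50.0, 50.0)]
    medoids, cost, history = genetic_k_medoid(data, k=2)
    print(medoids, cost)
```

Because the best individual is copied unchanged into each new generation, the best cost in this sketch can never get worse from one generation to the next. The population-based search is what gives such a method a chance to escape a local minimum that a single iterative k-medoid run would settle into.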

Traditional manual methods of data analysis, such as spreadsheets and ad hoc queries, are overwhelmed when used to analyze the tremendous volume and diversity of real-world data embedded in huge databases. There is an urgent need for a new generation of techniques and tools that can intelligently and automatically analyze huge volumes of stored data for nuggets of useful knowledge.

Clustering is a common data-mining task and refers to the discovery of interesting data distributions in the underlying data space. Given a large data set consisting of multidimensional data points or patterns, the data space is usually not uniformly occupied. The aim of clustering procedures is to effectively partition a heterogeneous multidimensional data set into groups having more homogeneous characteristics [19]. The formation of clusters is based on the principle of maximizing similarity between patterns belonging to the same cluster. Similarity or proximity is usually defined as a distance function on pairs of patterns, based on the values of the features of these patterns [3].
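As a small illustration of a distance function on pairs of patterns, and of why a medoid tends to be less affected by extreme values than a mean, consider the sketch below. It assumes Euclidean distance and made-up two-dimensional data; the helper names `centroid` and `medoid` are hypothetical and do not come from the article.

```python
import math


def centroid(group):
    """Coordinate-wise mean of the patterns, the representative used by k-means."""
    return tuple(sum(coords) / len(group) for coords in zip(*group))


def medoid(group):
    """The actual pattern whose summed distance to all other patterns is smallest."""
    return min(group, key=lambda c: sum(math.dist(c, p) for p in group))


if __name__ == "__main__":
    # Four nearby patterns and one outlier (made-up data).
    group = [(1.0, 1.0), (2.0, 2.0), (3.0, 3.0), (4.0, 4.0), (100.0, 100.0)]
    print(centroid(group))  # (22.0, 22.0): pulled far toward the outlier
    print(medoid(group))    # (3.0, 3.0): stays inside the dense region
```

The mean is dragged well away from the four nearby patterns by a single extreme value, whereas the medoid is always one of the observed patterns and stays among them.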

Keywords

Genetic K-medoid Clustering, exploratory pattern analyses, document grouping, image segmentation, decision-making, data mining, genetic k-medoid data clustering algorithm, spreadsheets, data distributions, homogenous characteristics.