MBA Review Magazine | Web Mining and Text Mining

Article Details

Pub. Date	:	February, 2008
Product Name	:	MBA REVIEW
Product Type	:	FINANCIAL MARKETS
Product Code	:	MBAFM10802
Author Name	:	Deepali Dave and Veena Bhat
Availability	:	YES
Subject/Domain	:	Management
Download Format	:	PDF Format
No. of Pages	:	5

Price

For delivery in electronic format: Rs. 50; For delivery through courier (within India): Rs. 50 + Rs. 25 for Shipping & Handling Charges

Abstract

This conceptual article aims to give a comprehensive overview and a better understanding of web mining and text mining. It covers what web mining is, why it is needed, the techniques it uses, the relationship between web mining and text mining, real-world text mining examples, and future applications of text mining.

Description

Web mining can be broadly defined as the application of data mining to automatically discover and analyze useful information from the many resources available as documents or databases on the World Wide Web (WWW). In other words, it extracts and mines useful information from the web. The data sources involved can be heterogeneous, dispersed, or even distributed.

Web mining thus starts with resource discovery, followed by information extraction from the appropriate resources identified, generalization (finding general patterns within websites or across web pages), and finally analysis of the extracted information.

Web Content Mining: This includes the automatic search of the WWW for content, that is, data. The data can be in any form: unstructured (simple text data), structured (HTML pages generated by databases or using XML), or even semi-structured (HTML files). The web data can be in any format (text, image, audio, video, etc.), though in its initial stages web content mining is limited to text documents. Web content mining can be categorized into the agent-based approach and the database approach. The agent-based approach uses agents, intelligent or personalized, for information identification, collection, and retrieval. The database approach concentrates on transforming the semi-structured data available on the web into structured collections of resources that can be queried with standard database querying mechanisms (1). Data mining techniques are then used to analyze the data.
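
As a rough illustration of the database approach described above, the following Python sketch turns a semi-structured HTML page into a structured record (title, outgoing links, visible text) that could then be stored and queried. It is only a minimal sketch under stated assumptions: the names `PageExtractor` and `extract_record` and the sample page are hypothetical and do not come from the article, and a real system would need far more robust parsing.

```python
from html.parser import HTMLParser


class PageExtractor(HTMLParser):
    """Collects the title, outgoing links, and visible text of one HTML page.

    Hypothetical helper for illustration only.
    """

    def __init__(self):
        super().__init__()
        self.title = ""
        self.links = []
        self.text_parts = []
        self._in_title = False
        self._skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        stripped = data.strip()
        if not stripped:
            return
        if self._in_title:
            self.title += stripped
        elif self._skip_depth == 0:
            self.text_parts.append(stripped)


def extract_record(url, html):
    """Return one structured record (a dict) for a semi-structured page."""
    parser = PageExtractor()
    parser.feed(html)
    parser.close()
    return {
        "url": url,
        "title": parser.title,
        "links": parser.links,
        "text": " ".join(parser.text_parts),
    }


# Sample input: a made-up page, not taken from the article.
sample_html = (
    "<html><head><title>Shop</title><style>p{}</style></head>"
    "<body><p>Cheap books</p><a href='/cart'>Cart</a></body></html>"
)
record = extract_record("http://example.test/", sample_html)
print(record)
```

Records of this kind, collected over many pages, form the structured collection on which ordinary database queries and data mining techniques can operate.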

Web Structure Mining: This branch deals with mining the structure of the web. The first type of web structure mining mines the hyperlinks and classifies websites and web pages according to the information derived from them. The second type mines the structure of the web page itself to analyze inter-page relations, again using the hyperlinks that connect the pages. Standard techniques of web structure mining include Google's PageRank, CLEVER, and HITS.
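
To make hyperlink-based ranking more concrete, here is a simplified Python sketch of the PageRank idea, computed by power iteration over a small link graph. It is an illustrative sketch, not the production algorithm: the function name `pagerank`, the damping factor of 0.85, the fixed iteration count, the handling of pages without outgoing links, and the three-page example graph are all assumptions made for this example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by power iteration.

    links maps each page to the list of pages it links to.
    Returns a dict mapping each page to its rank; ranks sum to 1.
    """
    # Include pages that only appear as link targets.
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page starts with the "random jump" share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank on equally to the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Assumption: a page with no outgoing links spreads its rank evenly.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical three-page site: both inner pages link back to "home".
graph = {
    "home": ["products", "about"],
    "products": ["home"],
    "about": ["home"],
}
ranks = pagerank(graph)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(page, round(score, 4))
```

In this example "home" ends up with the highest rank because both other pages link to it, which is the intuition behind classifying pages by the hyperlinks that point to them.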

Keywords

MBA Review Magazine, Web Mining, World Wide Web, WWW, Business Intelligence, E-Commerce, Web Mining Techniques, HTML, Text, PDF, Web Structure Mining, Text Mining, Data Warehouse, Information Retrieval System, Artificial Intelligence.