
Type of Document: Master's Thesis
Author: Metikurke, Seema Sreenivasamurthy
Author's Email Address: smetikurke1@student.gsu.edu
URN: etd-04262006-100628
Title: Grid-Enabled Automatic Web Page Classification
Degree: Master of Science
Department: Computer Science
Advisory Committee:
- Vijay K. Vaishnavi, Committee Chair
- Rajshekhar Sunderraman, Committee Co-Chair
- Yanqing Zhang, Committee Member
Keywords:
- Automatic Web Page Classification
- Vector Space Model
- Genetic Algorithm
- Grid Computing
Date of Defense: 2006-04-21
Availability: restricted
Abstract: Much research has been conducted on the retrieval and classification of web-based information. A major challenge is performance, especially for a classification algorithm that must return results for the large data sets typical of the Web. This thesis describes a grid-enabled approach to automatic web page classification. The basic approach, which uses a vector space model (VSM), is described first. An enhancement of the approach through the use of a genetic algorithm (GA) is then described. The enhanced approach can efficiently process candidate web pages from a number of web sites and classify them. A prototype is implemented and empirical studies are conducted. The contributions of this thesis are: 1) application of grid computing to improve the performance of both VSM-based and GA-with-VSM-based web page classification; 2) improvement of the VSM classification algorithm by applying a GA that discovers a set of training web pages while also generating a near-optimal set of parameter values for VSM.
Files
Filename: metikurke_seema_200605_ms.pdf
Size: 591.35 Kb
Approximate Download Time (Hours:Minutes:Seconds):
- 28.8 Modem: 00:02:44
- 56K Modem: 00:01:24
- ISDN (64 Kb): 00:01:13
- ISDN (128 Kb): 00:00:36
- Higher-speed Access: 00:00:03
Restricted availability indicates that a file or directory is accessible from the Georgia State University campus network only.