Electronic Theses and Dissertation Database

Title page for ETD etd-07272007-215621


Type of Document Master's Thesis
Author Desai, Lovekeshkumar
URN etd-07272007-215621
Title A DISTRIBUTED APPROACH TO CRAWL DOMAIN SPECIFIC HIDDEN WEB
Degree Master of Science
Department Computer Science
Advisory Committee
  Advisor Name              Title
  Dr. Charles L. Jaret      Committee Chair
  Dr. Donald C. Reitzes     Committee Member
  Dr. Robert Adelman        Committee Member
Keywords
  • Deep Web
  • Breadth-first crawler
  • Search spider
  • Distributed Web crawler
  • Task-specific and domain-specific
  • Hidden Web
  • Content Extraction
Date of Defense 2007-07-13
Availability unrestricted
Abstract
A large amount of online information resides on the invisible Web: pages generated dynamically from databases and other data sources, hidden from current crawlers, which retrieve content only from the publicly indexable Web. Specifically, these crawlers ignore the tremendous amount of high-quality content "hidden" behind search forms, as well as pages in large searchable electronic databases that require authorization or prior registration. To extract data from the hidden Web, it is necessary to find the search forms and fill them with appropriate information so as to retrieve the maximum amount of relevant information. Meeting the complex challenges that arise when searching the hidden Web, namely extensive analysis of both the search forms and the retrieved information, makes it essential to design and implement a distributed Web crawler that runs on a network of workstations. We describe the software architecture of this distributed, scalable system and present a number of novel techniques used in its design and implementation to extract the maximum amount of relevant data from the hidden Web while achieving high performance.
Files
  Filename                            Size
  desai_lovekeshkumar_200707_ms.pdf   407.14 Kb

  Approximate Download Time (Hours:Minutes:Seconds)
  28.8 Modem   56K Modem   ISDN (64 Kb)   ISDN (128 Kb)   Higher-speed Access
  00:01:53     00:00:58    00:00:50       00:00:25        00:00:02
