Electronic Theses and Dissertation Database

Title page for ETD etd-12012006-111157


Type of Document Master's Thesis
Author Pandya, Milan
Author's Email Address milan.pandya@gmail.com
URN etd-12012006-111157
Title A domain based approach to crawl the hidden web
Degree Master of Science
Department Computer Science
Advisory Committee
  Advisor Name          Title
  Dr. Raj Sunderraman   Committee Chair
  Dr. Saeid Belkasim    Committee Member
  Dr. Ying Zhu          Committee Member
Keywords
  • web crawler
  • search spider
  • web bot
  • best first crawler
  • focused web crawler
  • web page
  • domain based
Date of Defense 2006-11-17
Availability unrestricted
Abstract
A great deal of research is devoted to indexing the Web, and increasingly sophisticated Web crawlers are being designed to search and index it faster. However, these traditional crawlers cover only the part of the Web we call the “Surface Web”: the set of Web pages reachable by following hyperlinks. They are unable to crawl the hidden portion of the Web, and hence they ignore the tremendous amount of information hidden behind search forms in Web pages. Most of the published research has focused on detecting such searchable forms and performing a systematic search over them. Our approach is based on a Web crawler that analyzes search forms and fills them with appropriate content to retrieve the maximum relevant information from the database.
Files
  Filename: pandya_milan_c_200612_ms.pdf
  Size: 309.72 Kb
  Approximate Download Time (Hours:Minutes:Seconds):
    28.8 Modem            00:01:26
    56K Modem             00:00:44
    ISDN (64 Kb)          00:00:38
    ISDN (128 Kb)         00:00:19
    Higher-speed Access   00:00:01
