
Type of Document: Master's Thesis
Author: Pandya, Milan
Author's Email Address: milan.pandya@gmail.com
URN: etd-12012006-111157
Title: A domain based approach to crawl the hidden web
Degree: Master of Science
Department: Computer Science

Advisory Committee (Advisor Name, Title)
- Dr. Raj Sunderraman, Committee Chair
- Dr. Saeid Belkasim, Committee Member
- Dr. Ying Zhu, Committee Member

Keywords
- web crawler
- search spider
- web bot
- best first crawler
- focused web crawler
- web page
- domain based

Date of Defense: 2006-11-17
Availability: unrestricted

Abstract
A great deal of research is being done on indexing the Web, and increasingly sophisticated Web crawlers are being designed to search and index it faster. These traditional crawlers, however, crawl only the part of the Web we call the “Surface Web”; they are unable to crawl the hidden portion. They retrieve content only from surface Web pages, that is, pages linked to one another by hyperlinks, and so they miss the tremendous amount of information hidden behind search forms in Web pages. Most published research has focused on detecting such searchable forms and searching over them systematically. Our approach is based on a Web crawler that analyzes search forms and fills them with appropriate content to retrieve the maximum amount of relevant information from the underlying database.

Files
Filename: pandya_milan_c_200612_ms.pdf
Size: 309.72 Kb
Approximate Download Time (Hours:Minutes:Seconds)
- 28.8 Modem: 00:01:26
- 56K Modem: 00:00:44
- ISDN (64 Kb): 00:00:38
- ISDN (128 Kb): 00:00:19
- Higher-speed Access: 00:00:01