Description: A spider, also called a WebCrawler or robot, is a program that collects documents by roaming along the links of the Web. It generally resides on a server. Given some starting URLs, it reads the documents using HTTP and other standard protocols, then takes every URL found in each document that has not yet been visited as a new starting point, and continues roaming until no new URL satisfies the crawl conditions. A WebCrawler's main function is to automatically fetch Web documents from sites on the Internet and to extract from them information that describes each document. This gives the search engine's database server the raw data it needs to add and update entries, including title, length, file creation time, HTML file, the number of links of various kinds, etc.
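The roaming procedure described above can be sketched as a breadth-first loop over a frontier of unvisited URLs. This is a minimal illustration, not the code in the archive listed below: the class `CrawlSketch`, its methods, the `maxPages` stop condition, and the in-memory `web` map that stands in for HTTP fetching are all assumptions made for the example, and the regular expressions are a naive substitute for a real HTML parser.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of a crawl loop; not taken from the crawler4j sources. */
class CrawlSketch {

    // Naive patterns, good enough for a sketch only.
    private static final Pattern HREF = Pattern.compile("href=\"([^\"]+)\"");
    private static final Pattern TITLE =
            Pattern.compile("<title>(.*?)</title>", Pattern.DOTALL);

    /** Returns every href target found in the HTML, in document order. */
    static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            links.add(m.group(1));
        }
        return links;
    }

    /** Returns the trimmed contents of the first title element, or "" if none. */
    static String extractTitle(String html) {
        Matcher m = TITLE.matcher(html);
        return m.find() ? m.group(1).trim() : "";
    }

    /**
     * Breadth-first crawl starting from the seed URLs.
     * The map plays the role of the Web: URL -> HTML body (assumption).
     * Stops when the frontier is empty or maxPages pages were visited.
     */
    static List<String> crawl(List<String> seeds, Map<String, String> web, int maxPages) {
        Deque<String> frontier = new ArrayDeque<>();
        Set<String> seen = new HashSet<>();
        for (String seed : seeds) {
            if (seen.add(seed)) {
                frontier.addLast(seed);
            }
        }
        List<String> visited = new ArrayList<>();
        while (!frontier.isEmpty() && visited.size() < maxPages) {
            String url = frontier.removeFirst();
            String html = web.get(url);
            if (html == null) {
                continue; // treated as a failed fetch
            }
            visited.add(url);
            for (String link : extractLinks(html)) {
                // Only URLs not seen before become new starting points.
                if (seen.add(link)) {
                    frontier.addLast(link);
                }
            }
        }
        return visited;
    }

    public static void main(String[] args) {
        Map<String, String> web = Map.of(
            "http://a.example/",
            "<title>A</title><a href=\"http://b.example/\">b</a>"
                + "<a href=\"http://c.example/\">c</a>",
            "http://b.example/",
            "<title>B</title><a href=\"http://a.example/\">a</a>",
            "http://c.example/",
            "<title>C</title>");
        for (String url : crawl(List.of("http://a.example/"), web, 10)) {
            String html = web.get(url);
            System.out.println(url + " title=" + extractTitle(html)
                + " length=" + html.length()
                + " links=" + extractLinks(html).size());
        }
    }
}
```

The `seen` set is what keeps the crawler from revisiting a page when two documents link to each other, and the per-page title, length, and link count printed in `main` correspond to the kind of raw data the description says is handed to the search engine.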
File list (Check if you may need any files):
crawler\src\edu\uci\ics\crawler4j\crawler\Configurations.java
.......\...\...\...\...\.........\.......\CrawlController.java
.......\...\...\...\...\.........\.......\HTMLParser.java
.......\...\...\...\...\.........\.......\IdleConnectionMonitorThread.java
.......\...\...\...\...\.........\.......\LinkExtractor.java
.......\...\...\...\...\.........\.......\Page.java
.......\...\...\...\...\.........\.......\PageFetcher.java
.......\...\...\...\...\.........\.......\PageFetchStatus.java
.......\...\...\...\...\.........\.......\WebCrawler.java
.......\...\...\...\...\example\advanced\Controller.java
.......\...\...\...\...\.......\........\CrawlStat.java
.......\...\...\...\...\.......\........\Downloader.java
.......\...\...\...\...\.......\........\MyCrawler.java
.......\...\...\...\...\frontier\DocIDServer.java
.......\...\...\...\...\........\Frontier.java
.......\...\...\...\...\........\WebURLTupleBinding.java
.......\...\...\...\...\........\WorkQueues.java
.......\...\...\...\...\url\URLCanonicalizer.java
.......\...\...\...\...\...\WebURL.java
.......\...\...\...\...\.til\IO.java
.......\...\...\...\...\....\Util.java
.......\...\...\...\...\crawler4j\crawler
.......\...\...\...\...\example\advanced
.......\...\...\...\...\crawler4j
.......\...\...\...\...\example
.......\...\...\...\...\frontier
.......\...\...\...\...\url
.......\...\...\...\...\util
.......\...\...\...\ics
.......\...\...\uci
.......\...\edu
.......\src
crawler