Description: Web crawlers download web pages from the World Wide Web for search engines. They are generally divided into traditional (general-purpose) crawlers and focused crawlers.
A traditional crawler starts from one or more seed URLs and collects the URLs found on those initial pages. As it crawls, it keeps extracting new URLs from the current page and adding them to a queue, until the system's stop condition is met. Put simply, it reads a page's source code to get the content you want.
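The loop described above (seed URLs, a queue of newly found URLs, a stop condition) can be sketched roughly as follows. This is a minimal illustration only, not the code in this package: the class name CrawlSketch, the crawl method, the extractLinks parameter, and the maxPages limit are all hypothetical, and a small in-memory map stands in for real page fetching and HTML parsing.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.Set;
import java.util.function.Function;

// Hypothetical sketch; not the API of the Arachnid classes in this archive.
class CrawlSketch {

    /**
     * Breadth-first crawl: start from the seed URLs, take one URL at a time
     * off the queue, and enqueue every link that has not been seen before.
     * Stops when the queue is empty or maxPages pages have been visited.
     */
    static List<String> crawl(List<String> seeds,
                              Function<String, List<String>> extractLinks,
                              int maxPages) {
        Set<String> seen = new LinkedHashSet<>(seeds);   // URLs already queued
        Queue<String> frontier = new ArrayDeque<>(seen); // URLs waiting to be visited
        List<String> visited = new ArrayList<>();        // visit order

        while (!frontier.isEmpty() && visited.size() < maxPages) {
            String url = frontier.remove();
            visited.add(url);
            for (String link : extractLinks.apply(url)) {
                if (seen.add(link)) {                    // true only for new URLs
                    frontier.add(link);
                }
            }
        }
        return visited;
    }

    public static void main(String[] args) {
        // A tiny in-memory "web" (assumed data) replaces HTTP and HTML parsing.
        Map<String, List<String>> web = Map.of(
            "http://example.com/", List.of("http://example.com/a", "http://example.com/b"),
            "http://example.com/a", List.of("http://example.com/b", "http://example.com/c"),
            "http://example.com/b", List.of("http://example.com/"),
            "http://example.com/c", List.of());

        List<String> order = crawl(List.of("http://example.com/"),
                                   url -> web.getOrDefault(url, List.of()), 10);
        System.out.println(order);
    }
}
```

The seen set is what keeps the crawler from looping forever when pages link back to each other, and maxPages plays the role of the stop condition. A real crawler would replace extractLinks with code that downloads the page and parses its links.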
File list (check whether you need any of these files):
bplatt
bplatt\spider
bplatt\spider\PageInfo.java
bplatt\spider\Arachnid.java
bplatt\spider\SimpleHTMLParser.java
bplatt\spider\SimpleHTMLToken.java
bplatt\spider\WebPageXtractor.java
ServerStressTest.java
GetGraphics.java
SimpleSiteMapGen.java
build.xml
GPL.txt
readme.txt
Arachnid.html