Introduction
A topic-focused web spider/crawler with the following features:
1. Can restrict crawling to a particular topic.
2. Produces a plain-text log file in the format: timestamp, URL.
3. Opens at most two connections to any one URL while crawling (note: the number of local threads that analyze downloaded pages is not limited).
4. Follows the rules for a well-behaved spider: it must check the robots.txt file and meta tags for restrictions, and each thread sleeps for two seconds after it finishes fetching from a website.
5. Parses HTML pages, extracts the URLs of their links, and checks whether each extracted URL has already been processed, so that pages already analyzed are not crawled again.
6. Lets the user set basic spider/crawler parameters, including crawl depth and seed URLs.
7. Identifies itself to the server with a User-agent header.
8. Produces statistical information about the crawl…
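The de-duplication, depth limit, and log format described above can be illustrated with a minimal sketch. It is not taken from Spider.java in the archive: the class name `CrawlFrontierSketch`, the `crawl` method, and the `fetchLinks` callback (a stand-in for "download the page and extract its links") are all hypothetical, and the log line uses a millisecond timestamp followed by the URL as an assumed layout. The robots.txt check, the two-second pause, and the connection limit are left out to keep the example free of network access.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.Set;
import java.util.function.Function;

// Hypothetical sketch; not the author's implementation.
class CrawlFrontierSketch {

    // A URL paired with its distance (in links) from the seed URL.
    static final class Entry {
        final String url;
        final int depth;
        Entry(String url, int depth) { this.url = url; this.depth = depth; }
    }

    /**
     * Breadth-first crawl starting at seedUrl, expanding pages down to maxDepth.
     * Returns one log line per visited page, in the assumed form "<timestamp> <URL>".
     */
    static List<String> crawl(String seedUrl, int maxDepth,
                              Function<String, List<String>> fetchLinks) {
        Set<String> seen = new HashSet<>();   // URLs already queued or processed
        Queue<Entry> frontier = new ArrayDeque<>();
        List<String> log = new ArrayList<>();

        seen.add(seedUrl);
        frontier.add(new Entry(seedUrl, 0));

        while (!frontier.isEmpty()) {
            Entry current = frontier.remove();
            log.add(System.currentTimeMillis() + " " + current.url);

            if (current.depth >= maxDepth) {
                continue;                     // depth limit reached: do not expand
            }
            for (String link : fetchLinks.apply(current.url)) {
                // Set.add returns false for a URL seen before, so repeats are skipped.
                if (seen.add(link)) {
                    frontier.add(new Entry(link, current.depth + 1));
                }
            }
        }
        return log;
    }

    public static void main(String[] args) {
        // Tiny in-memory "web" with a cycle (a -> b -> a) to exercise de-duplication.
        Map<String, List<String>> web = Map.of(
            "http://example.com/a", List.of("http://example.com/b", "http://example.com/c"),
            "http://example.com/b", List.of("http://example.com/a", "http://example.com/d"));
        List<String> log = crawl("http://example.com/a", 1,
                                 url -> web.getOrDefault(url, List.of()));
        log.forEach(System.out::println);
    }
}
```

Marking a URL as seen when it is queued, rather than when it is fetched, keeps the same page from entering the queue twice when several pages link to it.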
Package: 39709548subjectspider_bykelvenju.rar file list
tradlexu8.txt
data\sforeign_u8.txt
data\snotname_u8.txt
data\snumbers_u8.txt
data\ssurname_u8.txt
data\tforeign_u8.txt
data\tnotname_u8.txt
data\tnumbers_u8.txt
data\tsurname_u8.txt
CheckLinks.java
HTMLParse.java
ISpiderReportable.java
segmenter.java
Spider.java
bothlexu8.txt
simplexu8.txt
data
tf.java
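日志文件[www.scut.edu.cn].rar (log file for www.scut.edu.cn)
日志文件[money.163.com].rar (log file for money.163.com)
祝庆荣-主题网络蜘蛛程序设计及JAVA实现.doc (Zhu Qingrong: Design and Java Implementation of a Topic-Focused Web Spider)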