Description: A web crawler implemented in Java. The program was adapted to collect data from one specific site, but it still fully lays out the general ideas and methods behind a crawler.
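The description refers to the general ideas behind a crawler. As an illustration only, here is a minimal Java sketch of the usual breadth-first loop: a frontier queue of URLs to visit, a visited set so no page is processed twice, and a filter that keeps the crawl on one site. The names `CrawlSketch` and `crawl`, the page limit, and the in-memory link map standing in for real HTTP fetches are all hypothetical; this is not the code shipped in the archive.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;
import java.util.function.Predicate;

public class CrawlSketch {
    /**
     * Breadth-first crawl: take a URL from the frontier, skip it if already visited,
     * obtain its outgoing links, and enqueue those that pass the filter.
     * Returns the URLs in the order they were visited.
     */
    public static List<String> crawl(String seed,
                                     Function<String, List<String>> fetchLinks,
                                     Predicate<String> filter,
                                     int maxPages) {
        Deque<String> frontier = new ArrayDeque<>(); // URLs waiting to be visited (FIFO)
        Set<String> visited = new HashSet<>();       // URLs already processed
        List<String> order = new ArrayList<>();      // visit order, for inspection
        frontier.add(seed);
        while (!frontier.isEmpty() && order.size() < maxPages) {
            String url = frontier.poll();
            if (!visited.add(url)) {
                continue; // already processed via another path
            }
            order.add(url);
            for (String link : fetchLinks.apply(url)) {
                if (filter.test(link) && !visited.contains(link)) {
                    frontier.add(link);
                }
            }
        }
        return order;
    }

    public static void main(String[] args) {
        // Hypothetical in-memory link graph used instead of downloading pages.
        Map<String, List<String>> web = Map.of(
            "http://news.example.com/",
                List.of("http://news.example.com/a", "http://other.example.org/x"),
            "http://news.example.com/a",
                List.of("http://news.example.com/b", "http://news.example.com/"),
            "http://news.example.com/b", List.of());
        List<String> visited = crawl("http://news.example.com/",
            url -> web.getOrDefault(url, List.of()),
            url -> url.startsWith("http://news.example.com/"), // stay on one site
            100);
        System.out.println(visited);
        // prints [http://news.example.com/, http://news.example.com/a, http://news.example.com/b]
    }
}
```

Passing the fetch step in as a function keeps the loop independent of any HTTP library; in a real crawler that function would download the page and extract its links, and the site filter is where a crawler is tied to one specific target.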
- [intersectingsearchengine.Rar] - I was doing the search engine project Ka
- [chem] - Tsinghua Tongfang inside data on the che
- [topicCrawler] - A topic-related web crawler, with a
- [heritrix-1.10.1] - An open source web page of an open sourc
- [focusedspider] - A focused spider based on Java and mysq
- [zhizhu] - Java version of the spider web crawler s
- [snoics-reptile2.0] - This is a complete and full-featured web
- [sxt_Lucene] - The school is still a very good search e
- [CMS] - The Sports Exchange web site to achieve
- [zhizhu] - A simple web crawler developed in Java
File list (check whether you need any of these files):
ZhiZhuSpider\.classpath
............\.mymetadata
............\.project
............\.settings\com.genuitec.eclipse.core.prefs
............\.........\org.eclipse.core.resources.prefs
............\dist\Sohu.war
............\nbproject\ant-deploy.xml
............\.........\build-impl.xml
............\.........\genfiles.properties
............\.........\private\private.properties
............\.........\.......\private.xml
............\.........\project.properties
............\.........\project.xml
............\news.sql
............\src\com\sohu\bean\NewsBean.java
............\...\...\....\crawler\Crawler.java
............\...\...\....\.......\LinkDB.java
............\...\...\....\.......\LinkFilter.java
............\...\...\....\.......\LinkParser.java
............\...\...\....\.......\NewsToDB.java
............\...\...\....\.......\Queue.java
............\...\...\....\db\ConnectionManager.java
............\...\...\....\SearchCrawler.java
............\...\...\....\servlet\GetNewsServlet.java
............\...\...\....\SohuNews.java
............\test\com\sohu\SohuNewsTest.java
............\WebRoot\detail.jsp
............\.......\index.jsp
............\.......\META-INF\context.xml
............\.......\........\MANIFEST.MF
............\.......\WEB-INF\classes\com\sohu\servlet\GetNewsServlet.class
............\.......\.......\.......\...\....\.......\GetNewsServlet$1.class
............\.......\.......\.......\...\....\db\ConnectionManager.class
............\.......\.......\.......\...\....\crawler\Queue.class
............\.......\.......\.......\...\....\.......\NewsToDB.class
............\.......\.......\.......\...\....\.......\LinkParser.class
............\.......\.......\.......\...\....\.......\LinkParser$1.class
............\.......\.......\.......\...\....\.......\LinkParser$2.class
............\.......\.......\.......\...\....\.......\LinkFilter.class
............\.......\.......\.......\...\....\.......\LinkDB.class
............\.......\.......\.......\...\....\.......\Crawler.class
............\.......\.......\.......\...\....\.......\Crawler$1.class
............\.......\.......\.......\...\....\bean\NewsBean.class
............\.......\.......\.......\...\....\SohuNews.class
............\.......\.......\.......\...\....\SohuNews$1.class
............\.......\.......\.......\...\....\SearchCrawler.class
............\.......\.......\lib\commons-codec-1.3.jar
............\.......\.......\...\commons-httpclient-3.1.jar
............\.......\.......\...\commons-logging-1.0.4.jar
............\.......\.......\...\htmllexer.jar
............\.......\.......\...\htmlparser.jar
............\.......\.......\...\mysql.jar
............\.......\.......\web.xml
............\.......\.......\classes\com\sohu\servlet
............\.......\.......\.......\...\....\db
............\.......\.......\.......\...\....\crawler
............\.......\.......\.......\...\....\bean
............\.......\.......\.......\...\sohu
............\src\com\sohu\bean
............\...\...\....\crawler
............\...\...\....\db
............\...\...\....\servlet
............\WebRoot\WEB-INF\classes\com
............\src\com\sohu
............\test\com\sohu
............\WebRoot\WEB-INF\classes
............\.......\.......\lib
............\nbproject\private
............\src\com
............\test\com
............\WebRoot\META-INF
............\.......\WEB-INF
............\.settings
............\dist
............\nbproject
............\src
............\test
............\WebRoot
ZhiZhuSpider
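The `crawler` package lists `LinkParser.java` and `LinkFilter.java`, and `lib` bundles `htmlparser.jar`, which suggests that links are parsed out of each page and then filtered; the source of those classes is not shown here. The following hypothetical sketch shows one way to do that step. It extracts `href` values with a simple regular expression (a deliberate simplification in place of an HTML parser), resolves them against the page URL with `java.net.URI`, drops fragments and duplicates, and keeps only links on one allowed host. `LinkExtractSketch` and `extractLinks` are illustrative names, not classes from the archive.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LinkExtractSketch {
    // Simplified matcher for href="..." or href='...'; not a full HTML parser.
    private static final Pattern HREF =
        Pattern.compile("href\\s*=\\s*[\"']([^\"']*)[\"']", Pattern.CASE_INSENSITIVE);

    /** Returns absolute, de-duplicated links from html that point at allowedHost. */
    public static List<String> extractLinks(String html, String pageUrl, String allowedHost) {
        URI base = URI.create(pageUrl);
        Set<String> links = new LinkedHashSet<>(); // keeps first-seen order, drops duplicates
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            String raw = m.group(1);
            int hash = raw.indexOf('#');
            if (hash >= 0) {
                raw = raw.substring(0, hash); // drop the in-page fragment
            }
            raw = raw.trim();
            if (raw.isEmpty()) {
                continue; // e.g. href="#top"
            }
            try {
                URI resolved = base.resolve(raw); // relative -> absolute
                if (allowedHost.equalsIgnoreCase(resolved.getHost())) {
                    links.add(resolved.toString());
                }
            } catch (IllegalArgumentException e) {
                // malformed href (e.g. unescaped spaces): skip it
            }
        }
        return new ArrayList<>(links);
    }

    public static void main(String[] args) {
        String html = "<a href=\"/news/1.html#comments\">1</a>"
            + "<a href='2.html'>2</a>"
            + "<a href=\"http://other.example.org/x\">x</a>"
            + "<a href=\"#top\">top</a>"
            + "<a href=\"/news/1.html\">again</a>";
        System.out.println(extractLinks(html, "http://news.example.com/index.html", "news.example.com"));
        // prints [http://news.example.com/news/1.html, http://news.example.com/2.html]
    }
}
```

A regular expression misses links generated by script and can be fooled by unusual markup, which is why a dedicated parser such as the bundled htmlparser is the sturdier choice in practice.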