Location: Search - Java Crawler
Search list
Description: A web crawler (robot, spider) Java class library, originally developed by Robert Miller at Carnegie Mellon University. It supports multithreading, HTML parsing, URL filtering, page configuration, pattern matching, mirroring, and more.
Platform: |
Size: 474112 |
Author: 徐欣 |
Hits:
Description: Search Crawler is a basic search program for searching the Web; it demonstrates the underlying framework of crawler-based applications.
Platform: |
Size: 6144 |
Author: 陈宁 |
Hits:
Description: A search engine class. Usage: in a command window, enter:
D:\>java SearchCrawler http://www.sina.com 20 java
(a sketch of the argument handling this implies follows this entry)
Platform: |
Size: 3072 |
Author: loon |
Hits:
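The command line above implies three arguments: a start URL, a maximum number of pages to crawl, and a search term. As a rough sketch only (the class layout and field names here are assumptions, not the actual code in this package), the main method of such a class might parse them like this:

// Hypothetical sketch of the argument handling implied by
//   java SearchCrawler http://www.sina.com 20 java
// The real SearchCrawler in this package may well differ.
public class SearchCrawler {
    private final String startUrl;   // first page to fetch
    private final int maxPages;      // upper bound on pages to crawl
    private final String searchTerm; // keyword to look for on each page

    public SearchCrawler(String startUrl, int maxPages, String searchTerm) {
        this.startUrl = startUrl;
        this.maxPages = maxPages;
        this.searchTerm = searchTerm;
    }

    public static void main(String[] args) {
        if (args.length != 3) {
            System.err.println("Usage: java SearchCrawler <startUrl> <maxPages> <searchTerm>");
            return;
        }
        SearchCrawler crawler =
                new SearchCrawler(args[0], Integer.parseInt(args[1]), args[2]);
        System.out.printf("Crawling up to %d pages from %s, searching for \"%s\"%n",
                crawler.maxPages, crawler.startUrl, crawler.searchTerm);
        // crawler.crawl(); // the actual crawl loop would start here
    }
}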
Description: This source code is simple and easy to follow, a handy programming reference for Java beginners and well suited to studying search engines.
Platform: |
Size: 3072 |
Author: 杨登峰 |
Hits:
Description: A web page grabber, also known as a web robot, web wanderer, or web spider. A Web Robot (also called a Spider, Wanderer, or Crawler) is an automated program that repeats a task at a speed no human could match. Such programs roam web sites automatically, retrieve remote data on the Web according to some strategy, build a local index and a local database, and expose a query interface for a search engine to call (a minimal indexing sketch follows this entry).
Platform: |
Size: 20480 |
Author: shengping |
Hits:
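As a minimal illustration of the "local index plus query interface" idea in the entry above (not code taken from this package), an in-memory inverted index can map each term to the set of URLs it appears on; all names below are made up for the example:

import java.util.*;

// Toy local index with a query interface a search front end could call.
public class TinyIndex {
    private final Map<String, Set<String>> index = new HashMap<>();

    // Called by the crawler for every fetched page.
    public void addPage(String url, String text) {
        for (String term : text.toLowerCase().split("\\W+")) {
            if (!term.isEmpty()) {
                index.computeIfAbsent(term, t -> new HashSet<>()).add(url);
            }
        }
    }

    // Query interface: which URLs contain this term?
    public Set<String> query(String term) {
        return index.getOrDefault(term.toLowerCase(), Collections.emptySet());
    }

    public static void main(String[] args) {
        TinyIndex idx = new TinyIndex();
        idx.addPage("http://example.com/a", "Java web crawler tutorial");
        idx.addPage("http://example.com/b", "Python crawler notes");
        System.out.println(idx.query("crawler")); // both URLs
        System.out.println(idx.query("java"));    // only the first URL
    }
}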
Description: A good search engine crawler program; anyone who wants to understand how search engines work should take a look at this.
Platform: |
Size: 16788480 |
Author: zhaomin |
Hits:
Description: A web crawler developed in Java. It crawls breadth-first, looks up every link on a page, analyzes the links, collects all URLs under the site's domain, and adds them to a pending list; off-site links are only recorded, not processed (see the sketch after this entry). The software has a GUI, the src folder contains the source code, and myCrawler.jar can be run directly.
Platform: |
Size: 8498176 |
Author: 江如基 |
Hits:
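A minimal sketch of the breadth-first, same-site-only strategy described above, assuming Java 11+ and a made-up seed URL; it uses a crude regex for link extraction, whereas the packaged myCrawler.jar presumably uses a proper HTML parser:

import java.net.URI;
import java.net.http.*;
import java.util.*;
import java.util.regex.*;

// Same-host links go onto a pending queue; off-site links are only recorded.
public class BfsCrawlerSketch {
    private static final Pattern HREF =
            Pattern.compile("href=[\"'](http[^\"'#]+)[\"']", Pattern.CASE_INSENSITIVE);

    public static void main(String[] args) throws Exception {
        String seed = "http://www.example.com/";      // assumed seed URL
        String host = URI.create(seed).getHost();
        HttpClient client = HttpClient.newHttpClient();

        Deque<String> pending = new ArrayDeque<>(List.of(seed));
        Set<String> visited = new HashSet<>();
        List<String> offSite = new ArrayList<>();     // recorded, never fetched

        while (!pending.isEmpty() && visited.size() < 20) {
            String url = pending.poll();
            if (!visited.add(url)) continue;          // skip already-seen URLs

            HttpResponse<String> resp = client.send(
                    HttpRequest.newBuilder(URI.create(url)).GET().build(),
                    HttpResponse.BodyHandlers.ofString());

            Matcher m = HREF.matcher(resp.body());
            while (m.find()) {
                String link = m.group(1);
                try {
                    if (host.equals(URI.create(link).getHost())) {
                        pending.add(link);            // same site: crawl later
                    } else {
                        offSite.add(link);            // off-site: record only
                    }
                } catch (IllegalArgumentException malformed) {
                    // ignore links the URI parser cannot handle
                }
            }
        }
        System.out.println("Fetched " + visited.size()
                + " pages, recorded " + offSite.size() + " off-site links");
    }
}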
Description: A small crawler written in Java. It served me well in my experiments, so I am sharing it here; there is nothing to lose by taking a look.
Platform: |
Size: 12288 |
Author: Elaine |
Hits:
Description: 1. Locks the crawl onto a particular topic;
2. Produces a log text file in the format: timestamp, URL;
3. Opens at most 2 connections when fetching any one URL (note: the number of local threads doing page parsing is not limited);
4. Obeys polite-spider rules: it must check robots.txt and meta tags for restrictions, and a thread must sleep 2 seconds after finishing a page;
5. Parses HTML pages and extracts link URLs, can tell whether an extracted URL has already been handled, and never re-parses a page it has already crawled;
6. Lets basic spider/crawler parameters be configured, including crawl depth, seed URLs, and so on;
7. Uses a User-agent header to identify itself to the server;
8. Produces crawl statistics, including crawl speed, total time to finish, and total number of pages fetched; important variables and all classes and methods are commented;
9. Follows coding conventions, such as naming rules for classes, methods, and files;
10. Optional: a GUI or web interface for managing the spider/crawler, including start/stop and adding or removing URLs.
(A politeness sketch covering robots.txt, the User-agent header, and the 2-second delay follows this entry.)
Platform: |
Size: 1911808 |
Author: |
Hits:
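A minimal sketch of the politeness rules in items 4 and 7 above (robots.txt, User-agent, and the 2-second pause), assuming Java 11+; the crawler name and URLs are made up, and a real implementation would also honour per-agent robots.txt groups and meta robots tags:

import java.io.IOException;
import java.net.URI;
import java.net.http.*;
import java.util.*;

// Only handles "Disallow:" lines that apply to every agent; illustration only.
public class PoliteFetcher {
    private static final String AGENT = "MyCourseSpider/0.1"; // assumed name
    private final HttpClient client = HttpClient.newHttpClient();
    private final List<String> disallowed = new ArrayList<>();

    public void loadRobots(String siteRoot) throws IOException, InterruptedException {
        HttpResponse<String> resp = client.send(
                HttpRequest.newBuilder(URI.create(siteRoot + "/robots.txt"))
                        .header("User-Agent", AGENT).GET().build(),
                HttpResponse.BodyHandlers.ofString());
        for (String line : resp.body().split("\\R")) {
            line = line.trim();
            if (line.toLowerCase().startsWith("disallow:")) {
                disallowed.add(line.substring("disallow:".length()).trim());
            }
        }
    }

    public boolean allowed(String path) {
        return disallowed.stream().noneMatch(p -> !p.isEmpty() && path.startsWith(p));
    }

    public String fetch(String url) throws IOException, InterruptedException {
        String body = client.send(
                HttpRequest.newBuilder(URI.create(url))
                        .header("User-Agent", AGENT).GET().build(),    // rule 7
                HttpResponse.BodyHandlers.ofString()).body();
        Thread.sleep(2000);                                            // rule 4
        return body;
    }

    public static void main(String[] args) throws Exception {
        PoliteFetcher fetcher = new PoliteFetcher();
        fetcher.loadRobots("http://www.example.com");                  // assumed site
        if (fetcher.allowed("/index.html")) {
            System.out.println(fetcher.fetch("http://www.example.com/index.html").length());
        }
    }
}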
Description: This is a WEB CRAWLER program that can download all the pages on a single site.
Platform: |
Size: 3072 |
Author: xut |
Hits:
Description: A simple program for capturing packets on the Internet, for your reference only.
Platform: |
Size: 2197504 |
Author: ahsm |
Hits:
Description: A crawler written in Java. I could not quite make sense of it myself, so let us study it together!
Platform: |
Size: 702464 |
Author: 刘双 |
Hits:
Description: A multithreaded crawler in Java. Enter the number of threads and it spawns the corresponding worker threads (see the thread-pool sketch after this entry).
Platform: |
Size: 711680 |
Author: liuminghai |
Hits:
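A minimal sketch of the "enter a thread count, spawn that many workers" idea above; the worker body is a placeholder and the queue handling in the packaged crawler may differ:

import java.util.concurrent.*;

public class MultiThreadCrawlerSketch {
    public static void main(String[] args) throws InterruptedException {
        int threads = args.length > 0 ? Integer.parseInt(args[0]) : 4; // user-chosen count
        BlockingQueue<String> frontier = new LinkedBlockingQueue<>();
        frontier.add("http://www.example.com/");                       // assumed seed

        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < threads; i++) {
            pool.submit(() -> {
                String url;
                while ((url = frontier.poll()) != null) {
                    // fetch and parse url here; newly found links go back on the queue
                    System.out.println(Thread.currentThread().getName() + " -> " + url);
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
    }
}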
Description: A web crawler written in Java.
Platform: |
Size: 193536 |
Author: alajfel |
Hits:
Description: A topic-oriented web page analysis and download system that can automatically fetch the detail pages for each item.
Platform: |
Size: 11264 |
Author: 姚贤明 |
Hits:
Description: A simple, easy Java crawler example. Thanks!
Platform: |
Size: 6144 |
Author: 孙卡 |
Hits:
Description: A Java crawler. A web crawler is a program that automatically extracts web pages; it downloads pages from the World Wide Web for a search engine and is an essential component of one.
Platform: |
Size: 4096 |
Author: 邓天航 |
Hits:
Description: A simple Java crawler plus search engine, well suited to learning on your own.
Platform: |
Size: 7322624 |
Author: AliceEndless |
Hits:
Description: This is the jar of the Java crawler toolkit jsoup, with some code modified by the author so that the transfer character encoding can be specified; in the original jar the character encoding used when fetching pages was hard-coded (a charset sketch with stock jsoup follows this entry).
Platform: |
Size: 397312 |
Author: pizichong |
Hits:
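Stock jsoup already lets the caller choose the character set explicitly by parsing the raw stream itself, which is one way around a hard-coded encoding; this sketch uses an assumed URL and the GBK charset as an example and does not reflect the modifications bundled in this particular jar:

import java.io.InputStream;
import java.net.URL;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class CharsetFetch {
    public static void main(String[] args) throws Exception {
        String url = "http://www.example.com/";          // assumed page
        try (InputStream in = new URL(url).openStream()) {
            // Explicit charset instead of a hard-coded one; jsoup also detects
            // the charset from HTTP headers and meta tags when left unspecified.
            Document doc = Jsoup.parse(in, "GBK", url);
            System.out.println(doc.title());
        }
    }
}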
Description: A Java crawler program, simple and practical, easy for beginners to learn from!
Platform: |
Size: 16384 |
Author: someuser |
Hits: