Description: A general-purpose web crawler (also called a scalable web crawler) starts from a set of seed URLs and expands its crawl toward the entire Web. It is mainly used by portal sites, search engines, and large web service providers to collect data; their technical details are rarely published for commercial reasons. Because of the range and volume of pages it must crawl, this kind of crawler places high demands on crawl speed and storage space, while its requirements on the order in which pages are fetched are relatively low. Since so many pages need to be refreshed, it usually runs in parallel, yet it still takes a long time to revisit any given page. Despite these drawbacks, the general web crawler is well suited to search engines that cover a broad range of topics, and it has strong practical value.
To Search:
File list (check whether you need any of these files):
spider_baike-master
spider_baike-master\README.md
spider_baike-master\__init__.py
spider_baike-master\html_downloader.py
spider_baike-master\html_outputer.py
spider_baike-master\html_parser.py
spider_baike-master\requirements.txt
spider_baike-master\spider_main.py
spider_baike-master\url_manager.py
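
The module layout above (url_manager, html_downloader, html_parser, html_outputer, spider_main) suggests a classic crawl loop: take a URL from the manager, download and parse the page, then feed newly found links back to the manager. A minimal sketch of that loop is below; the `UrlManager` class, the `FAKE_WEB` link graph standing in for real downloading/parsing, and the `crawl` function are all illustrative assumptions, not the repository's actual code.

```python
from collections import deque

class UrlManager:
    """Tracks URLs waiting to be crawled and URLs already seen."""
    def __init__(self):
        self.new_urls = deque()
        self.seen = set()

    def add(self, url):
        # Only enqueue URLs we have never seen, so pages are not re-crawled.
        if url not in self.seen:
            self.seen.add(url)
            self.new_urls.append(url)

    def has_next(self):
        return bool(self.new_urls)

    def next(self):
        return self.new_urls.popleft()

# Hypothetical link graph: stands in for the download + parse steps
# (html_downloader.py / html_parser.py in the real project).
FAKE_WEB = {
    "seed": ["a", "b"],
    "a": ["b", "c"],
    "b": [],
    "c": [],
}

def crawl(seed, limit=10):
    """Breadth-first crawl from a seed URL, stopping after `limit` pages."""
    manager = UrlManager()
    manager.add(seed)
    crawled = []
    while manager.has_next() and len(crawled) < limit:
        url = manager.next()
        crawled.append(url)                   # "output" step, kept as a list here
        for link in FAKE_WEB.get(url, []):    # parse step (stubbed)
            manager.add(link)
    return crawled

print(crawl("seed"))  # → ['seed', 'a', 'b', 'c']
```

In the real project, the stubbed steps would be replaced by an HTTP fetch and an HTML parse, and the `limit` parameter caps how many pages a run visits.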