Description: This program extracts text from existing Web pages and performs word segmentation on it, writing the results to a file named res.txt. It is preliminary work toward developing a search engine.
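The description does not say how the segmentation is done. As a rough illustration only, the sketch below assumes forward maximum matching against a word dictionary plus a stop-word filter, which is one common approach to Chinese word segmentation and not necessarily the one this package uses. The function names (stripTags, splitUtf8, segment, writeWords), the UTF-8 input encoding, and the 4-character word limit are all assumptions made for the example; the actual code may work on GBK text and may read ch_dict.txt and stoplist.txt in a format not shown here.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <ostream>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical helper: replace every <...> tag with a single space.
// Crude by design: it does not handle scripts, comments, or entities.
std::string stripTags(const std::string& html) {
    std::string text;
    bool inTag = false;
    for (char c : html) {
        if (c == '<') {
            inTag = true;
        } else if (c == '>') {
            inTag = false;
            text += ' ';
        } else if (!inTag) {
            text += c;
        }
    }
    return text;
}

// Hypothetical helper: split a UTF-8 string into one string per character.
// Assumes well-formed UTF-8 input.
std::vector<std::string> splitUtf8(const std::string& s) {
    std::vector<std::string> chars;
    for (std::size_t i = 0; i < s.size();) {
        unsigned char lead = static_cast<unsigned char>(s[i]);
        std::size_t len = 1;
        if (lead >= 0xF0) len = 4;
        else if (lead >= 0xE0) len = 3;
        else if (lead >= 0xC0) len = 2;
        chars.push_back(s.substr(i, len));
        i += len;
    }
    return chars;
}

// Forward maximum matching (assumed technique): at each position take the
// longest dictionary word of at most maxWordChars characters; if none
// matches, emit the single character. Whitespace and stop words are dropped.
std::vector<std::string> segment(const std::string& text,
                                 const std::unordered_set<std::string>& dict,
                                 const std::unordered_set<std::string>& stop,
                                 std::size_t maxWordChars = 4) {
    assert(maxWordChars > 0);
    std::vector<std::string> chars = splitUtf8(text);
    std::vector<std::string> words;
    for (std::size_t i = 0; i < chars.size();) {
        std::string word = chars[i];   // fallback: one character
        std::size_t matched = 1;
        std::string candidate;
        for (std::size_t n = 1; n <= maxWordChars && i + n <= chars.size(); ++n) {
            candidate += chars[i + n - 1];
            if (dict.count(candidate)) {
                word = candidate;
                matched = n;
            }
        }
        i += matched;
        bool isSpace = word.size() == 1 &&
                       std::isspace(static_cast<unsigned char>(word[0]));
        if (!isSpace && !stop.count(word)) words.push_back(word);
    }
    return words;
}

// Write one word per line to any output stream (for example a file stream
// opened on res.txt).
void writeWords(const std::vector<std::string>& words, std::ostream& out) {
    for (const std::string& w : words) out << w << '\n';
}
```

A caller would load the dictionary and stop list into the two sets, read each HTML file into a string, and pass segment(stripTags(html), dict, stop) to writeWords with a std::ofstream opened on res.txt. Keeping the dictionary in a hash set makes each lookup cheap, at the cost of trying up to maxWordChars candidates per position.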
To Search:
- [14] - Basic e-books for learning about search engines
- [1] - Chinese word segmentation in Chinese inf
- [An-Introduction-to-IR] - An Introduction to Information Retrieval
- [Spider] - Information on the Web crawler collectio
- [nutch] - Build your own search engine
- [wwwcn3cn] - Code for a simple search engine
- [forictclas] - 1. In vs2008, the extract to run 2. The
- [Spider_CPP] - A C language Web crawler, you can try ru
File list (Check if you may need any files):
网页分词\DataStructure_hw1_v2.cpp
........\DictionaryClass.h
........\folder\1.html
........\......\10.html
........\......\2.html
........\......\3.html
........\......\4.html
........\......\5.html
........\......\6.html
........\......\7.html
........\......\8.html
........\......\9.html
........\......\ch_dict.txt
........\......\ch_dict_new.txt
........\......\familyName.txt
........\......\familyName_new.txt
........\......\res.txt
........\......\stoplist.txt
........\......\stoplist_new.txt
........\MyStringClass.h
........\MyStringLinkClass.h
........\SegWordClass.h
........\folder
网页分词