Description: Extracts the main body text of a web page for use in a search engine. The page is analyzed and regular expressions strip the HTML markup so that only the page text remains. It works well only on Chinese pages; with slight modification it can be used for English.
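The approach described above (strip the markup with regular expressions, keep the text) can be sketched roughly as follows. This is a minimal Python illustration, not the package's C# code: the function names `strip_html` and `main_text`, the regular expressions, and the `min_chars` threshold are all assumptions made for this example.

```python
import html
import re

# All patterns below are illustrative assumptions, not taken from the package.
# Elements whose content is never page text.
_DROP_BLOCKS = re.compile(r"(?is)<(script|style|noscript)\b.*?</\1\s*>")
_COMMENTS = re.compile(r"(?s)<!--.*?-->")
# Block-level tags become line breaks so paragraphs stay separated.
_BLOCK_TAGS = re.compile(r"(?i)</?(p|div|br|li|tr|h[1-6]|table|ul|ol)\b[^>]*>")
_ANY_TAG = re.compile(r"(?s)<[^>]+>")


def strip_html(page: str) -> str:
    """Remove markup with regular expressions and return the text, one block per line."""
    text = _DROP_BLOCKS.sub(" ", page)
    text = _COMMENTS.sub(" ", text)
    text = _BLOCK_TAGS.sub("\n", text)
    text = _ANY_TAG.sub(" ", text)
    text = html.unescape(text)  # &amp; -> &, &nbsp; -> non-breaking space, ...
    # Collapse runs of whitespace inside each line and drop empty lines.
    lines = (" ".join(line.split()) for line in text.split("\n"))
    return "\n".join(line for line in lines if line)


def main_text(page: str, min_chars: int = 20) -> str:
    """Crude body heuristic (an assumption): keep only lines of at least min_chars characters.

    Short lines tend to be navigation links or titles. The threshold is
    hypothetical and would need tuning per language, since Chinese text
    packs more content into each character than English does.
    """
    return "\n".join(
        line for line in strip_html(page).split("\n") if len(line) >= min_chars
    )
```

A regex-only approach like this breaks down on malformed markup; the archive ships HtmlAgilityPack.dll, which suggests the original program also relies on an HTML parser for part of the work.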
- [preprocess] - downloaded from the Internet from the we
- [NewsSpider] - past, I wrote it myself to crawl busines
- [cspider100] - the very integrity of the network Reptil
- [mars212dc_Html] - I wrote it myself, from website text pro
- [Ccharp] - Vertical search sites, code downloads, h
- [FFTwebpagepickup] - base FFT web page text pick up
- [SearchEngine] - Prepared using CSharp source, developmen
- [vchtml] - VC access to web pages of the collection
- [gekhtml] - Based on ekhtml, Automatic extraction of
- [starservices] - java web crawler analyzes the code of we
File list (Check if you may need any files):
bin
...\Debug
obj
...\Debug
...\.....\ResolveAssemblyReference.cache
...\.....\TempPE
...\.....\TestAnalyzer.csproj.FileListAbsolute.txt
...\.....\TestAnalyzer.exe
...\.....\TestAnalyzer.pdb
...\Release
...\.......\Refactor
...\.......\ResolveAssemblyReference.cache
...\.......\TempPE
...\.......\TestAnalyzer.exe
...\TestAnalyzer.csproj.FileListAbsolute.txt
Program.cs
Properties
..........\AssemblyInfo.cs
Readme.txt
release
.......\HtmlAgilityPack.dll
.......\Net.LikeShow.ContentAnalyze.dll
.......\TestAnalyzer.exe
.......\TestAnalyzer.pdb
.......\TestAnalyzer.vshost.exe
.......\TestAnalyzer.vshost.exe.manifest
TestAnalyzer.csproj
TestAnalyzer.sln
UpgradeLog.XML
_UpgradeReport_Files
....................\UpgradeReport.css
....................\UpgradeReport.xslt
....................\UpgradeReport_Minus.gif
....................\UpgradeReport_Plus.gif