Description: Are you trying to crawl content from external websites? If so, you have almost certainly run into their chaotic HTML pages. Parsing the structure of such pages the usual way rarely works without errors, and I'm sure that gives you a headache. Try this xml_html parser instead; it should solve your problem.
File list (check whether it contains the files you need):
AbstractNode.cs
analyzer
analyzer\LineStrMatcher.cs
AssemblyInfo.cs
Attr.cs
AttrFilter.cs
bin
bin\Debug
bin\Release
Enumerator.cs
JspComment.cs
JspPageTag.cs
KeyNode.cs
MltagUtil.cs
Node.cs
NodeParser.cs
NodeSetParser.cs
obj
obj\Debug
obj\Release
obj\XmlHtmlLib.csproj.FileList.txt
ParserFactory.cs
ServerPageTag.cs
util
util\FilterReader.cs
util\HashSet.cs
util\HeadBufferedReader.cs
util\HeadBufferedStream.cs
util\HtmlTags.cs
util\PushbackReader.cs
util\PushbackStream.cs
util\StringTokenizer.cs
util\XmlUtil.cs
VectorEnumerator.cs
XHtmlPath.cs
XmlComment.cs
XmlDeclNode.cs
XmlHtmlLib.csproj
XmlHtmlLib.csproj.user
XmlHtmlParser.csproj.user
XmlNode.cs
XmlText.cs