Introduction - If you run into usage problems, please search for a solution (e.g. on Google) yourself.
"I am a Chinese," ChineseTokenizer will split into the five words: "I am,,, countries and people", CJKTokenizer can be divided into "I am, is, China, the people in the" four second section of the word. The problem of the former is that there are no Chinese words to consider, such as the search for "I am Chinese". The latter problem is to make a lot of meaningless words such as "the Chinese", so that the index does not need to increase, reducing the search efficiency.
Package: 117143159siuying_segment.rar - file list:
build.xml
src\org\apache\lucene\analysis\cjk\CJKAnalyzer.java
src\org\apache\lucene\analysis\cjk\CJKTokenizer.java
src\org\apache\lucene\analysis\cjk
src\org\apache\lucene\analysis\cn\ChineseAnalyzer.java
src\org\apache\lucene\analysis\cn\ChineseFilter.java
src\org\apache\lucene\analysis\cn\ChineseTokenizer.java
src\org\apache\lucene\analysis\cn
src\org\apache\lucene\analysis\cw\bothlexu8.txt
src\org\apache\lucene\analysis\cw\CharStream.java
src\org\apache\lucene\analysis\cw\CStandardTokenizer.java
src\org\apache\lucene\analysis\cw\CStandardTokenizer.jj
src\org\apache\lucene\analysis\cw\CStandardTokenizerConstants.java
src\org\apache\lucene\analysis\cw\CStandardTokenizerTokenManager.java
src\org\apache\lucene\analysis\cw\CWordAnalyzer.java
src\org\apache\lucene\analysis\cw\CWordFilter.java
src\org\apache\lucene\analysis\cw\CWordFilter.java~
src\org\apache\lucene\analysis\cw\CWordTokenizer.java
src\org\apache\lucene\analysis\cw\CWordTokenizer.java~
src\org\apache\lucene\analysis\cw\data\sforeign_u8.txt
src\org\apache\lucene\analysis\cw\data\snotname_u8.txt
src\org\apache\lucene\analysis\cw\data\snumbers_u8.txt
src\org\apache\lucene\analysis\cw\data\ssurname_u8.txt
src\org\apache\lucene\analysis\cw\data\tforeign_u8.txt
src\org\apache\lucene\analysis\cw\data\tnotname_u8.txt
src\org\apache\lucene\analysis\cw\data\tnumbers_u8.txt
src\org\apache\lucene\analysis\cw\data\tsurname_u8.txt
src\org\apache\lucene\analysis\cw\data
src\org\apache\lucene\analysis\cw\ParseException.java
src\org\apache\lucene\analysis\cw\Segmenter.jav.old
src\org\apache\lucene\analysis\cw\segmenter.java
src\org\apache\lucene\analysis\cw\segmenter.java~
src\org\apache\lucene\analysis\cw\SegmenterUtils.java
src\org\apache\lucene\analysis\cw\SegmenterUtils.java~
src\org\apache\lucene\analysis\cw\simplexu8.txt
src\org\apache\lucene\analysis\cw\test\SegmenterUtilsTest.java
src\org\apache\lucene\analysis\cw\test\SegmenterUtilsTest.java~
src\org\apache\lucene\analysis\cw\test
src\org\apache\lucene\analysis\cw\Token.java
src\org\apache\lucene\analysis\cw\TokenMgrError.java
src\org\apache\lucene\analysis\cw\tradlexu8.txt
src\org\apache\lucene\analysis\cw
src\org\apache\lucene\analysis
src\org\apache\lucene\demo\DeleteFiles.java
src\org\apache\lucene\demo\FileDocument.java
src\org\apache\lucene\demo\IndexCJKFiles.java
src\org\apache\lucene\demo\IndexFiles.java
src\org\apache\lucene\demo\SearchCJKFiles.java
src\org\apache\lucene\demo\SearchFiles.java
src\org\apache\lucene\demo
src\org\apache\lucene
src\org\apache
src\org
src
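To index documents with the segmenter shipped in this package, the CWordAnalyzer listed above can be plugged into a standard Lucene IndexWriter; IndexCJKFiles.java and SearchCJKFiles.java in the demo package contain the full versions. The snippet below is only a rough sketch against the classic Lucene 1.x/2.x API, and it assumes CWordAnalyzer has a no-argument constructor, which has not been verified from this archive.

    import org.apache.lucene.analysis.cw.CWordAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexWriter;

    public class IndexWithCWord {
        public static void main(String[] args) throws Exception {
            // Assumption: CWordAnalyzer() takes no arguments (not verified from this archive).
            IndexWriter writer = new IndexWriter("index", new CWordAnalyzer(), true);

            Document doc = new Document();
            // Classic Lucene field helper; newer releases use the Field constructor instead.
            doc.add(Field.Text("contents", "我是中国人"));
            writer.addDocument(doc);

            writer.optimize();
            writer.close();
        }
    }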