Description: imdict-chinese-analyzer is the intelligent Chinese word segmentation module of the imdict smart dictionary. Its algorithm is based on the Hidden Markov Model (HMM), and it is a Java reimplementation of ICTCLAS, the Chinese word segmentation program from the Institute of Computing Technology, Chinese Academy of Sciences. It can directly provide Simplified Chinese word segmentation support for the Lucene search engine.
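To give a rough idea of what dictionary-driven probabilistic segmentation involves, here is a minimal sketch of a Viterbi-style dynamic program over a word lattice. It is a toy unigram model, not the hierarchical HMM used by ICTCLAS or by this package, and it does not read coredict.dct or bigramdict.dct. The class name ToySegmenter, the constants UNKNOWN_LOG_PROB and MAX_WORD_LEN, and the word probabilities are all hypothetical and chosen only for illustration. The sample input is the six-character string 研究生命起源, written with Unicode escapes so the source stays pure ASCII.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy unigram segmenter. Illustrative only; not the imdict or ICTCLAS algorithm. */
public class ToySegmenter {
    // Hypothetical log-probability assigned to a single character missing from the dictionary.
    private static final double UNKNOWN_LOG_PROB = Math.log(1e-8);
    // Hypothetical cap on candidate word length, counted in UTF-16 chars (BMP text assumed).
    private static final int MAX_WORD_LEN = 4;

    /** Returns the segmentation of text with the highest product of word probabilities. */
    public static List<String> segment(String text, Map<String, Double> wordProb) {
        int n = text.length();
        double[] best = new double[n + 1]; // best[i]: best log-probability of text[0..i)
        int[] prev = new int[n + 1];       // prev[i]: start index of the last word on that path
        for (int i = 1; i <= n; i++) {
            best[i] = Double.NEGATIVE_INFINITY;
            for (int j = Math.max(0, i - MAX_WORD_LEN); j < i; j++) {
                Double p = wordProb.get(text.substring(j, i));
                double score;
                if (p != null) {
                    score = Math.log(p);
                } else if (i - j == 1) {
                    score = UNKNOWN_LOG_PROB; // fall back to a lone unknown character
                } else {
                    continue;                 // multi-character string not in the dictionary
                }
                if (best[j] + score > best[i]) {
                    best[i] = best[j] + score;
                    prev[i] = j;
                }
            }
        }
        // Walk the back-pointers from the end of the text to recover the words.
        List<String> words = new ArrayList<>();
        for (int i = n; i > 0; i = prev[i]) {
            words.add(text.substring(prev[i], i));
        }
        Collections.reverse(words);
        return words;
    }

    public static void main(String[] args) {
        // Made-up probabilities for five dictionary entries.
        Map<String, Double> dict = new HashMap<>();
        dict.put("\u7814\u7a76", 0.02);        // "research"
        dict.put("\u7814\u7a76\u751f", 0.01);  // "graduate student"
        dict.put("\u751f\u547d", 0.02);        // "life"
        dict.put("\u547d", 0.001);             // "fate"
        dict.put("\u8d77\u6e90", 0.01);        // "origin"
        List<String> words = segment("\u7814\u7a76\u751f\u547d\u8d77\u6e90", dict);
        System.out.println(String.join(" / ", words));
    }
}
```

With these made-up probabilities the split 研究 / 生命 / 起源 scores higher than 研究生 / 命 / 起源, which is the kind of ambiguity a probabilistic segmenter is meant to resolve. A bigram model, which the bigramdict.dct file name suggests the real package uses, would additionally score adjacent word pairs.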
File list:
chinese-analyzer
................\.classpath
................\.project
................\analysis-data
................\.............\bigramdict.dct
................\.............\coredict.dct
................\.............\license.txt
................\.............\readme.txt
................\.............\stopwords_utf8.txt
................\lib
................\...\log4j-1.2.15.jar
................\...\lucene-core-2.4.0.jar
................\src
................\...\net
................\...\...\imdict
................\...\...\......\analysis
................\...\...\......\........\chinese
................\...\...\......\........\.......\AnalyzerProfile.java
................\...\...\......\........\.......\ChineseAnalyzer.java
................\...\...\......\........\.......\SentenceTokenizer.java
................\...\...\......\........\.......\WordTokenizer.java
................\...\...\......\........\stopword
................\...\...\......\........\........\StopDictionary.java
................\...\...\......\........\........\StringComparator.java
................\...\...\......\wordsegment
................\...\...\......\...........\dictionary
................\...\...\......\...........\..........\AbstractDictionary.java
................\...\...\......\...........\..........\BigramDictionary.java
................\...\...\......\...........\..........\WordDictionary.java
................\...\...\......\...........\hhmm
................\...\...\......\...........\....\BiSegGraph.java
................\...\...\......\...........\....\HHMMSegmenter.java
................\...\...\......\...........\....\PathNode.java
................\...\...\......\...........\....\SegGraph.java
................\...\...\......\...........\....\SegToken.java
................\...\...\......\...........\....\SegTokenFilter.java
................\...\...\......\...........\....\SegTokenPair.java
................\...\...\......\...........\util
................\...\...\......\...........\....\CharType.java
................\...\...\......\...........\....\Utility.java
................\...\...\......\...........\....\WordType.java
................\...\...\......\...........\WordSegmenter.java
................\test
................\....\net
................\....\...\imdict
................\....\...\......\analysis
................\....\...\......\........\test
................\....\...\......\........\....\AnalyzerTest.java
................\....\...\......\........\....\StringTest.java
................\test.txt