Integrating Paoding with Solr
1. Download paoding-analysis-2.0.4-beta.zip from
http://code.google.com/p/paoding/downloads/list
2. Extract the archive to paoding-analysis-2.0.4-beta
3. Set the Paoding home (the dictionary directory):
- Copy the dic folder from paoding-analysis-2.0.4-beta into the Solr home folder.
- In paoding-analysis-2.0.4-beta, find paoding-analysis.jar and copy it to tomcat/webapps/solr/solr/WEB-INF/lib.
- Unpack paoding-analysis.jar, find the file paoding-dic-home.properties, and change the property as follows:
  paoding.dic.home=D:/solr/solr/dic (D:/solr/solr is the Solr home directory)
- Repackage the contents as paoding-analysis.jar.
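
Before repackaging the jar, it can help to confirm that the edited property really points at the copied dic folder. The following is a minimal sketch using only the Java standard library; the class name DicHomeCheck and its two methods are hypothetical helpers for illustration and are not part of Paoding. It assumes the value is a plain filesystem path, as in the example above.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.InvalidPathException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;

// Hypothetical helper (not part of Paoding): sanity-checks paoding-dic-home.properties.
class DicHomeCheck {

    // Reads paoding.dic.home from a properties file; returns null if the key is absent.
    static String readDicHome(Path propertiesFile) throws IOException {
        Properties props = new Properties();
        try (InputStream in = Files.newInputStream(propertiesFile)) {
            props.load(in);
        }
        String value = props.getProperty("paoding.dic.home");
        return value == null ? null : value.trim();
    }

    // True when the value is a non-empty path that names an existing directory.
    static boolean isUsableDicHome(String dicHome) {
        if (dicHome == null || dicHome.isEmpty()) {
            return false;
        }
        try {
            return Files.isDirectory(Paths.get(dicHome));
        } catch (InvalidPathException e) {
            return false; // not a valid filesystem path on this platform
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumed location of the edited file; pass the real path as the first argument.
        Path file = Paths.get(args.length > 0 ? args[0] : "paoding-dic-home.properties");
        String dicHome = readDicHome(file);
        System.out.println("paoding.dic.home = " + dicHome);
        System.out.println("directory exists: " + isUsableDicHome(dicHome));
    }
}
```

Forward slashes in the path (D:/solr/solr/dic) are the safer choice here, because java.util.Properties treats a backslash as an escape character when loading the file.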
4. Wrap Paoding in a Solr tokenizer factory
package org.paoding;

import java.io.Reader;
import java.util.Map;

import net.paoding.analysis.analyzer.PaodingTokenizer;
import net.paoding.analysis.analyzer.TokenCollector;
import net.paoding.analysis.analyzer.impl.MaxWordLengthTokenCollector;
import net.paoding.analysis.analyzer.impl.MostWordsTokenCollector;
import net.paoding.analysis.knife.PaodingMaker;

import org.apache.lucene.analysis.TokenStream;
import org.apache.solr.analysis.BaseTokenizerFactory;

/**
 * Wraps the Paoding Chinese word segmenter as a Solr tokenizer factory.
 */
public class ChineseTokenizerFactory extends BaseTokenizerFactory {

    /**
     * Most-words segmentation (the default mode).
     */
    public static final String MOST_WORDS_MODE = "most-words";

    /**
     * Maximum-word-length segmentation.
     */
    public static final String MAX_WORD_LENGTH_MODE = "max-word-length";

    private String mode = null;

    // Normalizes the mode; null and "default" both select most-words.
    public void setMode(String mode) {
        if (mode == null || MOST_WORDS_MODE.equalsIgnoreCase(mode)
                || "default".equalsIgnoreCase(mode)) {
            this.mode = MOST_WORDS_MODE;
        } else if (MAX_WORD_LENGTH_MODE.equalsIgnoreCase(mode)) {
            this.mode = MAX_WORD_LENGTH_MODE;
        } else {
            throw new IllegalArgumentException("Illegal analyzer mode parameter: " + mode);
        }
    }

    // Reads the "mode" attribute given on the <tokenizer> element in schema.xml.
    @Override
    public void init(Map args) {
        super.init(args);
        setMode((String) args.get("mode"));
    }

    public TokenStream create(Reader input) {
        return new PaodingTokenizer(input, PaodingMaker.make(),
                createTokenCollector());
    }

    private TokenCollector createTokenCollector() {
        if (MOST_WORDS_MODE.equals(mode))
            return new MostWordsTokenCollector();
        if (MAX_WORD_LENGTH_MODE.equals(mode))
            return new MaxWordLengthTokenCollector();
        // Not reached once init() has run: setMode() stores only the two modes above.
        throw new Error("never happened");
    }
}
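Note: compiling this class requires the libraries in the lib directory of solr.war and the paoding-analysis.jar from the Paoding package.
Package the code above as paoding.jar (available for download as an attachment) and copy it to tomcat/webapps/solr/solr/WEB-INF/lib.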
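
To see how the mode attribute is interpreted without deploying anything, the mode handling can be tried on its own. The sketch below is a hypothetical stand-alone copy of the logic in setMode(), with no Solr or Paoding dependency; the class name ModeDemo and the method resolveMode are made up for illustration, and the map merely stands in for the attributes given on the tokenizer element.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-alone copy of the mode handling, for illustration only.
class ModeDemo {
    static final String MOST_WORDS_MODE = "most-words";
    static final String MAX_WORD_LENGTH_MODE = "max-word-length";

    // Mirrors setMode(): null and "default" fall back to most-words; matching ignores case.
    static String resolveMode(String mode) {
        if (mode == null || MOST_WORDS_MODE.equalsIgnoreCase(mode)
                || "default".equalsIgnoreCase(mode)) {
            return MOST_WORDS_MODE;
        }
        if (MAX_WORD_LENGTH_MODE.equalsIgnoreCase(mode)) {
            return MAX_WORD_LENGTH_MODE;
        }
        throw new IllegalArgumentException("Illegal analyzer mode parameter: " + mode);
    }

    public static void main(String[] args) {
        // Stand-in for the attribute map built from a <tokenizer ... mode="..."/> element.
        Map<String, String> tokenizerArgs = new HashMap<>();
        tokenizerArgs.put("mode", "Max-Word-Length");
        System.out.println(resolveMode(tokenizerArgs.get("mode"))); // prints max-word-length
        System.out.println(resolveMode(null));                      // prints most-words
    }
}
```

Any other value, such as a misspelled mode name, is rejected with an IllegalArgumentException rather than silently falling back to the default.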

5. In the conf directory under the Solr home (that is, D:\solr\solr\conf), open schema.xml and make the following changes:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
        <!--<tokenizer class="solr.WhitespaceTokenizerFactory"/>-->
        <tokenizer class="org.paoding.ChineseTokenizerFactory" mode="most-words"/>
        ··· ···
    </analyzer>
    <analyzer type="query">
        <!--<tokenizer class="solr.WhitespaceTokenizerFactory"/>-->
        <tokenizer class="org.paoding.ChineseTokenizerFactory" mode="most-words"/>
        ··· ···
    </analyzer>
</fieldType>
The lines inside <!-- --> are the original default content.
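
One way to double-check the edit before restarting is to parse schema.xml and list which tokenizer each analyzer of the text field type now uses. The following sketch uses only the JDK's DOM parser; the class name SchemaTokenizerCheck and the default path in main are assumptions for illustration. Commented-out tokenizer elements are skipped automatically, since XML comments are not elements.

```java
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper: reports the tokenizer class configured for each analyzer.
class SchemaTokenizerCheck {

    // Returns "analyzerType=tokenizerClass" for every analyzer of the named fieldType.
    static List<String> tokenizers(InputStream schemaXml, String fieldTypeName) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(schemaXml);
        List<String> result = new ArrayList<>();
        NodeList fieldTypes = doc.getElementsByTagName("fieldType");
        for (int i = 0; i < fieldTypes.getLength(); i++) {
            Element fieldType = (Element) fieldTypes.item(i);
            if (!fieldTypeName.equals(fieldType.getAttribute("name"))) {
                continue;
            }
            NodeList analyzers = fieldType.getElementsByTagName("analyzer");
            for (int j = 0; j < analyzers.getLength(); j++) {
                Element analyzer = (Element) analyzers.item(j);
                NodeList toks = analyzer.getElementsByTagName("tokenizer");
                for (int k = 0; k < toks.getLength(); k++) {
                    Element tok = (Element) toks.item(k);
                    result.add(analyzer.getAttribute("type") + "=" + tok.getAttribute("class"));
                }
            }
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        // Default path is an assumption; pass the real schema.xml location as the first argument.
        String path = args.length > 0 ? args[0] : "D:/solr/solr/conf/schema.xml";
        try (InputStream in = Files.newInputStream(Paths.get(path))) {
            for (String line : tokenizers(in, "text")) {
                System.out.println(line);
            }
        }
    }
}
```

If the edit took effect, both the index and query analyzers should report org.paoding.ChineseTokenizerFactory.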

6. Restart Tomcat. To test the analyzer, open http://localhost:8888/solr/admin/analysis.jsp