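Lucene fuzzy search returns incorrect results
Environment: Lucene 3.0.1. I extracted some records from a database, indexed them with Lucene, and ran searches against the index. The problem is that the result set contains some records that should not be there. The program and the search results are below. I'm new to this, so I'd appreciate any help: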
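import java.io.File;
import java.io.IOException;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.FuzzyQuery;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.Searcher;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class DBLuceneSearch
{
    public static void main(String[] args) throws IOException
    {
        String index = "D:\\workspace\\lucnentest\\dbindex"; // index location
        String field = "title"; // field to search
        String queryString = "science";
        IndexReader reader = IndexReader.open(FSDirectory.open(new File(index)), false);
        Searcher searcher = new IndexSearcher(reader);
        // Only the commented-out QueryParser variant below uses this analyzer.
        Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_30);
        try
        {
            /***** Several search approaches are tried below *****/
            /***** 1. One keyword, searched against one field *****/
            //QueryParser qp = new QueryParser(Version.LUCENE_30, field, analyzer);
            //Query query = qp.parse(queryString);
            /***** 2. Fuzzy query *****/
            // The term is passed to the query as-is; no analyzer is applied to it.
            Term term = new Term(field, queryString);
            FuzzyQuery query = new FuzzyQuery(term);
            TopDocs hits = searcher.search(query, 500);
            if (hits.totalHits > 0)
            {
                System.out.println("Found " + hits.totalHits + " results!");
            }
            int i = 0;
            for (ScoreDoc sdoc : hits.scoreDocs)
            {
                // Load the stored document for this hit and print its title.
                Document doc = searcher.doc(sdoc.doc);
                System.out.println("Result " + (i + 1) + " title:" + doc.get(field));
                i = i + 1;
            }
        }
        finally
        {
            // Release the searcher and the underlying reader even if the search fails.
            searcher.close();
            reader.close();
        }
    }
}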
The search results are as follows:
Found 12 results!
Result 1 title:IEEE Xplore: Science, Measurement and Technology, IEE Proceedings A
Result 2 title:Proceedings of the China-U.S. Forum on Science and Technology Policy
Result 3 title:Proceedings of the China-U.S. Forum on Science and Technology Policy
Result 4 title:Proceedings of the National Academy of Sciences of the United...
Result 5 title:Numerical Simulations in the Environmental and Earth Sciences
Result 6 title:Numerical Simulations in the Environmental and Earth Sciences
Result 7 title:PROCEEDINGS OF THE CALIFORNIA ACADEMY OF SCIENCES, FOURTH SERIES
Result 8 title:Proceedings of the International Con- ference on Alcoholism and ...
Result 9 title:Proceedings of the International Con- ference on Alcoholism and ...
Result 10 title:Gender Differentials in Judicial Proceedings: Field Evidence from...
Result 11 title:MINUTES OF PROCEEDINGS PROCES-VERBAL D'AUDIENCE
Result 12 title:MINUTES OF PROCEEDINGS PROCES-VERBAL D'AUDIENCE
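As you can see, the first 7 results are correct, but the last 5 do not contain the word "science". The ellipses were already present when the titles were stored in the database. Both indexing and querying use StandardAnalyzer. Also, when I use an exact-match search instead of the fuzzy search, the result set is correct.
Please take a look. Thanks!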
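[Solution]
I suspect that no tokenization (analysis) is being applied at search time. But judging from the fuzzy query statements, there is nowhere that an analyzer gets used. I don't know what to do about it, which is frustrating!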
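One possible explanation for the five extra hits, offered as a sketch rather than a confirmed diagnosis: FuzzyQuery matches indexed terms by edit distance, and the one-argument constructor uses a default minimum similarity of 0.5. Assuming the score is roughly 1 - distance / (length of the shorter term), the title terms "ference" (from "Con- ference"), "evidence" and "audience" are each 3 edits away from "science", which gives about 0.57 and clears that threshold. The class below is a hypothetical stand-alone approximation of that scoring (FuzzySimilaritySketch, editDistance and similarity are illustrative names, not Lucene APIs):

```java
// Sketch only: approximates an edit-distance similarity score.
// This is not Lucene's own implementation.
class FuzzySimilaritySketch {

    // Classic Levenshtein distance: insert, delete and substitute each cost 1.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Assumed scoring: 1 - distance / length of the shorter of the two terms.
    static float similarity(String query, String indexed) {
        int shorter = Math.min(query.length(), indexed.length());
        return 1.0f - (float) editDistance(query, indexed) / shorter;
    }

    public static void main(String[] args) {
        String query = "science";
        // Lowercased terms as StandardAnalyzer would have indexed them.
        String[] indexedTerms = {"sciences", "ference", "evidence", "audience", "proceedings"};
        for (String term : indexedTerms) {
            float score = similarity(query, term);
            String verdict = score > 0.5f ? "accepted at 0.5" : "rejected at 0.5";
            System.out.println(term + " -> " + score + " (" + verdict + ")");
        }
    }
}
```

If that assumption holds, passing a stricter threshold through the two-argument constructor, for example new FuzzyQuery(term, 0.7f), should keep "sciences" (about 0.86) while dropping the other three. Separately, because a Term built by hand bypasses the analyzer, the query text has to be lowercased manually to line up with what StandardAnalyzer put in the index.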