Lucene: building a document index and searching it (Lucene 2.2)

2012-10-26


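Recently I started looking into Lucene for a project. I have a copy of "Lucene in Action", but as soon as I started using it I found that, with the Lucene 2.0 jar I am using, the book's very first example does not compile. After digging through some material I finally got it working, which is a good start.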

1. Building the index:


package demo.example.searcher;

import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;
import java.util.Date;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;

public class Indexer {
    public static void main(String[] args) throws Exception {
        File indexDir = new File("C:\\index");      // where the index is written
        File dataDir = new File("C:\\lucene\\src"); // files to be indexed
        long start = new Date().getTime();
        int numIndexed = index(indexDir, dataDir);
        long end = new Date().getTime();
        System.out.println("Indexed " + numIndexed + " files, use:" + (end - start) + " ms");
    }

    public static int index(File indexDir, File dataDir) {
        int ret = 0;
        try {
            // true = create a new index, overwriting any existing one
            IndexWriter writer = new IndexWriter(indexDir, new StandardAnalyzer(), true);
            writer.setUseCompoundFile(false);
            indexDirectory(writer, dataDir);
            ret = writer.docCount();
            writer.optimize();
            writer.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
        return ret;
    }

    // Recursively walks dir and indexes every regular file in it.
    public static void indexDirectory(IndexWriter writer, File dir) {
        File[] files = dir.listFiles();
        if (files == null) {
            return; // not a directory, or it could not be read
        }
        for (File f : files) {
            if (f.isDirectory()) {
                indexDirectory(writer, f);
            } else {
                indexFile(writer, f);
            }
        }
    }

    public static void indexFile(IndexWriter writer, File f) {
        Reader txtReader = null;
        try {
            System.out.println("Indexing:" + f.getCanonicalPath());
            Document doc = new Document();
            // FileReader decodes the file with the platform default charset
            txtReader = new FileReader(f);
            // "contents": tokenized and indexed, but not stored
            doc.add(new Field("contents", txtReader));
            // "filename": stored, and indexed as one untokenized term
            doc.add(new Field("filename", f.getCanonicalPath(),
                    Field.Store.YES, Field.Index.UN_TOKENIZED));
            writer.addDocument(doc);
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            if (txtReader != null) {
                try { txtReader.close(); } catch (IOException ignored) { }
            }
        }
    }
}
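To make it clearer what the indexing step actually builds, here is a toy inverted index in plain Java. This is a conceptual sketch only, not Lucene's API or implementation: the class ToyIndexer and its methods are hypothetical, and its analyze method is a much cruder tokenizer than StandardAnalyzer. It mirrors the two fields above: contents is split into terms that point back to document ids, while filename is simply stored so it can be printed for a hit.

```java
import java.util.*;

// Conceptual sketch only, NOT Lucene's API: a tiny in-memory inverted index
// illustrating what IndexWriter.addDocument builds for a tokenized field.
public class ToyIndexer {
    // term -> ids of the documents containing it (the "postings")
    private final Map<String, Set<Integer>> postings = new HashMap<>();
    // document id -> stored filename (roughly like Field.Store.YES)
    private final List<String> storedFilenames = new ArrayList<>();

    // Crude, hypothetical analyzer: lowercase, then split on anything that is
    // not a letter or digit. StandardAnalyzer is more elaborate (for example
    // it also drops English stop words).
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) terms.add(t);
        }
        return terms;
    }

    // Adds one document; ids are assigned in insertion order.
    public int addDocument(String filename, String contents) {
        int docId = storedFilenames.size();
        storedFilenames.add(filename);          // stored as-is (not searchable here)
        for (String term : analyze(contents)) { // tokenized and indexed, not stored
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
        }
        return docId;
    }

    public int docCount() { return storedFilenames.size(); }

    public Set<Integer> postingsFor(String term) {
        return postings.getOrDefault(term, Collections.emptySet());
    }

    public String filename(int docId) { return storedFilenames.get(docId); }

    public static void main(String[] args) {
        ToyIndexer idx = new ToyIndexer();
        idx.addDocument("C:\\lucene\\src\\A.java", "public class A { // builds the index }");
        idx.addDocument("C:\\lucene\\src\\B.java", "public class B { // runs a search }");
        System.out.println(idx.docCount());           // 2
        System.out.println(idx.postingsFor("class")); // [0, 1]
        System.out.println(idx.postingsFor("index")); // [0]
    }
}
```

Looking a term up is then a single map access instead of a scan over every file, which is the basic reason a search against an index is fast.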

2. Searching the index built by the class above:

package demo.example.searcher;

import java.io.IOException;
import java.util.Date;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;

public class Searcher {
    public static void main(String[] args) {
        String indexDir = "C:\\index";
        String q = "query keywords"; // the text to search for
        search(indexDir, q);
    }

    public static void search(String indexDir, String q) {
        IndexSearcher is = null;
        try {
            is = new IndexSearcher(indexDir);
            // Same field and same analyzer as at indexing time, so query terms match indexed terms
            QueryParser queryParser = new QueryParser("contents", new StandardAnalyzer());
            Query query = queryParser.parse(q);
            long start = new Date().getTime();
            Hits hits = is.search(query);
            long end = new Date().getTime();
            System.out.println("Found " + hits.length() + " hits, use:" + (end - start) + " ms");
            for (int i = 0; i < hits.length(); i++) {
                Document doc = hits.doc(i);
                System.out.println("The right file:" + doc.get("filename"));
            }
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            if (is != null) {
                try { is.close(); } catch (IOException ignored) { }
            }
        }
    }
}

With these changes both classes ran correctly.
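The search step above can be pictured the same way with a toy in plain Java. Again this is a conceptual sketch, not Lucene's API: ToySearcher and search are hypothetical names. Query terms are combined with OR, which is QueryParser's default operator, and documents are ranked by how many distinct query terms they contain, a crude stand-in for Lucene's real TF-IDF based scoring.

```java
import java.util.*;

// Conceptual sketch only, NOT Lucene's API: a toy QueryParser + IndexSearcher.
public class ToySearcher {
    // Same crude, hypothetical analyzer used for both documents and queries.
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) terms.add(t);
        }
        return terms;
    }

    // Returns matching filenames, best match first; ties keep filename order.
    public static List<String> search(Map<String, String> docs, String query) {
        // Build the index: term -> filenames containing it.
        Map<String, Set<String>> postings = new HashMap<>();
        for (Map.Entry<String, String> e : docs.entrySet()) {
            for (String term : analyze(e.getValue())) {
                postings.computeIfAbsent(term, k -> new HashSet<>()).add(e.getKey());
            }
        }
        // OR semantics: count how many distinct query terms hit each file.
        Map<String, Integer> hits = new TreeMap<>();
        for (String term : new LinkedHashSet<>(analyze(query))) {
            for (String file : postings.getOrDefault(term, Collections.emptySet())) {
                hits.merge(file, 1, Integer::sum);
            }
        }
        List<String> result = new ArrayList<>(hits.keySet()); // sorted by filename
        // List.sort is stable, so equal counts keep filename order.
        result.sort((a, b) -> Integer.compare(hits.get(b), hits.get(a)));
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> docs = new TreeMap<>();
        docs.put("A.java", "IndexWriter writes the index");
        docs.put("B.java", "IndexSearcher searches the index");
        docs.put("C.java", "unrelated helper");
        System.out.println(search(docs, "index searches")); // [B.java, A.java]
    }
}
```

The key design point carries over to the real code: the query text must go through the same analysis as the indexed text, otherwise a term such as "Index" in the query would never equal the lowercased "index" in the postings.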

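However, while testing I ran into something I don't understand:

The indexed files are all Java source files. When I test queries, both Chinese and English keywords work fine, but some of the text in the Java source files seems to have been filtered out and cannot be found. What is going on here? Does Lucene automatically filter out the comments in source files?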
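The post leaves this question open. One unconfirmed possibility worth ruling out: in Lucene 2.x, IndexWriter indexes at most 10,000 terms per field by default (IndexWriter.DEFAULT_MAX_FIELD_LENGTH) and silently drops the rest, so text near the end of a long source file can become unsearchable; IndexWriter.setMaxFieldLength(int) raises that cap. It may also be worth checking how StandardAnalyzer tokenizes code, since it may keep a dotted name such as writer.addDocument as a single token, in which case a query for addDocument alone would not match. The sketch below models only the first effect in plain Java; FieldLengthDemo and indexedTerms are hypothetical names, and this is not Lucene code.

```java
import java.util.*;

// Conceptual sketch, NOT Lucene code: models the per-field term cap that
// Lucene 2.x IndexWriter applies (maxFieldLength, default 10,000 terms).
// Terms past the cap are dropped silently, so a query can never match them.
public class FieldLengthDemo {
    static final int DEFAULT_MAX_FIELD_LENGTH = 10_000; // mirrors the Lucene 2.x default

    // Returns the set of terms that would actually be indexed for one field.
    static Set<String> indexedTerms(String text, int maxFieldLength) {
        Set<String> indexed = new HashSet<>();
        int count = 0;
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (t.isEmpty()) continue;
            if (count++ >= maxFieldLength) break; // everything after the cap is ignored
            indexed.add(t);
        }
        return indexed;
    }

    public static void main(String[] args) {
        StringBuilder src = new StringBuilder();
        for (int i = 0; i < 12_000; i++) src.append("code ");
        src.append("// comment at the end of a long file");
        Set<String> terms = indexedTerms(src.toString(), DEFAULT_MAX_FIELD_LENGTH);
        System.out.println(terms.contains("code"));    // true
        System.out.println(terms.contains("comment")); // false: past the cap
    }
}
```

If this turns out to be the cause, calling setMaxFieldLength with a larger value on the IndexWriter before adding documents, and then rebuilding the index, should make the missing text searchable.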
