solr suggester 组件使用
The begining
There is one thing that you must know – Suggest component is not available in Solr version 1.4.1 and below. To start using this component you need to download 3_x or trunk version from Lucene/Solr SVN.
Configuration
Before we get into the index configuration we need to define an search component. So let’s do it:
view sourceprint?
1.
<searchComponent name="suggest" type="string" indexed="true" stored="true" multiValued="false" required="true"/>
2.
<field name="name" type="text" indexed="true" stored="true" multiValued="false" />
3.
<field name="name_autocomplete" type="text_auto" indexed="true" stored="true" multiValued="false" />
4.
<field name="description" type="text" indexed="true" stored="true" multiValued="false" />
In addition, there is the following copy field definition:
view sourceprint?
1.
<copyField source="name" dest="name_autocomplete" />
Suggesting single words
In order to achieve individual words suggestions text_autocomplete type should be defined as follows:
view sourceprint?
1.
<fieldType name="text_auto" positionIncrementGap="100">
2.
<analyzer>
3.
<tokenizer generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
5.
<filter name="text_auto">
2.
<analyzer>
3.
<tokenizer encoding="UTF-8"?>
02.
<response>
03.
<lst name="responseHeader">
04.
<int name="status">0</int>
05.
<int name="QTime">0</int>
06.
</lst>
07.
<lst name="spellcheck">
08.
<lst name="suggestions">
09.
<lst name="dys">
10.
<int name="numFound">4</int>
11.
<int name="startOffset">0</int>
12.
<int name="endOffset">3</int>
13.
<arr name="suggestion">
14.
<str>hard drive</str>
15.
<str>hard drive samsung</str>
16.
<str>hard drive seagate</str>
17.
<str>hard drive toshiba</str>
18.
</arr>
19.
</lst>
20.
</lst>
21.
</lst>
22.
</response>
The end
In the next part of the autocomplete functionality I’ll show how to modify its configuration to use static dictionary into the mechanism and how this can helk you get better suggestions. The last part of the series will be a performance comparison of each method in which I’ll try to diagnose which method is the fastest one in various situations.