The spell Module in LingPipe: Query Suggestions
Query Suggestions
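LingPipe offers an alternative form of spelling correction: when the user has not finished typing a query, it suggests the closest matching complete queries.
The image shows this kind of alternative correction model at work in the Google search box, completing a partially typed query.
For example, the top suggestion is "amazon" even though the user typed "anaz". This is not surprising, because the suggestions that actually have "anaz" as a prefix return comparatively few search results.
Suggestions are not limited to single words; corrected phrases are suggested too, for example the search phrase "anazao salon".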
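An important difference between query suggestion and spelling correction is that suggestions are chosen from a fixed set of search phrases.
For instance, the query "I want to find anaz" has no suggested phrase.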
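The fixed-set behaviour can be illustrated with a minimal sketch. It is not LingPipe code: the class name `FixedSetCompletion`, the phrase list, and the counts are all hypothetical, and it uses only exact prefix matching, so it would not recover "amazon" from "anaz" (that needs the edit-distance scoring used in the demo below).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FixedSetCompletion {

    // Return up to maxResults known phrases that start with the query, most frequent first.
    public static List<String> complete(Map<String, Integer> phraseCounts,
                                        String query, int maxResults) {
        List<Map.Entry<String, Integer>> matches = new ArrayList<>();
        for (Map.Entry<String, Integer> e : phraseCounts.entrySet()) {
            if (e.getKey().startsWith(query)) matches.add(e);
        }
        // Sort by count, highest first.
        matches.sort(Map.Entry.<String, Integer>comparingByValue().reversed());
        List<String> result = new ArrayList<>();
        for (int i = 0; i < matches.size() && i < maxResults; i++) {
            result.add(matches.get(i).getKey());
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical phrases and counts, for illustration only.
        Map<String, Integer> phraseCounts = new LinkedHashMap<>();
        phraseCounts.put("amazon", 1000);
        phraseCounts.put("anazao salon", 10);
        phraseCounts.put("anaheim hotels", 50);

        System.out.println(complete(phraseCounts, "ana", 5));
        // prints [anaheim hotels, anazao salon]
        System.out.println(complete(phraseCounts, "I want to find anaz", 5));
        // prints [] because no known phrase starts with this text
    }
}
```

Because candidates come only from the phrase set, free-form text such as "I want to find anaz" yields nothing, however plausible its last word looks.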
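Finding Phrases with Counts
For our demo, we assume the user supplies a set of phrases, each with a count. Here we provide the names of the US states together with their populations. A sample of the file format: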
Alabama 4599000
Alaska 670000
Arizona 6166000
Arkansas 2811000
California 36458000
Colorado 4753000
Connecticut 3505000
Delaware 853000
District of Columbia 58200
Florida 18090000
...
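Since a phrase may itself contain spaces ("District of Columbia"), each line has to be split at its last space rather than its first. A small sketch of that step follows; the class name `PhraseCountParser` is hypothetical, and the input lines are taken from the sample above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PhraseCountParser {

    // Parse lines of the form "<phrase> <count>"; the count is the text after the LAST space.
    public static Map<String, Float> parse(String[] lines) {
        Map<String, Float> counts = new LinkedHashMap<>();
        for (String line : lines) {
            int i = line.lastIndexOf(' ');
            if (i < 0) continue; // no space, so not a phrase/count pair
            counts.put(line.substring(0, i), Float.valueOf(line.substring(i + 1)));
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] lines = { "Alabama 4599000", "District of Columbia 58200", "Florida 18090000" };
        for (Map.Entry<String, Float> e : parse(lines).entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```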
> ant complete-cmd

|N|
 -3.95 New York
 -5.08 North Carolina
 -5.10 New Jersey
 -6.90 Nevada
 -7.26 New Mexico

|New|
 -3.95 New York
 -5.10 New Jersey
 -7.26 New Mexico
 -7.83 New Hampshire
-16.90 Nevada

|New Y|
 -3.95 New York
-15.10 New Jersey
-17.26 New Mexico
-17.83 New Hampshire

|New Yor|
 -3.95 New York

|Mew |
-13.95 New York
-15.10 New Jersey
-17.26 New Mexico
-17.83 New Hampshire

|U|
 -6.87 Utah
-13.04 California
-13.67 Texas
-13.95 New York
-14.05 Florida

|Uta|
 -6.87 Utah
-23.04 California

|ZebraFish|
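import com.aliasi.io.FileLineReader;
import com.aliasi.spell.AutoCompleter;
import com.aliasi.spell.FixedWeightEditDistance;
import com.aliasi.util.ScoredObject;

import java.io.File;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import java.util.SortedSet;

public static void main(String[] args) throws IOException {
    // args[0] is the phrase/count file; the remaining arguments are queries to complete.
    File wordsFile = new File(args[0]);
    String[] lines = FileLineReader.readLineArray(wordsFile, "ISO-8859-1");

    // Split each line at its last space: phrase on the left, count on the right.
    Map<String,Float> counter = new HashMap<String,Float>(200000);
    for (String line : lines) {
        int i = line.lastIndexOf(' ');
        if (i < 0) continue;
        String phrase = line.substring(0, i);
        String countString = line.substring(i + 1);
        Float count = Float.valueOf(countString);
        counter.put(phrase, count);
    }

    // Edit weights: matches are free, every edit costs 10, transposition is disallowed.
    double matchWeight = 0.0;
    double insertWeight = -10.0;
    double substituteWeight = -10.0;
    double deleteWeight = -10.0;
    double transposeWeight = Double.NEGATIVE_INFINITY;
    FixedWeightEditDistance editDistance
        = new FixedWeightEditDistance(matchWeight,
                                      deleteWeight,
                                      insertWeight,
                                      substituteWeight,
                                      transposeWeight);

    int maxResults = 5;         // completions returned per query
    int maxQueueSize = 10000;   // bound on the search queue
    double minScore = -25.0;    // completions scoring below this are dropped
    AutoCompleter completer
        = new AutoCompleter(counter, editDistance,
                            maxResults, maxQueueSize, minScore);

    // Print the scored completions for each query given on the command line.
    for (int i = 1; i < args.length; ++i) {
        SortedSet<ScoredObject<String>> completions = completer.complete(args[i]);
        System.out.println("\n|" + args[i] + "|");
        for (ScoredObject<String> so : completions)
            System.out.printf("%6.2f %s\n", so.score(), so.getObject());
    }
}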
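The scores in the sample output suggest how the pieces combine. The results for "Mew " are exactly 10 lower than those for "New ", which matches one substitution at substituteWeight = -10.0, and nothing below minScore = -25.0 is printed. The sketch below reproduces that behaviour in plain Java under an assumed model: score = log2(count / total count) + the best edit score between the query and any prefix of the phrase. This is an inference from the output, not LingPipe's actual implementation. The class `PrefixCompletionSketch` and its methods are hypothetical, transposition and the search-queue bound are ignored, and the counts are only the ten rows shown in the sample file, so the numbers will not match the output above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PrefixCompletionSketch {

    // Edit weights copied from the demo code; transposition is left out of this sketch.
    static final double MATCH = 0.0, INSERT = -10.0, DELETE = -10.0, SUBSTITUTE = -10.0;

    // Highest-scoring alignment of the whole query against any prefix of the phrase.
    public static double bestPrefixEditScore(String query, String phrase) {
        int n = query.length(), m = phrase.length();
        double[][] d = new double[n + 1][m + 1];
        for (int i = 1; i <= n; i++) d[i][0] = d[i - 1][0] + DELETE;
        for (int j = 1; j <= m; j++) d[0][j] = d[0][j - 1] + INSERT;
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                double diag = d[i - 1][j - 1]
                    + (query.charAt(i - 1) == phrase.charAt(j - 1) ? MATCH : SUBSTITUTE);
                d[i][j] = Math.max(diag, Math.max(d[i - 1][j] + DELETE, d[i][j - 1] + INSERT));
            }
        }
        // The phrase may continue past the query, so any prefix length j is allowed.
        double best = d[n][0];
        for (int j = 1; j <= m; j++) best = Math.max(best, d[n][j]);
        return best;
    }

    // Assumed scoring model: log2 relative frequency plus the prefix edit score.
    public static double score(String query, String phrase, double count, double total) {
        return Math.log(count / total) / Math.log(2.0) + bestPrefixEditScore(query, phrase);
    }

    // Phrases scoring at least minScore, best first, capped at maxResults.
    public static List<String> complete(Map<String, Double> counts, String query,
                                        int maxResults, double minScore) {
        double total = 0.0;
        for (double c : counts.values()) total += c;
        final Map<String, Double> scores = new LinkedHashMap<>();
        for (Map.Entry<String, Double> e : counts.entrySet()) {
            double s = score(query, e.getKey(), e.getValue(), total);
            if (s >= minScore) scores.put(e.getKey(), s);
        }
        List<String> ranked = new ArrayList<>(scores.keySet());
        ranked.sort((a, b) -> Double.compare(scores.get(b), scores.get(a)));
        return ranked.size() > maxResults
            ? new ArrayList<>(ranked.subList(0, maxResults)) : ranked;
    }

    // The ten rows visible in the sample file; the real file has more.
    public static Map<String, Double> stateCounts() {
        Map<String, Double> c = new LinkedHashMap<>();
        c.put("Alabama", 4599000.0);
        c.put("Alaska", 670000.0);
        c.put("Arizona", 6166000.0);
        c.put("Arkansas", 2811000.0);
        c.put("California", 36458000.0);
        c.put("Colorado", 4753000.0);
        c.put("Connecticut", 3505000.0);
        c.put("Delaware", 853000.0);
        c.put("District of Columbia", 58200.0);
        c.put("Florida", 18090000.0);
        return c;
    }

    public static void main(String[] args) {
        Map<String, Double> counts = stateCounts();
        for (String q : new String[] { "Ala", "ZebraFish" }) {
            System.out.println("|" + q + "|");
            for (String phrase : complete(counts, q, 5, -25.0)) {
                System.out.println("  " + phrase);
            }
        }
    }
}
```

With these weights a completion can absorb at most two edits before it falls below -25.0, which is consistent with "ZebraFish" producing no suggestions in the demo output.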