首页 诗词 字典 板报 句子 名言 友答 励志 学校 网站地图
当前位置: 首页 > 教程频道 > 开发语言 > 编程 >

文本形似度计算-Levenshtein

2012-09-13 
文本相似度计算-Levenshtein参见网址http://www.merriampark.com/ld.htm#JAVAimport java.util.BitSetpub

文本相似度计算-Levenshtein
参见网址http://www.merriampark.com/ld.htm#JAVA


import java.util.BitSet;public class Distance {public static void main(String[] args) {Distance distance = new Distance() ;int i = distance.LD("gttttl", "gambol") ;System.out.println(i);}// ****************************// Get minimum of three values// ****************************private int Minimum(int a, int b, int c) {int mi;mi = a;if (b < mi) {mi = b;}if (c < mi) {mi = c;}return mi;}// *****************************// Compute Levenshtein distance// *****************************public int LD(String s, String t) {//构建一个二维数据int d[][]; // matrix//s的长度int n; // length of s//t的长度int m; // length of t//s的偏移量int i; // iterates through s//t的偏移量int j; // iterates through t//s偏移量所在的charchar s_i; // ith character of s//t偏移量所在的charchar t_j; // jth character of t//临时变量对比差值int cost; // cost// Step 1n = s.length();m = t.length();//当n为0时.则变化为m所有的值if (n == 0) {return m;}//同上if (m == 0) {return n;}d = new int[n + 1][m + 1];// Step 2 将数组首行首列添加内容.为当前行号列号for (i = 0; i <= n; i++) {d[i][0] = i;}for (j = 0; j <= m; j++) {d[0][j] = j;}// Step 3for (i = 1; i <= n; i++) {s_i = s.charAt(i - 1);// Step 4//判断i位置的值和 t的每个字的差值for (j = 1; j <= m; j++) {t_j = t.charAt(j - 1);// Step 5if (s_i == t_j) {cost = 0;} else {cost = 1;}// Step 6//在数组的d[i][j] = Minimum(d[i - 1][j] + 1, d[i][j - 1] + 1,d[i - 1][j - 1] + cost);}}// Step 7//取得最右面最下面的值就是文本的想速度了return d[n][m];}}





都加注释了....不解释了..


This section shows how the Levenshtein distance is computed when the source string is "GUMBO" and the target string is "GAMBOL".
Steps 1 and 2
    G U M B O
  0 1 2 3 4 5
G 1          
A 2          
M 3          
B 4          
O 5          
L 6          

Steps 3 to 6 When i = 1
    G U M B O
  0 1 2 3 4 5
G 1 0        
A 2 1        
M 3 2        
B 4 3        
O 5 4        
L 6 5        

Steps 3 to 6 When i = 2
    G U M B O
  0 1 2 3 4 5
G 1 0 1      
A 2 1 1      
M 3 2 2      
B 4 3 3      
O 5 4 4      
L 6 5 5      

Steps 3 to 6 When i = 3
    G U M B O
  0 1 2 3 4 5
G 1 0 1 2    
A 2 1 1 2    
M 3 2 2 1    
B 4 3 3 2    
O 5 4 4 3    
L 6 5 5 4    

Steps 3 to 6 When i = 4
    G U M B O
  0 1 2 3 4 5
G 1 0 1 2 3  
A 2 1 1 2 3  
M 3 2 2 1 2  
B 4 3 3 2 1  
O 5 4 4 3 2  
L 6 5 5 4 3  

Steps 3 to 6 When i = 5
    G U M B O
  0 1 2 3 4 5
G 1 0 1 2 3 4
A 2 1 1 2 3 4
M 3 2 2 1 2 3
B 4 3 3 2 1 2
O 5 4 4 3 2 1
L 6 5 5 4 3 2

Step 7
The distance is in the lower right hand corner of the matrix, i.e. 2. This corresponds to our intuitive realization that "GUMBO" can be transformed into "GAMBOL" by substituting "A" for "U" and adding "L" (one substitution and 1 insertion = 2 changes).

热点排行