Search engines: building an index and searching with pyes, the Python client for elasticsearch

2013-11-08

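Host environment: Ubuntu 13.04

Python version: 2.7.4

When reposting, please credit: http://blog.yanming8.cn/archives/118

Official site: http://www.elasticsearch.com/

Chinese site: http://es-cn.medcl.net/

The following introduction is quoted from the Chinese site: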

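Say you have built a website or an application and need to add search to it (because search really is that essential). Getting search up and running turns out to be hard. We want search that is fast and painless to install; a schema that can be defined freely (schema free); indexing of JSON data over HTTP; a server that is always available (HA, and this one is non-negotiable); scaling from one machine to thousands; search that is real-time; something simple to use with multi-tenancy support. We need a complete solution, and one built for the cloud.
"Make search simpler" is our motto, "and make it cool, like a bonsai."
elasticsearch aims to solve all of the problems above and more. It is open source (Apache 2 license), distributed, RESTful, and a search engine built on top of Apache Lucene.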

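1. Install the distributed server

Download a suitable release from http://www.elasticsearch.org/download/ and install it. Here I downloaded the DEB package for Ubuntu and installed it directly with dpkg. Once it is installed, start the service with:

sudo service elasticsearch start

2. Install the pyes client

Install the Python component for elasticsearch with:

pip install pyes

3. Install the Chinese word segmentation plugin

Download the Chinese word segmentation plugin from https://github.com/medcl/elasticsearch-rtf/blob/master/elasticsearch/plugins/analysis-ik/elasticsearch-analysis-ik-1.2.2.jar and move it into the elasticsearch installation directory, /usr/share/elasticsearch/analysis-ik/. Then edit the configuration file /etc/elasticsearch/elasticsearch.yml to set the plugin path:

path.plugins: /usr/share/elasticsearch/plugins

and add the analyzer configuration:

index:
  analysis:
    analyzer:
      ik:
        alias: [ik_analyzer]
        type: org.elasticsearch.index.analysis.IkAnalyzerProvider

Finally, download the dictionary used by the IK analyzer:

cd /etc/elasticsearch
wget http://github.com/downloads/medcl/elasticsearch-analysis-ik/ik.zip --no-check-certificate
unzip ik.zip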
rm ik.zip

Restart the elasticsearch service and the analyzer is ready to use.

4. Build the index

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import os

from pyes import *

INDEX_NAME = 'txtfiles'


class IndexFiles(object):
    def __init__(self, root):
        conn = ES('127.0.0.1:9200', timeout=3.5)  # connect to ES
        try:
            conn.delete_index(INDEX_NAME)
        except Exception:
            pass  # ignore failures, e.g. when the index does not exist yet
        conn.create_index(INDEX_NAME)  # create a new index

        # Define the stored structure of the index
        mapping = {u'content': {'boost': 1.0,
                                'index': 'analyzed',
                                'store': 'yes',
                                'type': u'string',
                                "indexAnalyzer": "ik",
                                "searchAnalyzer": "ik",
                                "term_vector": "with_positions_offsets"},
                   u'name': {'boost': 1.0,
                             'index': 'analyzed',
                             'store': 'yes',
                             'type': u'string',
                             "indexAnalyzer": "ik",
                             "searchAnalyzer": "ik",
                             "term_vector": "with_positions_offsets"},
                   u'dirpath': {'boost': 1.0,
                                'index': 'analyzed',
                                'store': 'yes',
                                'type': u'string',
                                "indexAnalyzer": "ik",
                                "searchAnalyzer": "ik",
                                "term_vector": "with_positions_offsets"},
                   }

        conn.put_mapping("test-type", {'properties': mapping}, [INDEX_NAME])  # define test-type

        self.addIndex(conn, root)

        conn.default_indices = [INDEX_NAME]  # set the default index
        conn.refresh()  # refresh so the newly inserted documents become visible

    def addIndex(self, conn, root):
        print root
        for dirpath, dirnames, filenames in os.walk(root):
            for filename in filenames:
                if not filename.endswith('.txt'):
                    continue
                print "Indexing file ", filename
                try:
                    path = os.path.join(dirpath, filename)
                    f = open(path)
                    contents = unicode(f.read(), 'utf-8')
                    f.close()
                    if len(contents) > 0:
                        conn.index({'name': filename, 'dirpath': dirpath, 'content': contents}, INDEX_NAME, 'test-type')
                    else:
                        print 'no contents in file %s' % path
                except Exception, e:
                    print e


if __name__ == '__main__':
    IndexFiles('./txtfiles')
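
Before sending anything to elasticsearch, it can help to check which files the indexing script would pick up. The following is a minimal sketch that separates the directory walk from the indexing step. The helper name iter_txt_documents is hypothetical and not part of pyes; it only mirrors the file selection rules of the script above (only .txt files, decoded as UTF-8, empty files skipped). It uses only the standard library and is written to run on Python 2.7 as well as Python 3.

```python
# -*- coding: utf-8 -*-
import io
import os


def iter_txt_documents(root):
    """Yield one dict per non-empty .txt file found under root.

    Hypothetical helper (not part of pyes). The keys match the fields
    used by the indexing script: name, dirpath and content.
    """
    for dirpath, dirnames, filenames in os.walk(root):
        for filename in sorted(filenames):
            if not filename.endswith('.txt'):
                continue
            path = os.path.join(dirpath, filename)
            # io.open returns decoded text on both Python 2.7 and Python 3.
            with io.open(path, encoding='utf-8') as f:
                contents = f.read()
            if contents:
                yield {'name': filename, 'dirpath': dirpath, 'content': contents}


if __name__ == '__main__':
    # Dry run: list what would be indexed without contacting elasticsearch.
    for doc in iter_txt_documents('./txtfiles'):
        print('%s/%s: %d characters' % (doc['dirpath'], doc['name'], len(doc['content'])))
```

Each dict produced this way has the shape of the document passed to conn.index in the script, so the loop body there could presumably be fed from this generator; that wiring is left out here because it has not been tested against a live server.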

5. Search and highlight the results

#!/usr/bin/env python
# -*- coding: utf-8 -*-

from pyes import *

conn = ES('127.0.0.1:9200', timeout=3.5)  # connect to ES
sq = StringQuery(u'世界末日', 'content')  # query the content field
h = HighLighter(['<b>'], ['</b>'], fragment_size=20)

s = Search(sq, highlight=h)
s.add_highlight("content")
results = conn.search(s, indices='txtfiles', doc_types='test-type')

hits = []
for r in results:
    # Replace the full text with the first highlighted fragment when there is one
    if "content" in r._meta.highlight:
        r['content'] = r._meta.highlight[u"content"][0]
    hits.append(r)
    print r['content']
print len(hits)
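
To see what the search script asks the server to do, here is a rough sketch of a request body that expresses the same search in elasticsearch's JSON query DSL: a query_string query on one field, plus highlighting with custom tags. This is an approximation written by hand, not the exact JSON that pyes generates, and the function name build_search_body is hypothetical.

```python
# -*- coding: utf-8 -*-
import json


def build_search_body(text, field, fragment_size=20):
    """Build a search body with a query_string query and highlighting.

    Hypothetical helper; an assumed equivalent of the pyes objects
    used in the search script, not the output of pyes itself.
    """
    return {
        'query': {
            'query_string': {'query': text, 'default_field': field},
        },
        'highlight': {
            'pre_tags': ['<b>'],     # inserted before each matched term
            'post_tags': ['</b>'],   # inserted after each matched term
            'fragment_size': fragment_size,
            'fields': {field: {}},   # highlight this field with the settings above
        },
    }


if __name__ == '__main__':
    body = build_search_body(u'世界末日', 'content')
    # Non-ASCII characters are printed as \u escapes, which is still valid JSON.
    print(json.dumps(body, indent=2, sort_keys=True))
```

A body like this could be sent to the index's _search endpoint with any HTTP client, which is a convenient way to debug a query independently of the Python client.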
