Using Elasticsearch as This Blog's Search Engine: Powerful

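First, some background. Elasticsearch is a mature, open-source, distributed search engine built on Apache Lucene™. Among both open-source and proprietary options, Lucene is widely regarded as one of the most advanced, best-performing, and most fully featured search engine libraries available. Many companies use Elasticsearch, including GitHub, and it is also widely used for log aggregation and statistics. In short, it is a very capable engine. I have not been working with it for long myself; I picked it up only to build the search system for this blog, so corrections and suggestions are welcome.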

Setting Up Elasticsearch

Elasticsearch works through a RESTful API. Once an instance is running, we add, delete, update, and query indexed data through that API.

There are two ways to install it:

Run the binary directly
Run it in Docker

Running the binary

Download and unpack the official Elasticsearch package.
On Unix, run bin/elasticsearch; on Windows, run bin\elasticsearch.bat.

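Running with Docker

The official elasticsearch image can be pulled and started with:

$ docker pull elasticsearch                 # pull the image
$ docker run -d elasticsearch               # minimal start

Below is the command I actually use. It publishes the port and mounts the config, data, log, and plugin directories from the host so they are easy to change later: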

docker run -d \
-p 9200:9200 \
-v /data/elasticsearch/config:/usr/share/elasticsearch/config \
-v /data/elasticsearch/data:/usr/share/elasticsearch/data \
-v /data/elasticsearch/logs:/usr/share/elasticsearch/logs \
-v /data/elasticsearch/plugins:/usr/share/elasticsearch/plugins \
-e ES_JAVA_OPTS="-Xms512m -Xmx512m" \
--name es elasticsearch:2.4

Check that it started successfully:

$ curl -XGET 'http://127.0.0.1:9200/?pretty'

{
  "name" : "Harness",
  "cluster_name" : "elasticsearch",
  "version" : {
    "number" : "2.4.0",
    "build_hash" : "ce9f0c7394dee074091dd1bc4e9469251181fc55",
    "build_timestamp" : "2016-08-29T09:14:17Z",
    "build_snapshot" : false,
    "lucene_version" : "5.5.2"
  },
  "tagline" : "You Know, for Search"
}
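
The same check can be scripted. Here is a minimal Python sketch using only the standard library. It is not part of Eiblog: the function names `describe_node` and `fetch_root` are made up for illustration, and the default address assumes the port mapping from the docker command above.

```python
import json
from urllib.request import urlopen


def describe_node(body):
    """Summarize the JSON returned by GET / as 'name (Elasticsearch X, Lucene Y)'."""
    info = json.loads(body)
    version = info["version"]
    return "{} (Elasticsearch {}, Lucene {})".format(
        info["name"], version["number"], version["lucene_version"]
    )


def fetch_root(base_url="http://127.0.0.1:9200"):
    """Fetch the root endpoint; raises URLError if the node is not reachable."""
    with urlopen(base_url + "/", timeout=5) as resp:
        return resp.read().decode("utf-8")


# Against a live node (assumed address):
# print(describe_node(fetch_root()))
```

Keeping the parsing separate from the network call means the summary logic can be tried on a saved response without a running node.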

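Configuring Elasticsearch

At first glance the Elasticsearch configuration looks complicated, and it really is fairly complicated. You can browse the files under conf/es in Eiblog, the software behind this blog.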

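├── config                      # configuration directory
│   ├── analysis
│   │   └── synonym.txt         # synonym definitions
│   ├── elasticsearch.yml       # main configuration file
│   ├── logging.yml             # logging configuration
│   └── scripts                 # scripts directory
└── plugins                     # plugins directory
    └── ik1.10.0                # ik Chinese analysis plugin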
        ├── commons-codec-1.9.jar
        ├── commons-logging-1.2.jar
        ├── config
        ├── elasticsearch-analysis-ik-1.10.0.jar
        ├── httpclient-4.5.2.jar
        ├── httpcore-4.4.4.jar
        └── plugin-descriptor.properties

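We will go through these files one by one. If you would rather not dig into them, you can simply copy them over the corresponding Elasticsearch directories.

1. synonym.txt: the synonym file referenced from elasticsearch.yml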

ua,user-agent,userAgent
js,javascript
谷歌=>google
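
To make the two rule styles concrete, here is a rough Python sketch of how such a file can be read. It is a simplification written for this post, not the actual synonym filter: it assumes that a comma-separated line means "every term expands to the whole group" and that `=>` means "replace the left side with the right side", and it ignores multi-word synonyms and any tokenization of the rule terms. The names `parse_synonyms` and `expand` are hypothetical.

```python
def parse_synonyms(lines):
    """Build a token -> replacement-tokens map from the rules in a synonym file."""
    table = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if "=>" in line:
            # Explicit mapping: terms on the left are replaced by those on the right.
            left, right = line.split("=>", 1)
            targets = [t.strip() for t in right.split(",") if t.strip()]
            for source in left.split(","):
                table[source.strip()] = targets
        else:
            # Equivalent synonyms: every term expands to the whole group.
            group = [t.strip() for t in line.split(",") if t.strip()]
            for term in group:
                table[term] = group
    return table


def expand(token, table):
    """Return the tokens this simplified model emits for one input token."""
    return table.get(token, [token])


rules = ["ua,user-agent,userAgent", "js,javascript", "谷歌=>google"]
table = parse_synonyms(rules)
```

In this model, `expand("js", table)` yields both `js` and `javascript`, while `expand("谷歌", table)` yields only `google`, because the arrow rule replaces rather than expands.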

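2. elasticsearch.yml: this file mainly configures the ik analyzer; see the elasticsearch-analysis-ik project for details. ik provides two tokenizers: ik_smart and ik_max_word.

ik_max_word: splits text at the finest granularity and exhausts the possible combinations. For example, “中华人民共和国国歌” is split into “中华人民共和国,中华人民,中华,华人,人民共和国,人民,人,民,共和国,共和,和,国国,国歌”.
ik_smart: splits text at the coarsest granularity. For example, “中华人民共和国国歌” is split into “中华人民共和国,国歌”.

Installing ik: download the version matching your Elasticsearch from ik-release and unpack it into plugins/ik. Restart Elasticsearch and you should see ...plugins [analysis-ik]... in its output.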
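
One way to compare the two tokenizers after installation is the _analyze endpoint. The Python sketch below only builds the request and parses a response; it does not send anything. It assumes the Elasticsearch 2.x form of _analyze that accepts a JSON body with `tokenizer` and `text`, and the address and the function names `build_analyze_request` and `token_texts` are my own placeholders.

```python
import json
from urllib.request import Request


def build_analyze_request(tokenizer, text, base_url="http://127.0.0.1:9200"):
    """Build (but do not send) a POST /_analyze request for one tokenizer."""
    body = json.dumps({"tokenizer": tokenizer, "text": text}, ensure_ascii=False)
    return Request(
        base_url + "/_analyze",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def token_texts(response_body):
    """Extract the token strings from an _analyze response body."""
    return [t["token"] for t in json.loads(response_body)["tokens"]]


smart_request = build_analyze_request("ik_smart", "中华人民共和国国歌")
# To run it against a live node:
# from urllib.request import urlopen
# print(token_texts(urlopen(smart_request).read().decode("utf-8")))
```

Building the same request with `ik_max_word` and comparing the two token lists shows the difference in granularity described above.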

Eiblog's elasticsearch.yml looks like this:

network.host: 0.0.0.0

index:
  analysis:
    analyzer:
      ik_syno:                          # custom analyzer: ik_max_word + synonyms
          type: custom
          tokenizer: ik_max_word
          filter: [my_synonym_filter]
      ik_syno_smart:                    # custom analyzer: ik_smart + synonyms
          type: custom
          tokenizer: ik_smart
          filter: [my_synonym_filter]
    filter:
      my_synonym_filter:                # synonym filter
          type: synonym
          synonyms_path: analysis/synonym.txt

3. logging.yml: unchanged, copied straight from the original file.

4. scripts: not used yet, but the directory must exist.

5. ik1.10.0: the ik plugin, unpacked and copied here.

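Using Elasticsearch

As mentioned earlier, you interact with Elasticsearch through its RESTful API. You can use an existing Go client library such as olivere/elastic, or call the Elasticsearch API directly.

Here are a few Elasticsearch concepts.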

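index -> a collection of documents, comparable to a database
type -> a category of documents inside an index, comparable to a table
token -> a term produced when text is split during analysis
filter -> a step that modifies, adds, or removes tokens (for example, the synonym filter above)
analyzer -> a tokenizer combined with filters, applied to text when indexing and searching

These concepts map roughly onto those of an SQL database as follows: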

MySQL	Elasticsearch
Database	Index
Table	Type
Row	Document
Column	Field
Schema	Mapping
Index	Everything Indexed by default
SQL	Query DSL

Creating the index and mapping

PUT /eiblog
{
    "mappings": {
        "article": {
            "properties": {
                "content": {
                    "analyzer": "ik_syno",
                    "search_analyzer": "ik_syno",
                    "term_vector": "with_positions_offsets",
                    "type": "string"
                },
                "date": {
                    "index": "not_analyzed",
                    "type": "date"
                },
                "slug": {
                    "type": "string"
                },
                "tag": {
                    "index": "not_analyzed",
                    "type": "string"
                },
                "title": {
                    "analyzer": "ik_syno",
                    "search_analyzer": "ik_syno",
                    "term_vector": "with_positions_offsets",
                    "type": "string"
                }
            }
        }
    }
}

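Above, we created an index named eiblog with a mapping for the type article. The analyzer for title and content is set to ik_syno, so both fields are tokenized by ik. The slug field uses the default analyzer, while date and tag are marked not_analyzed, so they are indexed as exact values without tokenization.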

Adding a document to the index

PUT /eiblog/article/1
{
    "title": "你好，世界",
    "slug": "hello",
    "tag": [
        "hello",
        "world"
    ],
    "content": "你好，世界。hello world.",
    "date": "2016-11-03T13:05:55Z"
}

This adds an article with the title 你好，世界 and id 1. PUT creates the document if it is not in the index yet and replaces it if it already is.
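
For anyone calling the API directly rather than through a client library, the request can be assembled like this. This is a Python sketch using only the standard library; `build_index_request` and the default address are assumptions of mine, and the document uses the field name tag so that it matches the mapping.

```python
import json
from urllib.request import Request


def build_index_request(index, doc_type, doc_id, document,
                        base_url="http://127.0.0.1:9200"):
    """Build (but do not send) PUT /{index}/{type}/{id} with the document as JSON."""
    url = "{}/{}/{}/{}".format(base_url, index, doc_type, doc_id)
    body = json.dumps(document, ensure_ascii=False).encode("utf-8")
    return Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


article = {
    "title": "你好，世界",
    "slug": "hello",
    "tag": ["hello", "world"],
    "content": "你好，世界。hello world.",
    "date": "2016-11-03T13:05:55Z",
}
request = build_index_request("eiblog", "article", 1, article)
# To send it to a running node:
# from urllib.request import urlopen
# urlopen(request)
```

Because the id is part of the URL, sending the same request again with changed fields overwrites the stored article instead of creating a second one.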

Searching the index

Simple search using URL parameters
Search using the query DSL

1. Simple search

GET /eiblog/article/_search?q=hello

{
  "took": 42,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.11626227,
    "hits": [
      {
        "_index": "eiblog",
        "_type": "article",
        "_id": "1",
        "_score": 0.11626227,
        "_source": {
          "content": "\n你好，世界。hello world.\n",
          "date": "2016-11-03T22:15:15.685673338+08:00",
          "img": "",
          "slug": "hello",
          "tag": [
            "hello",
            "world"
          ],
          "title": "你好，世界"
        }
      }
    ]
  }
}

OK, the search results are all inside hits.
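
As a small illustration of that structure, this Python sketch pulls the id, score, and title out of each hit. The helper name `extract_hits` is hypothetical; the field names are the ones visible in the response above.

```python
import json


def extract_hits(response_body):
    """Return (id, score, title) for each hit in a _search response body."""
    hits = json.loads(response_body)["hits"]["hits"]
    return [(h["_id"], h["_score"], h["_source"].get("title")) for h in hits]
```

The outer hits object carries the total and max_score, while the inner hits array holds the matching documents, each with its stored _source.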

2. DSL search

POST /eiblog/article/_search?size=1&from=0
{
    "highlight": {
        "fields": {
            "content": {},
            "title": {}
        },
        "post_tags": [
            "</b>"
        ],
        "pre_tags": [
            "<b>"
        ]
    },
    "query": {
        "dis_max": {
            "queries": [
                {
                    "match": {
                        "title": {
                            "boost": 4,
                            "minimum_should_match": "50%",
                            "query": "hello"
                        }
                    }
                },
                {
                    "match": {
                        "content": {
                            "boost": 4,
                            "minimum_should_match": "75%",
                            "query": "hello"
                        }
                    }
                },
                {
                    "match": {
                        "tag": {
                            "boost": 2,
                            "minimum_should_match": "100%",
                            "query": "hello"
                        }
                    }
                },
                {
                    "match": {
                        "slug": {
                            "boost": 1,
                            "minimum_should_match": "100%",
                            "query": "hello"
                        }
                    }
                }
            ],
            "tie_breaker": 0.3
        }
    }
}

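Here we search for the keyword hello in the title, content, tag, and slug fields, each with its own boost and minimum_should_match rule, and we enable highlighting.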

{
  "took": 103,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.066132784,
    "hits": [
      {
        "_index": "eiblog",
        "_type": "article",
        "_id": "12",
        "_score": 0.066132784,
        "_source": {
          "content": "\n你好，世界。hello world.\n",
          "date": "2016-11-03T22:15:15.685673338+08:00",
          "img": "",
          "slug": "hello",
          "tag": [
            "hello",
            "world"
          ],
          "title": "你好，世界"
        },
        "highlight": {
          "content": [
            "\n你好，世界。<b>hello</b> world.\n"
          ]
        }
      }
    ]
  }
}
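
The dis_max query in the request takes the score of the best-matching clause and adds the scores of the other matching clauses multiplied by tie_breaker. The Python sketch below works through that arithmetic. The per-clause scores are made-up numbers chosen for easy calculation, not values returned by Elasticsearch, and `dis_max_score` is a name I introduced for this example.

```python
def dis_max_score(clause_scores, tie_breaker=0.0):
    """Combine per-clause scores the way a dis_max query does.

    The best clause counts in full; every other matching clause
    contributes its score multiplied by tie_breaker.
    """
    matching = [s for s in clause_scores if s > 0]
    if not matching:
        return 0.0  # no clause matched
    best = max(matching)
    return best + tie_breaker * (sum(matching) - best)


# Hypothetical scores for the title, content, tag, and slug clauses.
scores = [0.0, 0.5, 0.25, 0.25]
combined = dis_max_score(scores, tie_breaker=0.3)  # 0.5 + 0.3 * (0.25 + 0.25)
```

With tie_breaker set to 0 only the best clause counts, and with 1 the clauses are simply summed; 0.3 sits in between, so an article matching in several fields ranks a little higher than one matching in a single field.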

We can also add a filter to constrain several fields at once. The filter goes at the same level as highlight and query. Here is an example of a filter used by Eiblog:

"filter": {
    "bool": {
        "must": [
            {
                "term": {
                    "title": "你好"
                }
            },
            {
                "term": {
                    "tag": "world"
                }
            },
            {
                "range": {
                    "date": {
                        "gte": "2016-11",
                        "lte": "2016-11||/M",
                        "format": "yyyy-MM-dd||yyyy-MM||yyyy"
                    }
                }
            }
        ]
    }
}

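This finds articles whose title contains 你好, whose tag is world, and whose date falls in November 2016 (see the range query documentation). I also recommend the book Elasticsearch: The Definitive Guide (Chinese edition); it is very detailed and a quick way to get started.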
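
The range clause relies on date math: "2016-11" parsed as yyyy-MM starts at the first instant of the month, and "2016-11||/M" used with lte rounds up to the last millisecond of the month. The Python sketch below reproduces that window locally so the boundaries are easy to see. It is an illustration under simplifying assumptions (UTC only, no time zone handling), and `month_bounds` and `in_month` are hypothetical helpers, not Elasticsearch APIs.

```python
import calendar
from datetime import datetime


def month_bounds(year_month):
    """Return the inclusive (start, end) datetimes covered by a 'yyyy-MM' string."""
    # Day and time are filled in as the 1st at 00:00:00.
    start = datetime.strptime(year_month, "%Y-%m")
    last_day = calendar.monthrange(start.year, start.month)[1]
    # Rounding up to the month: last day, 23:59:59.999.
    end = start.replace(day=last_day, hour=23, minute=59, second=59,
                        microsecond=999000)
    return start, end


def in_month(iso_date, year_month):
    """Check whether a 'YYYY-MM-DDTHH:MM:SSZ' timestamp falls inside the month."""
    moment = datetime.strptime(iso_date, "%Y-%m-%dT%H:%M:%SZ")
    start, end = month_bounds(year_month)
    return start <= moment <= end
```

For example, `in_month("2016-11-03T13:05:55Z", "2016-11")` is true, while a timestamp at midnight on 2016-12-01 falls outside the window.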

Permalink: https://deepzz.com/post/elasticsearch.html

--EOF--

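Published on 2016-11-03 21:27:00 and tagged "elasticsearch" and "eiblog".

This site is licensed under Creative Commons Attribution 4.0 International. When reposting, please credit the author and link to the original URL.

Note: this article was last updated 3281 days ago; the information in it may have changed, so use it with care.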

Deepzz
