Github Code: https://github.com/usemodj/elasticsearch-analysis-korean

The Korean Analysis plugin integrates Lucene Korean analysis module into elasticsearch-5.0.0-alpha4.

Install plugin

git clone https://github.com/usemodj/elasticsearch-analysis-korean.git

sudo cp dist/analysis-korean /usr/share/elasticsearch/plugins/

And restart elasticsearch service:

sudo service elasticsearch restart
...
sudo service elasticsearch status

test elasticseach korean analysis

curl -XDELETE localhost:9200/test

curl -X PUT http://localhost:9200/test -d '{
  "settings": {
      "analysis": {
        "analyzer": {
          "kr_analyzer": {
            "type": "custom",
            "tokenizer": "kr_tokenizer",
            "filter": [ "trim", "kr_filter" ]
          }
        }
      }
  }
}'

curl -XGET 'localhost:9200/test/_analyze?analyzer=kr_analyzer&pretty=1' -d '아버지가 가방에 들어가셨다.'

Result:

{
  "tokens" : [ {
    "token" : "아버지가",
    "start_offset" : 0,
    "end_offset" : 4,
    "type" : "word",
    "position" : 0
  }, {
    "token" : "아버지",
    "start_offset" : 0,
    "end_offset" : 3,
    "type" : "word",
    "position" : 0
  }, {
    "token" : "가방에",
    "start_offset" : 5,
    "end_offset" : 8,
    "type" : "word",
    "position" : 1
  }, {
    "token" : "가방",
    "start_offset" : 5,
    "end_offset" : 7,
    "type" : "word",
    "position" : 1
  }, {
    "token" : "들어가셨다",
    "start_offset" : 9,
    "end_offset" : 14,
    "type" : "word",
    "position" : 2
  } ]
}

The plugin includes the kr_analyzer analyzer, kr_tokenizer tokenizer, and kr_filter token filter.

Leave a Reply