
개발잡부

[es] Similarity module

닉의네임 2022. 6. 24. 10:44

 

Elasticsearch 7.9 reference:

https://www.elastic.co/guide/en/elasticsearch/reference/7.9/index-modules-similarity.html

 

Set up and tested in a 7.9 environment.

Create my_similarity under settings, then assign it to a field through the similarity parameter in the mapping.

BM25 similarity (default)

TF/IDF based similarity that has built-in tf normalization and is supposed to work better for short fields (like names). See Okapi BM25 for more details. This similarity has the following options:

k1: Controls non-linear term frequency normalization (saturation). The default value is 1.2.
b: Controls to what degree document length normalizes tf values. The default value is 0.75.
discount_overlaps: Determines whether overlap tokens (tokens with 0 position increment) are ignored when computing norm. By default this is true, meaning overlap tokens do not count when computing norms.

Type name: BM25
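
To make the roles of k1 and b concrete, here is a minimal Python sketch of the textbook Okapi BM25 term-frequency component. It is an illustration under assumptions, not Lucene's code: the helper name bm25_tf and its parameters are hypothetical, the IDF part is left out, and the Elasticsearch implementation may differ in constant factors and in how it stores field lengths.

```python
def bm25_tf(tf: float, doc_len: float, avg_doc_len: float,
            k1: float = 1.2, b: float = 0.75) -> float:
    """Textbook BM25 term-frequency component (hypothetical helper)."""
    # b blends between no length normalization (b = 0)
    # and full length normalization (b = 1).
    length_norm = 1.0 - b + b * (doc_len / avg_doc_len)
    # k1 controls saturation: repeated occurrences add less and less,
    # and the value stays below (k1 + 1) however large tf gets.
    return tf * (k1 + 1.0) / (tf + k1 * length_norm)


if __name__ == "__main__":
    # Saturation: the gain from each extra occurrence shrinks as tf grows.
    for tf in (1, 2, 4, 8, 16):
        print(tf, round(bm25_tf(tf, doc_len=10, avg_doc_len=10), 3))
    # Length normalization: the same tf counts for less in a longer field.
    print(round(bm25_tf(2, doc_len=20, avg_doc_len=10), 3))
```

Lowering k1 makes the curve flatten sooner, and setting b to 0 removes the effect of field length entirely.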

 

 

DFR similarity

Similarity that implements the divergence from randomness framework. This similarity has the following options:

basic_model: Possible values are g, if, in and ine.
after_effect: Possible values are b and l.
normalization: Possible values are no, h1, h2, h3 and z.

Every normalization except the first (no) needs a parameter value, such as normalization.h2.c for h2.

Type name: DFR

PUT /similarity-index
{
  "settings": {
    "number_of_replicas": 0, 
    "index": {
      "similarity": {
        "my_similarity": {
          "type": "DFR",
          "basic_model": "g",
          "after_effect": "l",
          "normalization": "h2",
          "normalization.h2.c": "3.0"
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "my_field": {
        "type": "text",
        "similarity": "my_similarity"
      }
    }
  }
}

The second index is identical except that normalization.h2.c is set to 4.0:

PUT /similarity-index2
{
  "settings": {
    "number_of_replicas": 0, 
    "index": {
      "similarity": {
        "my_similarity": {
          "type": "DFR",
          "basic_model": "g",
          "after_effect": "l",
          "normalization": "h2",
          "normalization.h2.c": "4.0"
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "my_field": {
        "type": "text",
        "similarity": "my_similarity"
      }
    }
  }
}

Index the same document into both indices:

PUT /similarity-index/_doc/1
{
  "my_field": "foo bar foo"
}


PUT /similarity-index2/_doc/1
{
  "my_field": "foo bar foo"
}

Check how the score is calculated:

GET /similarity-index/_search?explain=true
{
  "query": {
    "term": {
      "my_field": {
        "value": "foo"
      }
    }
  }
}


GET /similarity-index2/_search?explain=true
{
  "query": {
    "term": {
      "my_field": {
        "value": "foo"
      }
    }
  }
}
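
The two indices differ only in normalization.h2.c, so any difference in the explain output should trace back to the H2 term-frequency normalization. The Python sketch below is a rough illustration only. It assumes H2 computes tfn = tf * log2(1 + c * avgdl / dl), as described in the DFR literature, and that the default analyzer gives tf = 2 for foo with a field length of 3 and an average field length of 3 (one document per index). The helper name h2_tfn is hypothetical, and the basic model and after effect that turn tfn into the final score are not modeled here.

```python
import math


def h2_tfn(tf: float, doc_len: float, avg_doc_len: float, c: float) -> float:
    """Normalized term frequency under DFR normalization H2 (hypothetical helper)."""
    # Assumed formula: tfn = tf * log2(1 + c * avgdl / dl)
    return tf * math.log2(1.0 + c * avg_doc_len / doc_len)


if __name__ == "__main__":
    # "foo bar foo": assumed tf("foo") = 2, field length = 3, average length = 3
    for c in (3.0, 4.0):
        print(f"c={c}: tfn={h2_tfn(2, 3, 3, c):.4f}")
```

Under these assumptions a larger c yields a larger normalized term frequency, which is one way to read the gap between the two explain results.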