개발잡부
[es] Similarity module
7.9
https://www.elastic.co/guide/en/elasticsearch/reference/7.9/index-modules-similarity.html
Set up a 7.9 environment and run the tests.
Create my_similarity under settings, then set the field's similarity to my_similarity in the mapping.
BM25 similarity (default)
TF/IDF based similarity that has built-in tf normalization and is supposed to work better for short fields (like names). See Okapi BM25 for more details. This similarity has the following options:
Option | Description |
---|---|
k1 | Controls non-linear term frequency normalization (saturation). The default value is 1.2. |
b | Controls to what degree document length normalizes tf values. The default value is 0.75. |
discount_overlaps | Determines whether overlap tokens (Tokens with 0 position increment) are ignored when computing norm. By default this is true, meaning overlap tokens do not count when computing norms. |
Type name: BM25
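To make the roles of k1 and b concrete, here is a minimal Python sketch of the BM25 term score. It assumes the Lucene 8 style formulation that Elasticsearch 7.x is built on, where the score is boost * idf * tf; the textbook form additionally multiplies by the constant (k1 + 1), which does not change ranking. The function names are made up for illustration, and Lucene stores field lengths in a lossy encoding, so real explain output can differ slightly for long fields.

```python
import math


def bm25_idf(doc_count: int, doc_freq: int) -> float:
    """Inverse document frequency: ln(1 + (N - n + 0.5) / (n + 0.5))."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


def bm25_tf(freq: float, doc_len: float, avg_doc_len: float,
            k1: float = 1.2, b: float = 0.75) -> float:
    """Saturating term-frequency part: freq / (freq + k1 * (1 - b + b * dl / avgdl))."""
    length_norm = k1 * (1 - b + b * doc_len / avg_doc_len)
    return freq / (freq + length_norm)


def bm25_score(freq: float, doc_len: float, avg_doc_len: float,
               doc_count: int, doc_freq: int,
               k1: float = 1.2, b: float = 0.75, boost: float = 1.0) -> float:
    """Score of one term in one document (hypothetical helper, not an Elasticsearch API)."""
    return boost * bm25_idf(doc_count, doc_freq) * bm25_tf(freq, doc_len, avg_doc_len, k1, b)


if __name__ == "__main__":
    # k1 controls saturation: the tf part grows with freq but never reaches 1.
    for freq in (1, 2, 10, 100):
        print(freq, round(bm25_tf(freq, doc_len=3, avg_doc_len=3), 4))
```

With b = 0 the document length has no effect on the tf part, and with b = 1 the length is fully normalized against the average field length.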
DFR similarity
Similarity that implements the divergence from randomness framework. This similarity has the following options:
Option | Description |
---|---|
basic_model | Possible values: g, if, in and ine. |
after_effect | Possible values: b and l. |
normalization | Possible values: no, h1, h2, h3 and z. |
Every normalization value except the first one (no) needs a normalization parameter value, such as normalization.h2.c for h2.
Type name: DFR
PUT /similarity-index
{
  "settings": {
    "number_of_replicas": 0,
    "index": {
      "similarity": {
        "my_similarity": {
          "type": "DFR",
          "basic_model": "g",
          "after_effect": "l",
          "normalization": "h2",
          "normalization.h2.c": "3.0"
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "my_field": {
        "type": "text",
        "similarity": "my_similarity"
      }
    }
  }
}
PUT /similarity-index2
{
  "settings": {
    "number_of_replicas": 0,
    "index": {
      "similarity": {
        "my_similarity": {
          "type": "DFR",
          "basic_model": "g",
          "after_effect": "l",
          "normalization": "h2",
          "normalization.h2.c": "4.0"
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "my_field": {
        "type": "text",
        "similarity": "my_similarity"
      }
    }
  }
}
Index the data
PUT /similarity-index/_doc/1
{
  "my_field": "foo bar foo"
}
PUT /similarity-index2/_doc/1
{
  "my_field": "foo bar foo"
}
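The two indices differ only in normalization.h2.c (3.0 vs 4.0), so any score difference should come from the H2 normalization step. As a rough sketch, assuming the usual DFR H2 formula tfn = tf * log2(1 + c * avg_field_length / field_length), the normalized term frequency for "foo" in "foo bar foo" can be computed by hand. The function name is hypothetical, and the basic model and after effect are deliberately left out, so this is not the final score.

```python
import math


def h2_normalized_tf(tf: float, doc_len: float, avg_doc_len: float, c: float) -> float:
    """DFR normalization H2: tfn = tf * log2(1 + c * avg_doc_len / doc_len)."""
    return tf * math.log2(1 + c * avg_doc_len / doc_len)


if __name__ == "__main__":
    # "foo bar foo": "foo" occurs twice in a 3-token field. With a single
    # document per index, the average field length is also 3 (assumption:
    # the standard analyzer keeps all three tokens).
    for c in (3.0, 4.0):
        print(f"c={c}: tfn={h2_normalized_tf(tf=2, doc_len=3, avg_doc_len=3, c=c):.4f}")
```

Under these assumptions c = 3.0 gives tfn = 4.0 and c = 4.0 gives roughly 4.64, so a larger c raises the normalized term frequency that is fed into the basic model.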
Check the score calculation
GET /similarity-index/_search?explain=true
{
  "query": {
    "term": {
      "my_field": {
        "value": "foo"
      }
    }
  }
}
GET /similarity-index2/_search?explain=true
{
  "query": {
    "term": {
      "my_field": {
        "value": "foo"
      }
    }
  }
}
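With explain=true, each hit carries an _explanation object made of value, description and nested details. Comparing two such trees by eye is tedious, so a small helper that flattens the tree into indented lines can help when diffing the two indices. This is only a sketch: flatten_explanation is a hypothetical helper, and the sample tree below is a made-up placeholder, not real Elasticsearch output.

```python
from typing import Any, Dict, List


def flatten_explanation(node: Dict[str, Any], depth: int = 0) -> List[str]:
    """Turn a nested explain node into indented 'value  description' lines."""
    lines = [f"{'  ' * depth}{node['value']}  {node['description']}"]
    for child in node.get("details", []):
        lines.extend(flatten_explanation(child, depth + 1))
    return lines


# Placeholder tree with made-up values and descriptions (NOT real output).
sample = {
    "value": 1.0,
    "description": "score (placeholder)",
    "details": [
        {"value": 2.0, "description": "tfn (placeholder)", "details": []},
    ],
}

if __name__ == "__main__":
    print("\n".join(flatten_explanation(sample)))
```

In practice the input would be response["hits"]["hits"][0]["_explanation"] from each search response, and the two flattened outputs can then be compared line by line.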