[es] Let's check the queries

닉의네임 2022. 1. 30. 02:02

https://ldh-6019.tistory.com/213

 

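[es] Let's build a search query 3

must : retrieves documents for which the query is true
must_not : retrieves documents for which the query is false
should : raises the score of documents in the results that match this query
filter : retrieves documents for which the query is true, but ...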
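As a minimal sketch of how the four clauses listed above sit together in one `bool` query, the request body can be written as a plain Python dict. The helper name `build_bool_query` and the fields `status` and `in_stock` are hypothetical, chosen only for illustration; `name`, `category`, and `category1` appear in the script further down.

```python
def build_bool_query(text):
    """Return a bool query body. Fields `status` and `in_stock` are hypothetical."""
    return {
        "bool": {
            # must: documents have to match this clause
            "must": [{"multi_match": {"query": text, "fields": ["name", "category"]}}],
            # must_not: documents matching this clause are excluded
            "must_not": [{"term": {"status": "soldout"}}],
            # should: matching documents get a higher score
            "should": [{"match": {"category1": text}}],
            # filter: documents have to match this clause
            "filter": [{"term": {"in_stock": True}}],
        }
    }


body = build_bool_query("나이키 남성 신발")
print(sorted(body["bool"].keys()))
```

The dict can be passed as the `query` part of a search request body in the same way the script below passes its `function_score` query.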

Let's run the query structure built in that post.
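The script below spells out three `function_score` queries in full so their scores can be plotted side by side. One possible way to keep such variants manageable is a small builder function. This is only a sketch: `build_function_score_query`, `SCRIPT_SOURCE`, and the parameter names are hypothetical and not part of the original script, while the Painless source string is copied from it unchanged.

```python
# Painless expression copied verbatim from the script in this post.
SCRIPT_SOURCE = (
    "cosineSimilarity(params.query_vector, 'feature_vector')"
    " * doc['weight'].value * doc['popular'].value"
    " / doc['name.keyword'].length + doc['category.keyword'].length"
)


def build_function_score_query(text, query_vector, weight=0.1, source=SCRIPT_SOURCE):
    """Build one function_score query; vary `weight` or `source` per variant."""
    must = {"multi_match": {"query": text, "fields": ["name", "category"]}}
    should = {
        "multi_match": {
            "query": text,
            # category1 .. category5, as in the original query
            "fields": ["category%d" % i for i in range(1, 6)],
        }
    }
    return {
        "function_score": {
            "query": {"bool": {"must": [must], "should": [should]}},
            "functions": [
                {
                    "script_score": {
                        "script": {"source": source, "params": {"query_vector": query_vector}}
                    },
                    "weight": weight,
                }
            ],
        }
    }


# Three variants that differ only in the function weight (placeholder vector).
variants = [
    build_function_score_query("나이키 남성 신발", [0.0, 0.0], weight=w)
    for w in (0.1, 0.5, 1.0)
]
```

Each element of `variants` has the same shape as `script_query1` in the script, so it could be handed to `get_result` as is.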

 

# -*- coding: utf-8 -*-


from elasticsearch import Elasticsearch

import tensorflow_hub as hub
import tensorflow_text  # noqa: F401 (imported for its side effect: registers ops the multilingual encoder needs)

import matplotlib.pyplot as plt
import numpy as np

##### SEARCHING #####

def get_result(script_query):
    response = client.search(
        index=INDEX_NAME,
        body={
            "size": SEARCH_SIZE,
            "query": script_query,
            "_source": {"includes": ["name", "category"]}
        }
    )

    return [hit["_score"] for hit in response["hits"]["hits"]]

def handle_query():
    query = "나이키 남성 신발"  # "Nike men's shoes"
    query_vector = embed_text([query])[0]

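    # script_query1..3 are identical as written; edit the fields, script, or
    # weight of each copy to compare how the scores change.
    script_query1 = {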
        "function_score": {
            "query": {
                "bool": {
                    "must" : [
                        {
                            "multi_match": {
                                "query": query,
                                "fields": [
                                    "name",
                                    "category"
                                ]
                            }
                        }],
                    "should":[
                        {
                            "multi_match": {
                                "query": query,
                                "fields": [
                                    "category1",
                                    "category2",
                                    "category3",
                                    "category4",
                                    "category5"
                                ]
                            }
                        }
                    ]
                }
            }
            ,
            "functions": [
                {
                    "script_score": {
                        "script": {
                            "source": "cosineSimilarity(params.query_vector, 'feature_vector') * doc['weight'].value * doc['popular'].value / doc['name.keyword'].length + doc['category.keyword'].length",
                            "params": {
                                "query_vector": query_vector
                            }
                        }
                    },
                    "weight": 0.1
                }
            ]
        }
    }

    script_query2 = {
        "function_score": {
            "query": {
                "bool": {
                    "must" : [
                        {
                            "multi_match": {
                                "query": query,
                                "fields": [
                                    "name",
                                    "category"
                                ]
                            }
                        }],
                    "should":[
                        {
                            "multi_match": {
                                "query": query,
                                "fields": [
                                    "category1",
                                    "category2",
                                    "category3",
                                    "category4",
                                    "category5"
                                ]
                            }
                        }
                    ]
                }
            }
            ,
            "functions": [
                {
                    "script_score": {
                        "script": {
                            "source": "cosineSimilarity(params.query_vector, 'feature_vector') * doc['weight'].value * doc['popular'].value / doc['name.keyword'].length + doc['category.keyword'].length",
                            "params": {
                                "query_vector": query_vector
                            }
                        }
                    },
                    "weight": 0.1
                }
            ]
        }
    }

    script_query3 = {
        "function_score": {
            "query": {
                "bool": {
                    "must" : [
                        {
                            "multi_match": {
                                "query": query,
                                "fields": [
                                    "name",
                                    "category"
                                ]
                            }
                        }],
                    "should":[
                        {
                            "multi_match": {
                                "query": query,
                                "fields": [
                                    "category1",
                                    "category2",
                                    "category3",
                                    "category4",
                                    "category5"
                                ]
                            }
                        }
                    ]
                }
            }
            ,
            "functions": [
                {
                    "script_score": {
                        "script": {
                            "source": "cosineSimilarity(params.query_vector, 'feature_vector') * doc['weight'].value * doc['popular'].value / doc['name.keyword'].length + doc['category.keyword'].length",
                            "params": {
                                "query_vector": query_vector
                            }
                        }
                    },
                    "weight": 0.1
                }
            ]
        }
    }

    # print(response["hits"]["max_score"])
    # Rank positions 1..SEARCH_SIZE, so the top hit is not clipped by xlim below
    x = np.arange(1, SEARCH_SIZE + 1)
    y1 = get_result(script_query1)
    y2 = get_result(script_query2)
    y3 = get_result(script_query3)

    plt.xlim([1, SEARCH_SIZE])      # x-axis range: [xmin, xmax]
    plt.ylim([0, MAX_SCORE])     # y-axis range: [ymin, ymax]
    plt.xlabel('top 10', labelpad=2)
    plt.ylabel('score', labelpad=2)
    # Slice x to each result length in case fewer than SEARCH_SIZE hits come back
    plt.plot(x[:len(y1)], y1, label='query1', color='#e35f62', marker='*', linewidth=1 )
    plt.plot(x[:len(y2)], y2, label='query2', color='#008000', marker='*', linewidth=1 )
    plt.plot(x[:len(y3)], y3, label='query3', color='#3333cc', marker='*', linewidth=1 )

    plt.legend()
    plt.title('Query score')
    plt.xticks(x)
    plt.yticks(np.arange(1, MAX_SCORE))
    plt.grid(True)
    plt.show()

##### EMBEDDING #####

def embed_text(input):
    vectors = model(input)
    return [vector.numpy().tolist() for vector in vectors]


##### MAIN SCRIPT #####

if __name__ == '__main__':
    INDEX_NAME = "goods"

    SEARCH_SIZE = 10
    MAX_SCORE = 3
    print("Downloading pre-trained embeddings from tensorflow hub...")
    model = hub.load("https://tfhub.dev/google/universal-sentence-encoder-multilingual-large/3")
    client = Elasticsearch(http_auth=('elastic', 'dlengus'))

    handle_query()

    print("Done.")



    # https://matplotlib.org/stable/gallery/pyplots/axline.html#sphx-glr-gallery-pyplots-axline-py
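To read the plotted scores it helps to see how each one is assembled. The sketch below mirrors the arithmetic in plain Python, assuming `function_score`'s default `boost_mode` of `multiply` (the script sets none) and made-up input values. `cosine_similarity`, `script_value`, and `final_score` are hypothetical helper names, and `name_len` / `category_len` simply stand for whatever `doc['name.keyword'].length` and `doc['category.keyword'].length` evaluate to. Note the operator precedence: only the product term is divided by the name length, and the category length is added afterwards.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def script_value(cos, weight, popular, name_len, category_len):
    # Same precedence as the Painless expression:
    # (cos * weight * popular / name_len) + category_len
    return cos * weight * popular / name_len + category_len


def final_score(query_score, script, function_weight=0.1):
    # The function's "weight" scales the script result; with boost_mode
    # "multiply" (assumed default) that is multiplied by the query score.
    return query_score * (function_weight * script)


# Made-up inputs for illustration only.
cos = cosine_similarity([1.0, 0.0], [1.0, 0.0])  # identical vectors
s = script_value(cos, weight=2.0, popular=3.0, name_len=2, category_len=1)
print(s, final_score(5.0, s))
```

Because the category length is added outside the division, it can dominate the script value when the similarity term is small, which is worth keeping in mind when choosing `MAX_SCORE` for the plot.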