[es] Let's build a search query

닉의네임 2022. 1. 15. 13:39

900gle shopping is written in Java, so the TensorFlow text embedding will sit behind a Python API that returns the vector.

The plan is to build the query as shown below. For now, I am testing it in Python.
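The post does not show the Python API itself. As a rough sketch of what such an endpoint could look like, the following uses only the standard library. The names `handle_embed_request`, `EmbedHandler`, `serve`, the JSON shape (`{"text": ...}` in, `{"vector": ...}` out), and the `fake_embed` stand-in are all assumptions for illustration; in the real service the TensorFlow Hub model would replace `fake_embed`.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def fake_embed(texts):
    # Hypothetical stand-in for the TensorFlow model: returns a tiny
    # 3-dimensional "vector" per text so the sketch runs without TensorFlow.
    return [[float(len(text)), 0.0, 0.0] for text in texts]


def handle_embed_request(body, embed_fn=fake_embed):
    # Parse {"text": "..."} and answer with {"vector": [...]} as UTF-8 JSON bytes.
    payload = json.loads(body.decode("utf-8"))
    vectors = embed_fn([payload["text"]])
    return json.dumps({"vector": vectors[0]}).encode("utf-8")


class EmbedHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the request body, embed it, and write the JSON response.
        length = int(self.headers.get("Content-Length", 0))
        response = handle_embed_request(self.rfile.read(length))
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(response)))
        self.end_headers()
        self.wfile.write(response)


def serve(port=8000):
    # Blocks forever; the Java side would POST the query text to this port.
    HTTPServer(("127.0.0.1", port), EmbedHandler).serve_forever()
```

Keeping the request handling in a plain function makes it possible to check the JSON contract without starting a server.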

 

 

Elasticsearch returned this deprecation warning:

#! The vector functions of the form function(query, doc['field']) are deprecated, and the form function(query, 'field') should be used instead. For example, cosineSimilarity(query, doc['field']) is replaced by cosineSimilarity(query, 'field').

Following the warning above, I changed this line:

"source": "cosineSimilarity(params.query_vector, doc['name_vector']) + 1.0",

to this:

"source": "cosineSimilarity(params.query_vector, 'name_vector') + 1.0",
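The `+ 1.0` in the script matters because Elasticsearch does not accept negative scores, while cosine similarity ranges from -1 to 1. The sketch below reproduces that arithmetic in plain Python to show the shifted range of 0 to 2. The helper names `cosine_similarity` and `shifted_score` are illustrative, not part of Elasticsearch.

```python
import math


def cosine_similarity(a, b):
    # Dot product divided by the product of the two vector lengths.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def shifted_score(query_vector, doc_vector):
    # Same shape as the script source: cosineSimilarity(...) + 1.0
    return cosine_similarity(query_vector, doc_vector) + 1.0
```

Identical directions give 2.0, orthogonal vectors give 1.0, and opposite directions give 0.0, so the script never produces a negative score.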

step 1

Put a multi_match query and a script_score query inside function_score, so that the product-name vector contributes to the score.

script_query = {
    "function_score": {
        "query": {
            "multi_match": {
                "query": query,
                "fields": [
                    "name^5",
                    "category"
                ]
            }
        },
        "script_score": {
            "script": {
                "source": "cosineSimilarity(params.query_vector, 'name_vector') + 1.0",
                "params": {
                    "query_vector": query_vector
                }
            }
        }
    }
}
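In step 1 the script_score sits directly under function_score, which is the single-function shorthand. The same function can also be listed in a `functions` array, where each entry can carry its own weight. A small sketch of that conversion follows; `to_functions_form` is a hypothetical helper, and the query text and 3-dimensional vector are placeholders rather than values from the post.

```python
import copy

# Example step 1 body with placeholder inputs.
step1_query = {
    "function_score": {
        "query": {
            "multi_match": {"query": "bag", "fields": ["name^5", "category"]}
        },
        "script_score": {
            "script": {
                "source": "cosineSimilarity(params.query_vector, 'name_vector') + 1.0",
                "params": {"query_vector": [0.1, 0.2, 0.3]},
            }
        },
    }
}


def to_functions_form(query_body, weight):
    # Work on a copy so the caller's dict is left untouched.
    body = copy.deepcopy(query_body)
    function_score = body["function_score"]
    # Move the single script_score into a one-element functions list
    # and attach the weight to that entry.
    script_score = function_score.pop("script_score")
    function_score["functions"] = [
        {"script_score": script_score, "weight": weight}
    ]
    return body


step2_like_query = to_functions_form(step1_query, weight=50)
```

Note that a weight other than 1 changes the result, because the weight multiplies the value of the function it is attached to.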

 

step 2

Keep the multi_match query inside function_score, move the script_score query and a filter query into functions, and add a weight to each.

script_query = {
    "function_score": {
        "query": {
            "multi_match": {
                "query": query,
                "fields": [
                    "name^5",
                    "category"
                ]
            }
        },
        "functions": [
            {
                "script_score": {
                    "script": {
                        "source": "cosineSimilarity(params.query_vector, 'name_vector') + 1.0",
                        "params": {
                            "query_vector": query_vector
                        }
                    }
                },
                "weight": 50
            },
            {
                "filter": { "match": { "name": query } },
                "random_score": {},
                "weight": 23
            }
        ]
    }
}

 

This is only a test, so the results are garbage, but the inflated scores show that the functions are being applied.

id: hBEKWX4B3J2dY7S3R9N1, score: 28706.25
{'name': '구찌 GG 서류 가방 658543 97S4N 1000', 'category': '패션잡화 남성가방 브리프케이스'}

id: IREKWX4B3J2dY7S3NtDe, score: 22517.828
{'name': '프라다 미니 버킷백 복조리백 원단 복주머니 가방', 'category': '패션잡화 여성가방 숄더백'}

id: ZBEKWX4B3J2dY7S3juYw, score: 18378.627
{'name': '해외C어뉴 갤럭시 캐디백 스탠드 골프 가방 캐디백', 'category': '스포츠/레저 골프 골프백 캐디백'}
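Scores this large are consistent with how function_score combines its parts. By default both `score_mode` and `boost_mode` are `multiply`, a weight multiplies the value of its function, and `random_score` without a seed yields a value from 0 up to (but not including) 1. The sketch below simulates that arithmetic for the step 2 query. `simulate_function_score` is an illustrative helper and the input numbers are made up; the post does not show the actual query scores or cosine values behind these hits.

```python
def simulate_function_score(query_score, cosine, random_value,
                            name_matches=True,
                            score_mode="multiply", boost_mode="multiply"):
    # Function 1: (cosineSimilarity + 1.0) scaled by weight 50.
    values = [(cosine + 1.0) * 50]
    # Function 2: random_score scaled by weight 23, applied only when the
    # filter (a match on "name") accepts the document.
    if name_matches:
        values.append(random_value * 23)

    # score_mode decides how the function values are combined.
    if score_mode == "multiply":
        combined = 1.0
        for value in values:
            combined *= value
    elif score_mode == "sum":
        combined = sum(values)
    else:
        raise ValueError("unsupported score_mode in this sketch")

    # boost_mode decides how the combined value meets the query score.
    if boost_mode == "multiply":
        return query_score * combined
    if boost_mode == "sum":
        return query_score + combined
    if boost_mode == "replace":
        return combined
    raise ValueError("unsupported boost_mode in this sketch")


default_score = simulate_function_score(20.0, 0.5, 0.5)
summed_score = simulate_function_score(20.0, 0.5, 0.5,
                                       score_mode="sum", boost_mode="sum")
```

With these made-up inputs the default modes give 20 × 75 × 11.5 = 17250, while summing gives 20 + 75 + 11.5 = 106.5. Setting `score_mode` and `boost_mode` explicitly in the query is one way to keep the scores in a smaller range.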

 

 

 

 

The full test script:

# -*- coding: utf-8 -*-

import time

from elasticsearch import Elasticsearch

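import tensorflow_hub as hub
# Not referenced directly, but importing it registers the text ops
# that the multilingual encoder needs when it is loaded.
import tensorflow_text  # noqa: F401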


##### SEARCHING #####

def run_query_loop():
    while True:
        try:
            handle_query()
        except KeyboardInterrupt:
            return


def handle_query():
    query = input("Enter query: ")

    embedding_start = time.time()
    query_vector = embed_text([query])[0]
    embedding_time = time.time() - embedding_start

    script_query = {
        "function_score": {
            "query": {
                "multi_match": {
                    "query": query,
                    "fields": [
                        "name^5",
                        "category"
                    ]
                }
            },
            "functions": [
                {
                    "script_score": {
                        "script": {
                            "source": "cosineSimilarity(params.query_vector, 'name_vector') + 1.0",
                            "params": {
                                "query_vector": query_vector
                            }
                        }
                    },
                    "weight": 50
                },
                {
                    "filter": { "match": { "name": query } },
                    "random_score": {},
                    "weight": 23
                }
            ]
        }
    }

    search_start = time.time()
    response = client.search(
        index=INDEX_NAME,
        body={
            "size": SEARCH_SIZE,
            "query": script_query,
            "_source": {"includes": ["name", "category"]}
        }
    )
    search_time = time.time() - search_start

    print()
    print("{} total hits.".format(response["hits"]["total"]["value"]))
    print("embedding time: {:.2f} ms".format(embedding_time * 1000))
    print("search time: {:.2f} ms".format(search_time * 1000))
    for hit in response["hits"]["hits"]:
        print("id: {}, score: {}".format(hit["_id"], hit["_score"]))
        print(hit["_source"])
        print()


##### EMBEDDING #####

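def embed_text(texts):
    # Encode a list of strings and return plain Python lists,
    # which can be serialized into the query's script params.
    vectors = model(texts)
    return [vector.numpy().tolist() for vector in vectors]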


##### MAIN SCRIPT #####

if __name__ == '__main__':
    INDEX_NAME = "products_a"

    SEARCH_SIZE = 3
    print("Downloading pre-trained embeddings from tensorflow hub...")
    model = hub.load("https://tfhub.dev/google/universal-sentence-encoder-multilingual-large/3")
    client = Elasticsearch(http_auth=('elastic', 'dlengus'))

    run_query_loop()

    print("Done.")
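The script assumes that the products_a index already exists and that name_vector is mapped as a dense_vector. The post does not show the mapping, so the following is an assumed minimal version. The `text` types for name and category are guesses, and `INDEX_BODY` is a hypothetical name. The 512 dimensions match the output size of universal-sentence-encoder-multilingual-large.

```python
# Assumed index definition; field types other than name_vector are guesses.
INDEX_BODY = {
    "mappings": {
        "properties": {
            "name": {"type": "text"},
            "category": {"type": "text"},
            # Must match the length of the vectors returned by embed_text.
            "name_vector": {"type": "dense_vector", "dims": 512},
        }
    }
}

# With the client from the script above, the index could be created with:
# client.indices.create(index="products_a", body=INDEX_BODY)
```

If the mapped dimension and the model's output length disagree, indexing the vectors fails, so it is worth checking `len(embed_text(["test"])[0])` against `dims` before loading data.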