[es] Let's validate search results


개발잡부


닉의네임 2022. 1. 21. 11:29

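True Positive (TP): an item that is actually True is predicted as True (correct)
False Positive (FP): an item that is actually False is predicted as True (incorrect)
False Negative (FN): an item that is actually True is predicted as False (incorrect)
True Negative (TN): an item that is actually False is predicted as False (correct)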
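
For search, "predicted True" can be read as "the document was returned" and "actually True" as "the document is relevant". Below is a minimal sketch of counting the four cases for one query under that reading. The function name and the document ids are made up for illustration.

```python
def confusion_counts(retrieved_ids, relevant_ids, all_ids):
    """Count TP, FP, FN, TN for one query, treating 'retrieved' as a True prediction."""
    retrieved = set(retrieved_ids)
    relevant = set(relevant_ids)
    universe = set(all_ids)
    return {
        "TP": len(retrieved & relevant),             # returned and relevant
        "FP": len(retrieved - relevant),             # returned but not relevant
        "FN": len(relevant - retrieved),             # relevant but not returned
        "TN": len(universe - retrieved - relevant),  # neither returned nor relevant
    }


# Hypothetical document ids, for illustration only
counts = confusion_counts(
    retrieved_ids=["d1", "d2", "d3", "d4"],
    relevant_ids=["d1", "d3", "d5"],
    all_ids=["d1", "d2", "d3", "d4", "d5", "d6", "d7", "d8"],
)
print(counts)  # {'TP': 2, 'FP': 2, 'FN': 1, 'TN': 3}
```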

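Precision

The fraction of documents returned by the search that are actually relevant: TP / (TP + FP).

Recall

The fraction of relevant documents that the search actually returned: TP / (TP + FN).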
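
The two definitions can be sketched in a few lines of Python. This assumes you already have a hand-judged list of relevant document ids for the query. The function name and the ids are hypothetical.

```python
def precision_recall(retrieved_ids, relevant_ids):
    """Return (precision, recall) for one query from id collections."""
    retrieved = set(retrieved_ids)
    relevant = set(relevant_ids)
    hits = len(retrieved & relevant)  # relevant documents that were returned
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical ids: 4 documents returned, 3 judged relevant, 2 in common
precision, recall = precision_recall(["d1", "d2", "d3", "d4"], ["d1", "d3", "d5"])
print("precision={:.2f} recall={:.2f}".format(precision, recall))
```

Returning 0.0 for an empty result set or an empty judgment list is a choice made here to avoid division by zero. Some tools report that case as undefined instead.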

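Evaluation metrics

nDCG

CG (Cumulative Gain) = sums the relevance of the results, giving every position the same weight
DCG (Discounted Cumulative Gain) = sums the relevance while discounting each result by its rank position, so lower-ranked results count less
nDCG (normalized DCG) = DCG divided by the best possible (ideal) DCG over the whole data set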
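
The three quantities could be computed roughly as follows. This sketch assumes graded relevance labels (for example 0 to 3) that you assign by hand, and it uses the linear-gain form of DCG with a log2(rank + 1) discount. Another common variant uses 2^rel - 1 as the gain. The function names, the `judged_pool` parameter and the sample grades are made up for illustration.

```python
import math


def cg(relevances):
    """Cumulative gain: every position counts the same."""
    return sum(relevances)


def dcg(relevances):
    """Discounted cumulative gain: rank i (1-based) is divided by log2(i + 1)."""
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevances, start=1))


def ndcg(relevances, judged_pool=None):
    """DCG normalized by the ideal DCG.

    judged_pool holds the grades of all judged documents for the query;
    if omitted, the ideal ordering is built from the returned list only.
    """
    pool = relevances if judged_pool is None else judged_pool
    ideal = dcg(sorted(pool, reverse=True)[:len(relevances)])
    return dcg(relevances) / ideal if ideal > 0 else 0.0


# Hypothetical grades for the top 5 results of one query, in ranked order
graded = [3, 2, 3, 0, 1]
print("CG={} DCG={:.3f} nDCG={:.3f}".format(cg(graded), dcg(graded), ndcg(graded)))
```

A value of 1.0 means the results are already in the best possible order for the judged grades.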

Add pandas to require.txt

elasticsearch
numpy
tensorflow
tensorflow-hub
tensorflow_text
kss
regex
flask
flask_restful
Api
Resource
matplotlib
pandas

query_test.py

# -*- coding: utf-8 -*-

import time
import math

from elasticsearch import Elasticsearch

import tensorflow_hub as hub
import tensorflow_text  # noqa: F401 (unused name, but the import registers ops the multilingual model needs)

import matplotlib.pyplot as plt
import numpy as np

##### SEARCHING #####

def handle_query():
    query = "나이키 남성 신발"  # "Nike men's shoes"
    embedding_start = time.time()
    query_vector = embed_text([query])[0]
    embedding_time = time.time() - embedding_start

    script_query = {
        "function_score": {
            "query": {
                "multi_match": {
                    "query": query,
                    "fields": [
                        "name",
                        "category^2"
                    ]
                }
            },
            "functions": [
                {
                    "script_score": {
                        "script": {
                            "source": "cosineSimilarity(params.query_vector, 'feature_vector') * doc['weight'].value * doc['populr'].value / doc['name'].length + doc['category'].length",
                            "params": {
                                "query_vector": query_vector
                            }
                        }
                    },
                    "weight": 0.1
                }
            ]
        }
    }

    search_start = time.time()
    response = client.search(
        index=INDEX_NAME,
        body={
            "size": SEARCH_SIZE,
            "query": script_query,
            "_source": {"includes": ["name", "category"]}
        }
    )
    search_time = time.time() - search_start

    print()
    print("{} total hits.".format(response["hits"]["total"]["value"]))
    print("embedding time: {:.2f} ms".format(embedding_time * 1000))
    print("search time: {:.2f} ms".format(search_time * 1000))


    for hit in response["hits"]["hits"]:
        print("id: {}, score: {}".format(hit["_id"], hit["_score"]))
        print(hit["_source"])
        print()

    # print(response["hits"]["max_score"])
    y = [hit["_score"] for hit in response["hits"]["hits"]]
    # Ranks 1..N; sized from the hits actually returned so x and y always match
    x = np.arange(1, len(y) + 1)
    # max_score is None when there are no hits, so fall back to 1
    y_max = math.ceil(response["hits"]["max_score"] or 1)

    plt.xlim([1, SEARCH_SIZE])      # x-axis range: [xmin, xmax]
    plt.ylim([0, y_max])            # y-axis range: [ymin, ymax]
    plt.xlabel('rank (top {})'.format(SEARCH_SIZE), labelpad=2)
    plt.ylabel('score', labelpad=2)
    plt.plot(x, y, label='query1', color='#e35f62', marker='*', linewidth=1)
    plt.legend()
    plt.title('Query score')
    plt.xticks(x)
    plt.yticks(np.arange(0, y_max + 1))
    plt.grid(True)
    plt.show()

##### EMBEDDING #####

def embed_text(input):
    vectors = model(input)
    return [vector.numpy().tolist() for vector in vectors]


##### MAIN SCRIPT #####

if __name__ == '__main__':
    INDEX_NAME = "products_r"

    SEARCH_SIZE = 10
    print("Downloading pre-trained embeddings from tensorflow hub...")
    model = hub.load("https://tfhub.dev/google/universal-sentence-encoder-multilingual-large/3")
    client = Elasticsearch(http_auth=('elastic', 'dlengus'))

    handle_query()

    print("Done.")

Hmm.. now that it's built, I see I've made something useless again.. what's the point of comparing scores..
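
One option that does not depend on raw scores is to check the returned ids against a hand-judged list, using the precision definition from earlier in this post. The sketch below assumes a response shaped like the one `client.search` returns above. `fake_response`, the ids and the helper names are all made up for illustration.

```python
def ranked_ids(response):
    """Document ids in the order Elasticsearch returned them."""
    return [hit["_id"] for hit in response["hits"]["hits"]]


def precision_at_k(response, relevant_ids, k):
    """Fraction of the top-k returned documents that are judged relevant."""
    top_k = ranked_ids(response)[:k]
    if not top_k:
        return 0.0
    relevant = set(relevant_ids)
    return sum(1 for doc_id in top_k if doc_id in relevant) / len(top_k)


# Stand-in for a real search response; only the fields used here are present
fake_response = {
    "hits": {
        "hits": [
            {"_id": "101", "_score": 3.2},
            {"_id": "205", "_score": 2.9},
            {"_id": "317", "_score": 1.1},
        ]
    }
}
judged_relevant = {"101", "317", "999"}  # hypothetical hand-made judgments

print("P@3 = {:.2f}".format(precision_at_k(fake_response, judged_relevant, k=3)))
```

Unlike `_score`, this number stays comparable when the scoring script changes, because it depends only on which documents come back and in what order.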
