There seems to be something wrong with my implementation of the NDCG evaluation metric.

```python
import math


def ndcg(preds: dict, test_gd: dict, topN=50):
    """NDCG@topN

    Args:
        preds (dict): preds[user] = [item1, item2, ...]
        test_gd (dict): test_gd[user] = [item1, item2, ...]
        topN (int, optional): top-N cutoff. Defaults to 50.

    Returns:
        dict: ndcg@topN
    """
    total_recall = 0.0
    total_ndcg = 0.0
    for user in test_gd.keys():
        recall = 0
        dcg = 0.0
        item_list = test_gd[user]
        for no, item_id in enumerate(item_list):
            if item_id in preds[user][:topN]:
                recall += 1
                # Discounted by the item's position in the *test* list.
                dcg += 1.0 / math.log(no + 2, 2)
        # Ideal DCG: the first `recall` positions are all hits.
        idcg = 0.0
        for no in range(recall):
            idcg += 1.0 / math.log(no + 2, 2)
        total_recall += recall * 1.0 / len(item_list)
        if recall > 0:
            total_ndcg += dcg / idcg
    total = len(test_gd)
    ndcg = total_ndcg / total
    return {f'ndcg@{topN}': round(ndcg, 4)}
```

In the above code, test_gd is the test set of ground-truth items and preds is the prediction list ranked by similarity; both have the form shown in the docstring.

This is code I found on the Internet, but I don't think it is correct. I suspect the DCG calculation is wrong, but I'm not sure, so I hope someone can confirm it.

I think the DCG line should be changed from `dcg += 1.0 / math.log(no + 2, 2)` to `dcg += 1.0 / math.log(preds[user].index(item_id) + 2, 2)`.

Because if you want a rank-aware score, the discount should not come from an item's position in the test set; it should come from the item's rank in the predicted results.
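Here is a minimal sketch of the change I have in mind, with the discount taken from each hit's rank in `preds[user][:topN]` and IDCG computed from the ideal ordering. It assumes binary relevance, and `ndcg_by_pred_rank` is just a name I picked for comparison:

```python
import math


def ndcg_by_pred_rank(preds: dict, test_gd: dict, topN=50):
    """NDCG@topN where DCG is discounted by the predicted rank."""
    total_ndcg = 0.0
    for user, gt_items in test_gd.items():
        gt = set(gt_items)
        # DCG: walk the predicted list in rank order; a hit at 0-based
        # rank `rank` contributes 1 / log2(rank + 2).
        dcg = sum(
            1.0 / math.log2(rank + 2)
            for rank, item in enumerate(preds[user][:topN])
            if item in gt
        )
        # IDCG: the best possible ranking puts every relevant item first.
        ideal_hits = min(len(gt), topN)
        idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
        if idcg > 0:
            total_ndcg += dcg / idcg
    ndcg = total_ndcg / len(test_gd)
    return {f'ndcg@{topN}': round(ndcg, 4)}
```

With this version, a user whose relevant items all appear at the top of preds scores 1.0, and pushing a hit further down the predicted list lowers the score, which is what I would expect NDCG to measure.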

Can anyone help me confirm this? Thank you very much.

