
Evaluating Models for Entity Linking with Datasets

Paco Nathan edited this page Oct 3, 2019 · 3 revisions

This leaderboard competition evaluates models using a Top5uptoD relevance ranking, which assumes that each publication contains D <= 5 datasets. The approach emphasizes precision while accounting for the variable number of datasets per publication.

For example, with D = 3 datasets, if all 3 datasets appear within the Top4, then nothing ranked after the third correct dataset is counted. In other words, the ranking is not penalized for anything that follows once all D datasets have been found in the rank-ordered results. If the Top5 does not contain all D datasets, scoring reverts to a standard Top5 error.

To illustrate the relevance ranking, see: Relevance Up To D
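The rule above can be sketched as a per-publication error function. This is one possible reading of Top5uptoD, not the competition's official scorer. It assumes that, once all D gold datasets have been found at some rank r <= 5, precision is measured over the first r items only (D / r), and that otherwise it falls back to plain precision over the Top5 (hits / 5). The function is named after the helper used in the sample code further down so that it can be dropped in, but its exact scoring is an assumption.

```python
def top5UptoD_err(gold, ranked, k=5):
    """Hypothetical Top5uptoD error for one publication (a sketch, not the
    official scorer).

    gold:   the D correct dataset labels for the publication (1 <= D <= k)
    ranked: predicted dataset labels, best first

    Assumption: if all D datasets are found by rank r <= k, precision is
    D / r and later items are ignored; otherwise precision is hits / k.
    """
    gold = set(gold)
    d = len(gold)
    if not 1 <= d <= k:
        raise ValueError(f"expected 1 <= D <= {k}, got D = {d}")

    found = set()
    for rank, label in enumerate(ranked[:k], start=1):
        # count each gold dataset at most once, even if it is predicted twice
        if label in gold and label not in found:
            found.add(label)
            if len(found) == d:
                # all D datasets found: nothing ranked after this is counted
                return 1.0 - d / rank
    # not all D datasets appear in the top k: fall back to a top-k error
    return 1.0 - len(found) / k
```

For D = 3, a ranking whose third correct dataset lands at rank 4 scores an error of 1 - 3/4 = 0.25, while a ranking that finds only one of the three datasets in the Top5 scores 1 - 1/5 = 0.8.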

If the modeling approach allows it, each predicted dataset should include an estimate of the uncertainty of that prediction.

To calculate the aggregate precision for correct datasets in the Top5uptoD entries across all publications in the corpus, average the error over 5-fold cross-validation. The following sample code assumes that pub_contexts and pub_labels hold each publication's text and its gold dataset labels, that my_train() trains a model, that my_predict() uses that model to predict ranked dataset labels for a publication, and that top5UptoD_err() calculates the Top5uptoD error for a single publication:

```python
from statistics import mean

from sklearn.model_selection import KFold

# pub_contexts: publication texts; pub_labels: gold dataset labels per publication
kf = KFold(n_splits=5, shuffle=True, random_state=2019)
cv_errs = []

for fold, (train_index, test_index) in enumerate(kf.split(pub_contexts)):
    print(f'fold {fold}')
    X_train = [pub_contexts[i] for i in train_index]
    X_test = [pub_contexts[i] for i in test_index]
    y_train = [pub_labels[i] for i in train_index]
    y_test = [pub_labels[i] for i in test_index]

    model = my_train(X_train, y_train)
    errs = []

    for context, gold in zip(X_test, y_test):
        # each prediction is a tuple whose first element is the dataset label
        preds = my_predict(context, model, 5)
        errs.append(top5UptoD_err(gold, [p[0] for p in preds]))

    # mean per-publication error for this fold
    cv_errs.append(mean(errs))
    print(f'top5UptoD error rate: {mean(errs)}')

print(f'aggregate precision: {1.0 - mean(cv_errs)}')
```

kudos: @philipskokoh
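To smoke-test the cross-validation loop above before a real model exists, my_train() and my_predict() can be stubbed out. The sketch below is a hypothetical frequency baseline, not part of the competition: it ignores the publication text, ranks dataset labels by how often they occur in the training folds, and returns (label, score) pairs so that p[0] is the label, as the loop expects. The score is only a crude stand-in for the per-prediction uncertainty estimate requested above.

```python
from collections import Counter


def my_train(X_train, y_train):
    """Hypothetical baseline: ignore the texts and record each dataset
    label's relative frequency across the training publications."""
    counts = Counter(label for labels in y_train for label in labels)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def my_predict(context, model, k):
    """Return up to k (label, score) pairs, best first. The score is the
    label's training frequency; a real model would score each publication
    individually and could report a calibrated confidence instead."""
    ranked = sorted(model.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:k]


# usage with placeholder labels
model = my_train(["pub one", "pub two"], [["dataset_a", "dataset_b"], ["dataset_a"]])
preds = my_predict("any publication text", model, 5)
```

Because every publication gets the same ranking, this baseline mainly checks that the data plumbing and scoring run end to end; its error rate is a floor that a text-aware model should beat.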
