Evaluation

The module includes performance metrics for neural graph embedding models, along with model selection routines, negatives generation, and an implementation of the learning-to-rank-based evaluation protocol used in the literature.

After training is complete, the model is ready to make predictions and to be evaluated on unseen data. Given a triple, the model can score it to quantify its plausibility. Importantly, the entities and relations of new triples must have been seen during training; otherwise no embedding is available for them. Future extensions of the code base will introduce inductive methods as well.

The standard evaluation of a test triple compares the score the model assigns to that triple with the scores it assigns to corrupted versions of the same triple, in which either the subject or the object has been replaced. From this comparison we obtain the rank of the test triple among its corruptions. By aggregating the ranks obtained for all triples in the test set, we finally obtain a “thorough” (depending on the quality of the test set and of the corruptions) evaluation of the model.
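The comparison described above can be sketched in a few lines of plain Python. This is an illustration only, not AmpliGraph's implementation: `corrupt`, `rank_against_corruptions`, `score_fn`, and the toy score table are hypothetical names, higher scores are assumed to mean more plausible triples, and ties are resolved in favour of the true triple.

```python
def corrupt(triple, entities, side):
    """Return copies of `triple` with the subject ("s") or object ("o") replaced."""
    s, p, o = triple
    if side == "s":
        return [(e, p, o) for e in entities if e != s]
    return [(s, p, e) for e in entities if e != o]


def rank_against_corruptions(triple, corruptions, score_fn):
    """Rank of `triple` among its corruptions (1 is best)."""
    true_score = score_fn(triple)
    # Only corruptions that score strictly higher push the true triple down.
    return 1 + sum(1 for c in corruptions if score_fn(c) > true_score)


# Hypothetical scores standing in for a trained model; unlisted triples score 0.0.
toy_scores = {
    ("a", "likes", "b"): 0.90,
    ("a", "likes", "c"): 0.95,
    ("d", "likes", "b"): 0.40,
}


def score_fn(triple):
    return toy_scores.get(triple, 0.0)


entities = ["a", "b", "c", "d"]
test_triple = ("a", "likes", "b")

object_rank = rank_against_corruptions(
    test_triple, corrupt(test_triple, entities, "o"), score_fn
)
subject_rank = rank_against_corruptions(
    test_triple, corrupt(test_triple, entities, "s"), score_fn
)
print(object_rank, subject_rank)  # prints: 2 1
```

In this toy example one object corruption outscores the true triple, so the object-side rank is 2; no subject corruption does, so the subject-side rank is 1.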

Metrics

The metrics that AmpliGraph implements for ranking a triple against its corruptions, and for aggregating the resulting ranks, are listed below.

rank_score(y_true, y_pred[, pos_lab])

Computes the rank of a triple.

mr_score(ranks)

Mean Rank (MR).

mrr_score(ranks)

Mean Reciprocal Rank (MRR).

hits_at_n_score(ranks, n)

Hits@N.
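For reference, the three aggregate metrics follow their usual definitions: MR is the arithmetic mean of the ranks, MRR is the mean of their reciprocals, and Hits@N is the fraction of ranks that are at most N. The sketch below spells these out in plain Python; the function names `mean_rank`, `mean_reciprocal_rank`, and `hits_at_n` are chosen for this illustration and are not the library functions listed above.

```python
def mean_rank(ranks):
    """Arithmetic mean of the ranks (lower is better)."""
    return sum(ranks) / len(ranks)


def mean_reciprocal_rank(ranks):
    """Mean of 1 / rank (higher is better, at most 1.0)."""
    return sum(1.0 / r for r in ranks) / len(ranks)


def hits_at_n(ranks, n):
    """Fraction of ranks that fall within the top n."""
    return sum(1 for r in ranks if r <= n) / len(ranks)


# Ranks of three hypothetical test triples.
ranks = [1, 2, 4]

print(mean_rank(ranks))             # 7 / 3, about 2.33
print(mean_reciprocal_rank(ranks))  # 1.75 / 3, about 0.58
print(hits_at_n(ranks, 1))          # 1 / 3
print(hits_at_n(ranks, 3))          # 2 / 3
```

MR is sensitive to a few very poor ranks, whereas MRR and Hits@N are dominated by the triples ranked near the top, which is why they are usually reported together.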

Model Selection

AmpliGraph implements a model selection routine for knowledge graph embedding (KGE) models via either grid search or random search. Random search is typically more efficient, whereas grid search provides a more controlled selection framework.

select_best_model_ranking(model_class, ...)

Model selection routine for embedding models via either grid search or random search.
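The difference between the two strategies can be illustrated with a small, library-independent sketch. Everything in it is an assumption made for the example: the hyperparameter names, the helper functions, and in particular `toy_validation_mrr`, which stands in for training a model and measuring its MRR on a validation set. It does not show how `select_best_model_ranking` is called or implemented.

```python
import itertools
import random


def grid_candidates(param_grid):
    """Yield every combination of the listed values (exhaustive)."""
    keys = sorted(param_grid)
    for values in itertools.product(*(param_grid[k] for k in keys)):
        yield dict(zip(keys, values))


def random_candidates(param_grid, n_samples, seed=0):
    """Yield n_samples combinations drawn independently at random."""
    rng = random.Random(seed)
    for _ in range(n_samples):
        yield {k: rng.choice(v) for k, v in param_grid.items()}


def select_best(candidates, evaluate):
    """Return the candidate with the highest validation score."""
    return max(candidates, key=evaluate)


# Hypothetical search space.
param_grid = {
    "embedding_size": [50, 100, 200],
    "learning_rate": [0.1, 0.01],
    "negatives_per_positive": [5, 10],
}


def toy_validation_mrr(params):
    # Stand-in for "train a model with params, then compute validation MRR".
    penalty = (
        abs(params["embedding_size"] - 100) / 100
        + abs(params["learning_rate"] - 0.01) * 10
        + abs(params["negatives_per_positive"] - 10) / 10
    )
    return 1.0 / (1.0 + penalty)


best_grid = select_best(grid_candidates(param_grid), toy_validation_mrr)
best_random = select_best(random_candidates(param_grid, n_samples=4), toy_validation_mrr)
print(best_grid)
print(best_random)
```

The grid search evaluates all 12 combinations and is guaranteed to find the best one in the grid; the random search evaluates only 4, which is cheaper but may miss it.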

Helper Functions

Utilities and support functions for evaluation procedures.

train_test_split_no_unseen(X[, test_size, ...])

Split into train and test sets such that the test set contains only entities and relations that also occur in the training set.
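One way to obtain such a split is to move a triple into the test set only if every entity and relation it mentions still occurs in at least one other triple left for training. The sketch below follows that idea in plain Python; `split_no_unseen` is a hypothetical helper written for this illustration and is not the algorithm used by `train_test_split_no_unseen`.

```python
import random
from collections import Counter


def split_no_unseen(triples, test_size, seed=0):
    """Greedy split: the test set never introduces unseen entities or relations."""
    rng = random.Random(seed)

    # How often each entity / relation occurs in triples not yet moved to test.
    ent_count = Counter()
    rel_count = Counter()
    for s, p, o in triples:
        ent_count[s] += 1
        ent_count[o] += 1
        rel_count[p] += 1

    order = list(triples)
    rng.shuffle(order)

    train, test = [], []
    for s, p, o in order:
        needed = Counter((s, o))  # counts an entity twice if s == o
        movable = (
            len(test) < test_size
            and rel_count[p] > 1
            and all(ent_count[e] > n for e, n in needed.items())
        )
        if movable:
            test.append((s, p, o))
            ent_count[s] -= 1
            ent_count[o] -= 1
            rel_count[p] -= 1
        else:
            train.append((s, p, o))

    if len(test) < test_size:
        raise ValueError("not enough triples can be moved without creating unseen items")
    return train, test


# Toy graph: every ordered pair of distinct entities, under two relations.
entities = ["a", "b", "c", "d"]
relations = ["r1", "r2"]
triples = [(s, p, o) for s in entities for o in entities if s != o for p in relations]

train, test = split_no_unseen(triples, test_size=5)
print(len(train), len(test))  # prints: 19 5
```

A purely random split gives no such guarantee: a rare entity can end up only in the test set, where the model has no embedding for it.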

filter_unseen_entities(X, model[, verbose])

Filter unseen entities in the test set.
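Conceptually, this kind of filtering keeps only the test triples whose subject and object are both known to the model. A minimal sketch is given below; `drop_unseen_entities` is a hypothetical helper, and passing the known entities in as a plain set (rather than reading them from a trained model) is an assumption made to keep the example self-contained.

```python
def drop_unseen_entities(triples, known_entities):
    """Keep triples whose subject and object are both known; report how many were dropped."""
    known = set(known_entities)
    kept = [(s, p, o) for s, p, o in triples if s in known and o in known]
    return kept, len(triples) - len(kept)


# Entities assumed to have been seen during training.
known_entities = {"a", "b", "c"}

test_triples = [
    ("a", "likes", "b"),
    ("a", "likes", "z"),  # "z" was never seen
    ("y", "knows", "c"),  # "y" was never seen
    ("c", "knows", "a"),
]

kept, n_dropped = drop_unseen_entities(test_triples, known_entities)
print(kept)       # prints: [('a', 'likes', 'b'), ('c', 'knows', 'a')]
print(n_dropped)  # prints: 2
```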