evaluate_performance

ampligraph.evaluation.evaluate_performance(X, model, filter_triples=None, verbose=False, strict=True, rank_against_ent=None, corrupt_side='s+o')

Evaluate the performance of an embedding model.

Run the relational learning evaluation protocol defined in [BUGD+13].

The function assesses the ranking of each positive triple against all possible negatives created in compliance with the local closed world assumption (LCWA) [NMTG16]; metrics such as the mean reciprocal rank are then computed from the returned ranks.
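As a minimal sketch of the protocol, assuming a hypothetical helper rank_of_positive and strictly-greater tie handling (neither is AmpliGraph's internal API): the rank of a positive triple is one plus the number of corruptions the model scores higher, and the mean reciprocal rank averages the reciprocals of these ranks.

import numpy as np

def rank_of_positive(pos_score, corruption_scores):
    # Rank of the positive triple among its corruptions:
    # 1 + the number of corruptions scored strictly higher.
    return 1 + int(np.sum(np.asarray(corruption_scores) > pos_score))

def mean_reciprocal_rank(ranks):
    # Average of the reciprocal ranks.
    return float(np.mean(1.0 / np.asarray(ranks)))

# Two corruptions outscore the positive, so its rank is 3.
print(rank_of_positive(0.9, [0.95, 0.97, 0.2, 0.5]))   # 3
print(mean_reciprocal_rank([2, 4, 1, 1]))              # 0.6875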

For filtering, we use a hashing-based strategy to speed up the computation (i.e. to solve the set difference problem). The strategy works as follows (a code sketch follows the list):

  • We compute the unique entities and relations in the dataset.
  • We assign a unique prime number to each entity (separately for subjects and objects) and to each relation, and create 3 hash tables.
  • For each triple in filter_triples, we look up the primes associated with its subject, relation and object in the respective hash tables and compute the prime product of the triple. We store this product.
  • Since the primes assigned to subjects, relations and objects are unique, each triple's prime product is also unique: a triple (a, b, c) yields a different product than (c, b, a), because a and c map to different primes as subjects than as objects.
  • While generating corruptions for evaluation, we look up the primes of the corruption triple's subject, relation and object and compute its prime product.
  • If this product is present among the products stored for the filter set, we remove the corruption triple (it is a duplicate, i.e. the corruption is present in filter_triples).
  • Using this approach we generate filtered corruptions for evaluation.
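
A minimal sketch of this strategy, under illustrative assumptions (the helper names _first_primes and filtered_corruptions are made up here, and only object-side corruptions are generated; AmpliGraph's actual implementation is vectorized):

def _first_primes(n):
    # First n primes by trial division (adequate for a sketch).
    primes, candidate = [], 2
    while len(primes) < n:
        if all(candidate % p for p in primes):
            primes.append(candidate)
        candidate += 1
    return primes

def filtered_corruptions(triple, candidate_entities, filter_triples,
                         entities, relations):
    n_ent = len(entities)
    primes = _first_primes(2 * n_ent + len(relations))
    # Three hash tables: subjects, objects and relations draw primes from
    # disjoint pools, so (a, b, c) and (c, b, a) yield different products.
    sub_p = {e: primes[i] for i, e in enumerate(entities)}
    obj_p = {e: primes[n_ent + i] for i, e in enumerate(entities)}
    rel_p = {r: primes[2 * n_ent + i] for i, r in enumerate(relations)}

    def product(t):
        s, p, o = t
        return sub_p[s] * rel_p[p] * obj_p[o]

    # Prime products of every triple in the filter set.
    known = {product(t) for t in filter_triples}

    s, p, o = triple
    # Keep object-side corruptions whose product is not a known product.
    return [(s, p, e) for e in candidate_entities
            if e != o and product((s, p, e)) not in known]

By unique factorization, two triples share a prime product only if they have the same subject, relation and object, which is what makes the membership test against the stored products a valid set difference.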

Benefits: the original Python-loop-based set difference computation took around 3 hours to evaluate the FB15k test set; with the hashing strategy this now takes less than 10 minutes.

Warning: we currently use the first million primes, taken from primes.utm.edu. If the dataset is too large, with millions of unique entities and relations, there are not enough primes and this method will not work. There is also a risk of overflow if the prime product exceeds the range of a long integer.
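
To make the overflow concern concrete (plain arithmetic, not a measurement from the library): the millionth prime is 15,485,863, and a product of three primes of that magnitude is about 3.7e21, well beyond the signed 64-bit range of about 9.2e18. Python integers are arbitrary precision, but storing products in a fixed-width type would overflow:

import numpy as np

p = 15_485_863                            # the 1,000,000th prime
product = p ** 3                          # worst-case product of three such primes, ~3.7e21
print(product > np.iinfo(np.int64).max)   # True: does not fit in int64 (~9.2e18)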

Parameters:
  • X (ndarray, shape [n, 3]) – An array of test triples.
  • model (ampligraph.latent_features.EmbeddingModel) – A knowledge graph embedding model
  • filter_triples (ndarray, shape [n, 3] or None) – The triples used to filter negatives. If None, negatives are not filtered.
  • verbose (bool) – Verbose mode.
  • strict (bool) – Strict mode. If True then any unseen entity will cause a RuntimeError. If False then triples containing unseen entities will be filtered out.
  • rank_against_ent (array-like) – List of entities to use for corruptions. If None, will generate corruptions using all distinct entities. Default is None.
  • corrupt_side (string) – Specifies which side of the triple to corrupt: 's' corrupts only the subject, 'o' corrupts only the object, and 's+o' corrupts both subject and object.
Returns:

ranks – An array of ranks of positive test triples.

Return type:

ndarray, shape [n]

Examples

>>> import numpy as np
>>> from ampligraph.datasets import load_wn18
>>> from ampligraph.latent_features import ComplEx
>>> from ampligraph.evaluation import evaluate_performance, mrr_score, hits_at_n_score
>>>
>>> X = load_wn18()
>>> model = ComplEx(batches_count=10, seed=0, epochs=1, k=150, eta=10,
...                 loss='pairwise', optimizer='adagrad')
>>> model.fit(np.concatenate((X['train'], X['valid'])))
>>>
>>> filter_triples = np.concatenate((X['train'], X['valid'], X['test']))
>>> ranks = evaluate_performance(X['test'][:5], model=model, filter_triples=filter_triples)
>>> ranks
array([    2,     4,     1,     1, 28550], dtype=int32)
>>> mrr_score(ranks)
0.55000700525394053
>>> hits_at_n_score(ranks, n=10)
0.8
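
The corruption protocol can be narrowed with rank_against_ent and corrupt_side. A sketch continuing the example above, where the candidate set is built from the entities occurring in the test split (this specific restriction is illustrative, not prescribed by the docs):

>>> candidate_entities = np.unique(X['test'][:, [0, 2]])
>>> ranks_o = evaluate_performance(X['test'][:5], model=model,
...                                filter_triples=filter_triples,
...                                rank_against_ent=candidate_entities,
...                                corrupt_side='o')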