evaluate_performance

ampligraph.evaluation.evaluate_performance(X, model, filter_triples=None, verbose=False, strict=True, rank_against_ent=None, corrupt_side='s+o', use_default_protocol=True)

Evaluate the performance of an embedding model.

Run the relational learning evaluation protocol defined in [BUGD+13].

It computes the rank of each positive triple against a number of negatives generated on the fly. Such negatives are compliant with the local closed world assumption (LCWA), as described in [NMTG16]. In practice, that means only one side of the triple is corrupted (i.e. either the subject or the object).

Note

When filtered mode is enabled (i.e. filter_triples is not None), we speed up the procedure with a hashing-based strategy that handles the set difference problem. The strategy works as follows:

  • We compute unique entities and relations in our dataset.
  • We assign unique prime numbers to entities (separately for the subject and the object role) and to relations, and create three separate hash tables.
  • For each triple in filter_triples, we look up the prime numbers associated with its subject, relation and object in the respective hash tables. We then compute the product of these primes for the filter triple and store it.
  • Since the numbers assigned to subjects, relations and objects are unique primes, their product is also unique. For example, a triple \((a, b, c)\) has a different product from the triple \((c, b, a)\), because \(a, c\) as subjects have different primes from \(a, c\) as objects.
  • While generating corruptions for evaluation, we hash the corrupted triple’s entities and relation, get the associated prime numbers and compute their product.
  • If this product is present among the products stored for the filter set, we remove the corresponding corrupted triple (it is a duplicate, i.e. the corrupted triple is present in filter_triples).
  • Using this approach we generate filtered corruptions for evaluation.
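
The steps above can be illustrated with a small, self-contained sketch. This is not the library's implementation: the helper names (prime_generator, build_prime_tables, triple_product, filter_corruptions) are hypothetical, and plain Python dictionaries and sets stand in for the hash tables.

```python
from itertools import count


def prime_generator():
    """Yield 2, 3, 5, ... by trial division (adequate for a toy example)."""
    found = []
    for candidate in count(2):
        if all(candidate % p for p in found):
            found.append(candidate)
            yield candidate


def build_prime_tables(entities, relations):
    """Assign distinct primes to subjects, relations and objects (hypothetical helper)."""
    primes = prime_generator()
    subj_table = {e: next(primes) for e in entities}
    rel_table = {r: next(primes) for r in relations}
    obj_table = {e: next(primes) for e in entities}  # same entities, different primes
    return subj_table, rel_table, obj_table


def triple_product(triple, tables):
    """Product of the primes assigned to the subject, relation and object of a triple."""
    subj_table, rel_table, obj_table = tables
    s, p, o = triple
    return subj_table[s] * rel_table[p] * obj_table[o]


def filter_corruptions(corruptions, filter_triples, tables):
    """Drop every corruption whose prime product appears in the filter set."""
    known = {triple_product(t, tables) for t in filter_triples}
    return [c for c in corruptions if triple_product(c, tables) not in known]


entities, relations = ["a", "b", "c"], ["r"]
tables = build_prime_tables(entities, relations)
filter_set = [("a", "r", "b"), ("a", "r", "c")]

# Object-side corruptions of the positive triple ("a", "r", "b").
object_corruptions = [("a", "r", e) for e in entities]
kept = filter_corruptions(object_corruptions, filter_set, tables)
print(kept)  # [('a', 'r', 'a')]
```

Because each product has exactly one prime factor from each table, a set lookup on the integer product is enough to decide whether a corruption is a known triple.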

Execution Time: This method takes ~20 minutes on FB15K using ComplEx (Intel Xeon Gold 6142, 64 GB Ubuntu 16.04 box, Tesla V100 16GB)

Hint

When rank_against_ent=None, the method uses all distinct entities in the knowledge graph X to generate the negatives to rank against. If X includes more than 2.5 million unique entities and relations, the method fails with a runtime error. To avoid this, pass the entities to use for generating corruptions via rank_against_ent. Moreover, ranking a positive against an extremely large number of negatives may be overkill. For reference, the popular FB15k-237 dataset has ~15k distinct entities, so the evaluation protocol ranks each positive against ~15k corruptions per side.
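
One possible way to build a smaller list for rank_against_ent is to sample from the distinct entities of X. The sketch below is only an illustration: the helper sample_entities is hypothetical, uniform random sampling is an assumption rather than a recommendation from the library, and ranks computed against a subset are not directly comparable with ranks computed against all entities.

```python
import numpy as np


def sample_entities(X, max_entities, seed=0):
    """Return at most max_entities distinct entities from the subject and object columns of X."""
    entities = np.unique(np.concatenate((X[:, 0], X[:, 2])))
    if len(entities) <= max_entities:
        return entities
    rng = np.random.default_rng(seed)
    return rng.choice(entities, size=max_entities, replace=False)


# Toy triples; in practice X would be the array of test triples.
X_toy = np.array([["a", "r", "b"],
                  ["b", "r", "c"],
                  ["c", "r", "d"]])
subset = sample_entities(X_toy, max_entities=2)

# The subset would then be passed as:
# ranks = evaluate_performance(X_test, model=model, rank_against_ent=subset)
```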

Parameters:
  • X (ndarray, shape [n, 3]) – An array of test triples.
  • model (EmbeddingModel) – A knowledge graph embedding model.
  • filter_triples (ndarray of shape [n, 3] or None) – The triples used to filter negatives.
  • verbose (bool) – Verbose mode.
  • strict (bool) – Strict mode. If True then any unseen entity will cause a RuntimeError. If False then triples containing unseen entities will be filtered out.
  • rank_against_ent (array-like) – List of entities to use for corruptions. If None, will generate corruptions using all distinct entities. Default is None.
  • corrupt_side (string) –

    Specifies which side of the triple to corrupt:

    • 's': corrupt only the subject.
    • 'o': corrupt only the object.
    • 's+o': corrupt both subject and object. The same behaviour is obtained with use_default_protocol=True.

    Note

    If corrupt_side='s+o', the function returns 2*n ranks, where n is the number of statements in X: the first n ranks are obtained against subject corruptions, and ranks n+1 to 2n against object corruptions. If corrupt_side='s' or corrupt_side='o', it returns n ranks.

  • use_default_protocol (bool) –

    Flag indicating whether to use the standard evaluation protocol from the literature, defined in [BUGD+13] (default: True). Setting it to True is equivalent to corrupt_side='s+o': head and tail corruptions are evaluated separately.

    Note

    When use_default_protocol=True, the function returns 2*n ranks. The first n ranks are obtained against subject corruptions, and ranks n+1 to 2n against object corruptions.

Returns:

ranks – An array of ranks of positive test triples. When use_default_protocol=True or corrupt_side='s+o', the function returns 2*n ranks instead of n. In that case, the first n ranks are obtained against subject corruptions, and ranks n+1 to 2n against object corruptions.

Return type:

ndarray, shape [n] or [2*n]
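
When 2*n ranks are returned, the subject-side and object-side ranks can be separated by slicing, following the layout described above. A minimal sketch; split_ranks is a hypothetical helper, not part of the library, and the rank values are made up for illustration.

```python
import numpy as np


def split_ranks(ranks):
    """Split an array of 2*n ranks into (subject-side ranks, object-side ranks)."""
    ranks = np.asarray(ranks)
    if len(ranks) % 2 != 0:
        raise ValueError("expected an even number of ranks (2*n)")
    n = len(ranks) // 2
    # First n ranks: subject corruptions. Last n ranks: object corruptions.
    return ranks[:n], ranks[n:]


toy_ranks = [1, 4, 2, 10, 3, 7]  # made-up ranks for n = 3 triples
subject_ranks, object_ranks = split_ranks(toy_ranks)
print(subject_ranks)  # [1 4 2]
print(object_ranks)   # [10  3  7]
```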

Examples

>>> import numpy as np
>>> from ampligraph.datasets import load_wn18
>>> from ampligraph.latent_features import ComplEx
>>> from ampligraph.evaluation import evaluate_performance, mrr_score, hits_at_n_score
>>>
>>> X = load_wn18()
>>> model = ComplEx(batches_count=10, seed=0, epochs=10, k=150, eta=1,
...                 loss='nll', optimizer='adam')
>>> model.fit(np.concatenate((X['train'], X['valid'])))
>>>
>>> # Named filter_triples to avoid shadowing the built-in filter().
>>> filter_triples = np.concatenate((X['train'], X['valid'], X['test']))
>>> ranks = evaluate_performance(X['test'][:5], model=model,
...                              filter_triples=filter_triples,
...                              corrupt_side='s+o',
...                              use_default_protocol=False)
>>> ranks
[1, 582, 543, 6, 31]
>>> mrr_score(ranks)
0.24049691297347323
>>> hits_at_n_score(ranks, n=10)
0.4
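
The two metrics in the example can be checked by hand from the ranks. The sketch below uses the standard definitions of mean reciprocal rank and Hits@N; the function names mean_reciprocal_rank and hits_at_n are hypothetical stand-ins written with NumPy, not the library's mrr_score and hits_at_n_score.

```python
import numpy as np


def mean_reciprocal_rank(ranks):
    """Average of 1/rank over all ranked positives."""
    ranks = np.asarray(ranks, dtype=float)
    return float(np.mean(1.0 / ranks))


def hits_at_n(ranks, n):
    """Fraction of positives ranked within the top n."""
    ranks = np.asarray(ranks)
    return float(np.mean(ranks <= n))


ranks = [1, 582, 543, 6, 31]
print(mean_reciprocal_rank(ranks))  # approximately 0.2405
print(hits_at_n(ranks, n=10))       # 0.4 (ranks 1 and 6 are within the top 10)
```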