evaluate_performance

ampligraph.evaluation.evaluate_performance(X, model, filter_triples=None, verbose=False, strict=True, rank_against_ent=None, corrupt_side='s+o', use_default_protocol=True)

Evaluate the performance of an embedding model.

Run the relational learning evaluation protocol defined in [BUGD+13].

It computes the ranks of each positive triple against all possible negatives created in compliance with the local closed world assumption (LCWA), as described in [NMTG16].

Note

When filtered mode is enabled (i.e. filter_triples is not None), to speed up the procedure we adopt a hashing-based strategy to handle the set-difference problem. The strategy works as follows:

  • We compute unique entities and relations in our dataset.
  • We assign unique prime numbers for entities (unique for subject and object separately) and for relations and create three separate hash tables.
  • For each triple in filter_triples, we look up the prime numbers associated with its subject, relation and object in the respective hash tables, then compute the product of these primes (the prime product of the filter triple). We store this product.
  • Since the numbers assigned to subjects, relations and objects are unique, their prime product is also unique, i.e. a triple \((a, b, c)\) has a different product than the triple \((c, b, a)\), because \(a\) and \(c\) are assigned different primes as subjects than as objects.
  • While generating corruptions for evaluation, we look up the primes of the corrupted triple's entities and relation and compute its prime product.
  • If this product is present among the stored filter products, we remove the corresponding corrupted triple, since it is a known positive (i.e. the corruption appears in filter_triples).
  • Using this approach we generate filtered corruptions for evaluation.
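The steps above can be sketched in a few lines of Python. This is an illustrative toy implementation of the prime-product idea, not the library's actual code; the triples and helper names are made up for the example:

```python
def primes(n):
    """Return the first n prime numbers (simple trial division)."""
    found = []
    candidate = 2
    while len(found) < n:
        if all(candidate % p for p in found):
            found.append(candidate)
        candidate += 1
    return found

# Toy knowledge graph: (subject, relation, object) triples used as the filter set.
triples = [("a", "r1", "b"), ("b", "r2", "c")]
entities = sorted({t[0] for t in triples} | {t[2] for t in triples})
relations = sorted({t[1] for t in triples})

# Three separate prime tables (subject, object, relation), so that
# (a, b, c) and (c, b, a) get different products.
ps = primes(2 * len(entities) + len(relations))
subj_primes = dict(zip(entities, ps[:len(entities)]))
obj_primes = dict(zip(entities, ps[len(entities):2 * len(entities)]))
rel_primes = dict(zip(relations, ps[2 * len(entities):]))

def prime_product(s, r, o):
    """Unique product for a triple (unique by the fundamental theorem of arithmetic)."""
    return subj_primes[s] * rel_primes[r] * obj_primes[o]

# Hash every filter triple once...
filter_products = {prime_product(*t) for t in triples}

# ...then a candidate corruption is a known positive iff its product is in the set.
print(prime_product("a", "r1", "b") in filter_products)  # True: duplicate, filtered out
print(prime_product("c", "r1", "b") in filter_products)  # False: a genuine negative
```

Because the product is a single integer, the set-difference test reduces to a hash-set membership check per corruption.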

Execution Time: This method takes ~20 minutes on FB15K using ComplEx (Intel Xeon Gold 6142, 64 GB Ubuntu 16.04 box, Tesla V100 16GB)

Hint

When rank_against_ent=None, the method uses all distinct entities in the knowledge graph X to generate the negatives to rank against. If X includes more than 1 million unique entities and relations, the method raises a runtime error. To avoid this, pass the entities to use for generating corruptions via rank_against_ent. Moreover, ranking a positive against an extremely large number of negatives may be overkill: as a reference, the popular FB15k-237 dataset has ~15k distinct entities, so the evaluation protocol ranks each positive against ~15k corruptions per side.
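For instance, one could restrict corruptions to the entities appearing in the test partition. The snippet below extracts that subset with NumPy; the final evaluate_performance call is only a sketch, since it requires a trained model:

```python
import numpy as np

# Toy test triples (subject, relation, object).
X_test = np.array([["a", "r1", "b"],
                   ["c", "r1", "b"],
                   ["a", "r2", "d"]])

# Distinct entities appearing as subject or object in the test set.
entities_subset = np.unique(X_test[:, [0, 2]])
print(entities_subset)  # ['a' 'b' 'c' 'd']

# Hypothetical call (requires a trained AmpliGraph model and a filter set):
# ranks = evaluate_performance(X_test, model=model,
#                              filter_triples=filter_triples,
#                              rank_against_ent=entities_subset)
```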

Parameters:
  • X (ndarray, shape [n, 3]) – An array of test triples.
  • model (EmbeddingModel) – A knowledge graph embedding model
  • filter_triples (ndarray of shape [n, 3] or None) – The triples used to filter negatives.
  • verbose (bool) – Verbose mode
  • strict (bool) – Strict mode. If True then any unseen entity will cause a RuntimeError. If False then triples containing unseen entities will be filtered out.
  • rank_against_ent (array-like) – List of entities to use for corruptions. If None, will generate corruptions using all distinct entities. Default is None.
  • corrupt_side (string) –

    Specifies which side of the triple to corrupt:

    • ’s’: corrupt only the subject.
    • ’o’: corrupt only the object.
    • ’s+o’: corrupt both subject and object.
  • use_default_protocol (bool) – Flag to indicate whether to evaluate head and tail corruptions separately (default: True). If set to True, the corrupt_side argument is ignored: subject and object are corrupted separately, and each positive triple is ranked against each set of corruptions.
Returns:

ranks – An array of ranks of positive test triples.

Return type:

ndarray, shape [n]

Examples

>>> import numpy as np
>>> from ampligraph.datasets import load_wn18
>>> from ampligraph.latent_features import ComplEx
>>> from ampligraph.evaluation import evaluate_performance, mrr_score, hits_at_n_score
>>>
>>> X = load_wn18()
>>> model = ComplEx(batches_count=10, seed=0, epochs=1, k=150, eta=10,
>>>                 loss='pairwise', optimizer='adagrad')
>>> model.fit(np.concatenate((X['train'], X['valid'])))
>>>
>>> filter = np.concatenate((X['train'], X['valid'], X['test']))
>>> ranks = evaluate_performance(X['test'][:5], model=model, filter_triples=filter)
>>> ranks
array([    2,     4,     1,     1, 28550], dtype=int32)
>>> mrr_score(ranks)
0.55000700525394053
>>> hits_at_n_score(ranks, n=10)
0.8
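As a sanity check, the two metrics above can also be reproduced directly from the ranks array. This is a minimal NumPy sketch of what mrr_score and hits_at_n_score compute, not the library's implementation:

```python
import numpy as np

# Ranks returned for the five test triples in the example above.
ranks = np.array([2, 4, 1, 1, 28550])

# MRR: mean of the reciprocal ranks.
mrr = np.mean(1.0 / ranks)

# Hits@N: fraction of positives ranked in the top N.
hits_at_10 = np.mean(ranks <= 10)

print(round(mrr, 4))  # 0.55
print(hits_at_10)     # 0.8
```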