query_topn

ampligraph.discovery.query_topn(model, top_n=10, head=None, relation=None, tail=None, ents_to_consider=None, rels_to_consider=None)

Queries the model with two elements of a triple and returns the top_n results among all possible completions, ordered by the score the model predicts for each.

For example, given a <subject, predicate> pair, the function uses the model to score all possible triples <subject, predicate, ?>, filling in the missing element with known entities, and returns the top_n triples ordered by score. Given a <subject, object> pair, it fills in the missing element with known relations instead.
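The completion-and-ranking step described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not AmpliGraph's implementation: `complete_tail` is a hypothetical helper that fills the missing object slot, and `toy_score` is a hypothetical stand-in for the trained model's scoring function. It assumes that a higher score means a more plausible triple, which matches the descending scores in the example output on this page.

```python
from typing import Callable, List, Sequence, Tuple

Triple = Tuple[str, str, str]


def complete_tail(
    score_fn: Callable[[Triple], float],
    head: str,
    relation: str,
    entities: Sequence[str],
    top_n: int = 10,
) -> Tuple[List[Triple], List[float]]:
    """Score every <head, relation, ?> completion and keep the top_n."""
    # Fill the missing object slot with each candidate entity.
    candidates = [(head, relation, entity) for entity in entities]
    scored = [(score_fn(triple), triple) for triple in candidates]
    # Higher score means more plausible, so sort in descending order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    best = scored[:top_n]
    return [triple for _, triple in best], [score for score, _ in best]


# Hypothetical per-object scores standing in for a trained model.
TOY_SCORES = {"B": 3.0, "C": 1.5, "D": 2.2}


def toy_score(triple: Triple) -> float:
    return TOY_SCORES.get(triple[2], 0.0)


X, S = complete_tail(toy_score, "A", "ALLIED_WITH", ["B", "C", "D"], top_n=2)
print(X)  # [('A', 'ALLIED_WITH', 'B'), ('A', 'ALLIED_WITH', 'D')]
print(S)  # [3.0, 2.2]
```

Completing the subject or the relation follows the same pattern; only the slot being filled and the candidate list change.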

Note

This function does not filter out true statements: the returned triples can include those the model was trained on.
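If only unseen triples are wanted, one option is to post-filter the results against the training set. The sketch below uses a hypothetical helper, `drop_known`, which is not part of `ampligraph.discovery`. Because rows are removed after ranking, fewer than top_n results may remain, so it may help to request a larger top_n first.

```python
from typing import Iterable, List, Sequence, Set, Tuple

Triple = Tuple[str, str, str]


def drop_known(
    X: Sequence[Sequence[str]],
    S: Sequence[float],
    known: Iterable[Sequence[str]],
) -> Tuple[List[Triple], List[float]]:
    """Remove completions that already appear in `known`, preserving rank order."""
    known_set: Set[Triple] = {tuple(t) for t in known}
    kept_X: List[Triple] = []
    kept_S: List[float] = []
    for row, score in zip(X, S):
        triple = tuple(row)  # also works for rows of a NumPy string array
        if triple not in known_set:
            kept_X.append(triple)
            kept_S.append(score)
    return kept_X, kept_S


# Toy ranked output and a toy training set.
X = [("A", "R", "B"), ("A", "R", "C"), ("A", "R", "D")]
S = [3.0, 2.2, 1.5]
train = [("A", "R", "C")]
print(drop_known(X, S, train))
# ([('A', 'R', 'B'), ('A', 'R', 'D')], [3.0, 1.5])
```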

Parameters:
  • model (EmbeddingModel) – The trained model that will be used to score triple completions.

  • top_n (int) – The number of completed triples to return.

  • head (str) – An entity string to query.

  • relation (str) – A relation string to query.

  • tail (str) – An entity string to query (the object of the triple).

  • ents_to_consider (array-like) – List of entities to use for triple completions. If None, completions are generated using all distinct entities (default: None).

  • rels_to_consider (array-like) – List of relations to use for triple completions. If None, completions are generated using all distinct relations (default: None).
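The parameters above determine which slot is filled and which candidates fill it. The sketch below illustrates that logic with a hypothetical helper, `candidate_pool`; it is not the library's internal code. Two assumptions are made: `all_entities` and `all_relations` are the model's distinct vocabularies, and exactly one of head, relation and tail is None (inferred from the description of querying with two elements of a triple). The real function's error handling may differ.

```python
from typing import List, Optional, Sequence, Tuple


def candidate_pool(
    head: Optional[str],
    relation: Optional[str],
    tail: Optional[str],
    all_entities: Sequence[str],
    all_relations: Sequence[str],
    ents_to_consider: Optional[Sequence[str]] = None,
    rels_to_consider: Optional[Sequence[str]] = None,
) -> Tuple[str, List[str]]:
    """Return the missing slot and the candidates used to fill it."""
    slots = (("head", head), ("relation", relation), ("tail", tail))
    missing = [name for name, value in slots if value is None]
    # Two elements are given, so exactly one slot must be left open.
    if len(missing) != 1:
        raise ValueError("exactly one of head, relation, tail must be None")
    slot = missing[0]
    if slot == "relation":
        # Missing predicate: fill with relations, all of them by default.
        pool = all_relations if rels_to_consider is None else rels_to_consider
    else:
        # Missing subject or object: fill with entities, all of them by default.
        pool = all_entities if ents_to_consider is None else ents_to_consider
    return slot, list(pool)


ents = ["A", "B", "C"]
rels = ["R1", "R2"]
print(candidate_pool("A", "R1", None, ents, rels))
# ('tail', ['A', 'B', 'C'])
print(candidate_pool("A", None, "B", ents, rels, rels_to_consider=["R2"]))
# ('relation', ['R2'])
```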

Returns:

  • X (ndarray of shape (n, 3)) – The completed triples, ordered by score.

  • S (ndarray of shape (n,)) – The scores of the triples in X, in the same order.
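Since row i of X corresponds to score S[i], the two arrays can be zipped together for display. The sketch below copies the example output from this page as literal data and uses a hypothetical helper, `format_results`; it should also work unchanged on the NumPy arrays that query_topn returns, since iterating a 2-D array yields its rows.

```python
# Example output from this page, copied as plain Python lists.
X = [
    ["Eddard Stark", "ALLIED_WITH", "Smithyton"],
    ["Eddard Stark", "ALLIED_WITH", "Eden Risley"],
    ["Eddard Stark", "ALLIED_WITH", "House Westbrook"],
    ["Eddard Stark", "ALLIED_WITH", "House Leygood"],
    ["Eddard Stark", "ALLIED_WITH", "House Bridges"],
]
S = [9.000417, 5.272001, 5.1876183, 5.121145, 5.0564814]


def format_results(X, S):
    """Pair each completed triple with its score, one line per result."""
    lines = []
    for rank, (triple, score) in enumerate(zip(X, S), start=1):
        subj, pred, obj = triple
        lines.append(f"{rank}. {subj} {pred} {obj} ({float(score):.3f})")
    return lines


for line in format_results(X, S):
    print(line)
```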

Example

>>> import requests
>>> from ampligraph.datasets import load_from_csv
>>> from ampligraph.discovery import query_topn
>>> from ampligraph.latent_features import ScoringBasedEmbeddingModel
>>> # Game of Thrones relations dataset
>>> url = 'https://ampligraph.s3-eu-west-1.amazonaws.com/datasets/GoT.csv'
>>> open('GoT.csv', 'wb').write(requests.get(url).content)
>>> X = load_from_csv('.', 'GoT.csv', sep=',')
>>>
>>> model = ScoringBasedEmbeddingModel(eta=5,
...                                    k=150,
...                                    scoring_type='TransE')
>>> model.compile(optimizer='adagrad', loss='pairwise')
>>> model.fit(X,
...           batch_size=100,
...           epochs=20,
...           verbose=False)
>>>
>>> query_topn(model, top_n=5,
...            head='Eddard Stark', relation='ALLIED_WITH', tail=None,
...            ents_to_consider=None, rels_to_consider=None)
(array([['Eddard Stark', 'ALLIED_WITH', 'Smithyton'],
        ['Eddard Stark', 'ALLIED_WITH', 'Eden Risley'],
        ['Eddard Stark', 'ALLIED_WITH', 'House Westbrook'],
        ['Eddard Stark', 'ALLIED_WITH', 'House Leygood'],
        ['Eddard Stark', 'ALLIED_WITH', 'House Bridges']], dtype='<U44'),
 array([9.000417 , 5.272001 , 5.1876183, 5.121145 , 5.0564814],
       dtype=float32))