query_topn

ampligraph.discovery.query_topn(model, top_n=10, head=None, relation=None, tail=None, ents_to_consider=None, rels_to_consider=None)

Queries the model with two elements of a triple and returns the top_n completions of the missing element, ordered by the score the model predicts.

For example, given a <subject, predicate> pair in the arguments, the model scores all possible triples <subject, predicate, ?>, filling in the missing element with known entities, and returns the top_n triples ordered by score. Given a <subject, object> pair, it fills in the missing element with known relations instead.
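The completion procedure described above can be illustrated with a small pure-Python sketch. It is a conceptual stand-in, not AmpliGraph's implementation: toy_query_topn, score_fn, and the toy score table are hypothetical names introduced here, and the check that exactly two elements are supplied is an assumption drawn from the description above.

```python
def toy_query_topn(score_fn, entities, relations, top_n=10,
                   head=None, relation=None, tail=None):
    """Conceptual sketch: fill the missing slot with every candidate,
    score each completed triple, and keep the top_n best-scoring ones."""
    given = sum(x is not None for x in (head, relation, tail))
    if given != 2:
        # Assumption: the query needs exactly two known elements.
        raise ValueError("Provide exactly two of head, relation, tail.")

    if head is None:
        candidates = [(e, relation, tail) for e in entities]
    elif relation is None:
        candidates = [(head, r, tail) for r in relations]
    else:
        candidates = [(head, relation, e) for e in entities]

    # Highest score first; ties keep their original candidate order.
    ranked = sorted(candidates, key=score_fn, reverse=True)[:top_n]
    return ranked, [score_fn(t) for t in ranked]


# Hypothetical scores standing in for a trained model.
toy_scores = {("a", "likes", "b"): 2.0, ("a", "likes", "c"): 1.0}

def toy_score(triple):
    return toy_scores.get(triple, 0.0)

triples, scores = toy_query_topn(toy_score, ["a", "b", "c"], ["likes"],
                                 top_n=2, head="a", relation="likes")
```

Here the missing tail is filled with each of the three toy entities, and only the two best-scoring completions are kept.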

Note

This function does not filter out true statements: the returned triples can include those the model was trained on.
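If only previously unseen completions are wanted, the results can be post-filtered against the training triples. The helper below is a minimal sketch of that idea, assuming the triples are row-like sequences of three strings. drop_known_triples is a hypothetical name and not part of AmpliGraph.

```python
def drop_known_triples(triples, scores, known_triples):
    """Return only the (triple, score) pairs whose triple is not in known_triples.

    Works on lists of tuples as well as array rows, since each row is
    converted to a tuple before the membership test.
    """
    known = {tuple(t) for t in known_triples}
    kept = [(tuple(t), s) for t, s in zip(triples, scores)
            if tuple(t) not in known]
    return [t for t, _ in kept], [s for _, s in kept]


# Toy data standing in for query results and the training set.
results = [("a", "r", "b"), ("a", "r", "c")]
result_scores = [2.0, 1.0]
training = [("a", "r", "b")]

novel, novel_scores = drop_known_triples(results, result_scores, training)
```

The relative order of the surviving triples is preserved, so the filtered list is still sorted by score.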

Parameters:
  • model (EmbeddingModel) – The trained model that will be used to score triple completions.
  • top_n (int) – The number of completed triples to return.
  • head (string) – An entity string to query.
  • relation (string) – A relation string to query.
  • tail (string) – An object string to query.
  • ents_to_consider (array-like) – List of entities to use for triple completions. If None, will generate completions using all distinct entities. (Default: None.)
  • rels_to_consider (array-like) – List of relations to use for triple completions. If None, will generate completions using all distinct relations. (Default: None.)
Returns:

  • X (ndarray, shape [n, 3]) – The completed triples, ordered by descending score.
  • S (ndarray, shape [n]) – The scores of the triples in X, in the same order.

Examples

>>> import requests
>>> from ampligraph.datasets import load_from_csv
>>> from ampligraph.latent_features import ComplEx
>>> from ampligraph.discovery import query_topn
>>>
>>> # Game of Thrones relations dataset
>>> url = 'https://ampligraph.s3-eu-west-1.amazonaws.com/datasets/GoT.csv'
>>> open('GoT.csv', 'wb').write(requests.get(url).content)
>>> X = load_from_csv('.', 'GoT.csv', sep=',')
>>>
>>> model = ComplEx(batches_count=10, seed=0, epochs=200, k=150, eta=5,
...                 optimizer='adam', optimizer_params={'lr':1e-3}, loss='multiclass_nll',
...                 regularizer='LP', regularizer_params={'p':3, 'lambda':1e-5},
...                 verbose=True)
>>> model.fit(X)
>>>
>>> # Complete <Catelyn Stark, ALLIED_WITH, ?> and keep the five best-scoring tails
>>> query_topn(model, top_n=5,
...            head='Catelyn Stark', relation='ALLIED_WITH', tail=None,
...            ents_to_consider=None, rels_to_consider=None)
(array([['Catelyn Stark', 'ALLIED_WITH', 'House Tully of Riverrun'],
        ['Catelyn Stark', 'ALLIED_WITH', 'House Stark of Winterfell'],
        ['Catelyn Stark', 'ALLIED_WITH', 'House Wayn'],
        ['Catelyn Stark', 'ALLIED_WITH', 'House Mollen'],
        ['Catelyn Stark', 'ALLIED_WITH', 'Orton Merryweather']],
       dtype='<U44'), array([[10.261374 ],
        [ 8.84298  ],
        [ 2.78139  ],
        [ 1.9809164],
        [ 1.833096 ]], dtype=float32))