query_topn
- ampligraph.discovery.query_topn(model, top_n=10, head=None, relation=None, tail=None, ents_to_consider=None, rels_to_consider=None)
Queries the model with two elements of a triple and returns the top_n completions of the missing element, ordered by the score the model predicts.
For example, given a <subject, predicate> pair in the arguments, the model will score all possible triples <subject, predicate, ?>, filling in the missing element with known entities, and return the top_n triples ordered by score. If given a <subject, object> pair it will fill in the missing element with known relations.
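The completion step described above can be sketched without a trained model. This is a minimal illustration, not AmpliGraph's implementation: `toy_query_topn`, `score_fn`, and the `toy_scores` values are hypothetical stand-ins, and it assumes a higher score means a more plausible triple.

```python
import numpy as np

def toy_query_topn(score_fn, entities, head, relation, top_n=10):
    """Illustrative only: rank every <head, relation, ?> completion by score."""
    # Build one candidate triple per known entity.
    candidates = np.array([[head, relation, e] for e in entities])
    # score_fn is a hypothetical stand-in for the trained model's scoring.
    scores = np.array([score_fn(head, relation, e) for e in entities], dtype=float)
    # Sort by descending score and keep the best top_n candidates.
    order = np.argsort(-scores)[:top_n]
    return candidates[order], scores[order]

# Made-up scores standing in for model predictions.
toy_scores = {"Robb Stark": 0.9, "Jon Snow": 0.7, "Cersei Lannister": 0.1}

triples, scores = toy_query_topn(
    lambda h, r, t: toy_scores[t],
    entities=list(toy_scores),
    head="Eddard Stark",
    relation="ALLIED_WITH",
    top_n=2,
)
```

The same pattern applies when the relation is the missing element: the candidate list is built from relations instead of entities.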
Note
This function does not filter out true statements - triples returned can include those the model was trained on.
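If only novel statements are of interest, the returned triples can be filtered against the training set afterwards. The helper below is a sketch under that assumption; `drop_known_triples` is a hypothetical name and not part of AmpliGraph.

```python
import numpy as np

def drop_known_triples(triples, scores, known_triples):
    """Keep only the completions that do not appear in known_triples."""
    triples = np.asarray(triples)
    scores = np.asarray(scores)
    # A set of tuples gives constant-time membership checks.
    known = {tuple(t) for t in np.asarray(known_triples).tolist()}
    keep = np.array([tuple(t) not in known for t in triples.tolist()], dtype=bool)
    # Boolean masking preserves the original score ordering.
    return triples[keep], scores[keep]

# Toy data shaped like the (X, S) pair that query_topn returns.
ranked = np.array([["a", "r", "b"], ["a", "r", "c"], ["a", "r", "d"]])
ranked_scores = np.array([3.0, 2.0, 1.0])
training = np.array([["a", "r", "c"]])

novel, novel_scores = drop_known_triples(ranked, ranked_scores, training)
```

Because filtering happens after the top_n cut, fewer than top_n triples may remain; requesting a larger top_n before filtering is one way to compensate.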
- Parameters:
model (EmbeddingModel) – The trained model that will be used to score triple completions.
top_n (int) – The number of completed triples to return.
head (str) – An entity string to query.
relation (str) – A relation string to query.
tail (str) – An object (tail entity) string to query.
ents_to_consider (array-like) – List of entities to use for triple completions. If None, will generate completions using all distinct entities (default: None).
rels_to_consider (array-like) – List of relations to use for triple completions. If None, will generate completions using all distinct relations (default: None).
- Returns:
X (ndarray of shape (n, 3)) – The completed triples, ordered by score.
S (ndarray of shape (n,)) – The scores of the triples in X, in the same order.
Example
>>> import requests
>>> from ampligraph.datasets import load_from_csv
>>> from ampligraph.discovery import discover_facts
>>> from ampligraph.discovery import query_topn
>>> from ampligraph.latent_features import ScoringBasedEmbeddingModel
>>> # Game of Thrones relations dataset
>>> url = 'https://ampligraph.s3-eu-west-1.amazonaws.com/datasets/GoT.csv'
>>> open('GoT.csv', 'wb').write(requests.get(url).content)
>>> X = load_from_csv('.', 'GoT.csv', sep=',')
>>>
>>> model = ScoringBasedEmbeddingModel(eta=5,
...                                    k=150,
...                                    scoring_type='TransE')
>>> model.compile(optimizer='adagrad', loss='pairwise')
>>> model.fit(X,
...           batch_size=100,
...           epochs=20,
...           verbose=False)
>>>
>>> query_topn(model, top_n=5,
...            head='Eddard Stark', relation='ALLIED_WITH', tail=None,
...            ents_to_consider=None, rels_to_consider=None)
(array([['Eddard Stark', 'ALLIED_WITH', 'Smithyton'],
        ['Eddard Stark', 'ALLIED_WITH', 'Eden Risley'],
        ['Eddard Stark', 'ALLIED_WITH', 'House Westbrook'],
        ['Eddard Stark', 'ALLIED_WITH', 'House Leygood'],
        ['Eddard Stark', 'ALLIED_WITH', 'House Bridges']], dtype='<U44'),
 array([9.000417 , 5.272001 , 5.1876183, 5.121145 , 5.0564814], dtype=float32))