find_nearest_neighbours

ampligraph.discovery.find_nearest_neighbours(kge_model, entities, n_neighbors=10, entities_subset=None, metric='euclidean')

Return the nearest neighbors of entities.

The method works in the embedding space: for each query entity it finds the requested number of closest embeddings. Candidates can be drawn from all the entities in the graph or from a subset of interest.
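As a rough illustration of what a nearest-neighbour lookup in embedding space involves, the sketch below ranks candidates by Euclidean distance with plain NumPy. It is not AmpliGraph's implementation: the toy embeddings and the helper `nearest_in_embedding_space` are invented for this example, and a trained model would supply the real vectors.

```python
import numpy as np

# Toy 2-D embeddings, invented for illustration only.
toy_embeddings = {
    'a': np.array([0.0, 0.0]),
    'b': np.array([1.0, 0.0]),
    'c': np.array([0.0, 2.0]),
    'd': np.array([3.0, 4.0]),
}


def nearest_in_embedding_space(embeddings, entities, n_neighbors,
                               entities_subset=None):
    """Hypothetical helper: brute-force Euclidean nearest neighbours."""
    # Candidates default to every known entity when no subset is given.
    candidates = list(embeddings) if entities_subset is None else list(entities_subset)
    cand_matrix = np.stack([embeddings[c] for c in candidates])    # (n_candidates, k)
    query_matrix = np.stack([embeddings[e] for e in entities])     # (n_queries, k)
    # Pairwise Euclidean distances, shape (n_queries, n_candidates).
    dists = np.linalg.norm(query_matrix[:, None, :] - cand_matrix[None, :, :], axis=-1)
    # Column indices of the n_neighbors smallest distances in each row.
    order = np.argsort(dists, axis=1)[:, :n_neighbors]
    neighbors = np.array(candidates)[order]
    distances = np.take_along_axis(dists, order, axis=1)
    return neighbors, distances


neighbors, distances = nearest_in_embedding_space(
    toy_embeddings, ['a'], n_neighbors=2, entities_subset=['b', 'c', 'd'])
print(neighbors, distances)
```

Both returned arrays have one row per query entity and `n_neighbors` columns, mirroring the shapes documented under Returns.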

Parameters:

kge_model (ampligraph.latent_features.EmbeddingModel) – Trained KGE model.
entities (list or np.array) – List of entities whose neighbors need to be found.
n_neighbors (int) – Number of neighbors to return for each entity.
entities_subset (list or np.array) – List of entities from which neighbors are drawn. If this list is not passed, all the entities in the graph are used.
metric (string or callable) – Distance metric to be used with the NearestNeighbors algorithm. For the values that can be passed, refer to the sklearn NearestNeighbors documentation.
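The choice of metric can change which neighbor ranks first. The sketch below compares Euclidean and cosine distance on two made-up vectors using NumPy only. The function `cosine_distance` and the candidate names are hypothetical; a callable of this form (two 1-D arrays in, one float out) follows the shape sklearn expects for a custom metric, but whether it suits a given model is an assumption to verify.

```python
import numpy as np


def cosine_distance(u, v):
    """Hypothetical callable metric: 1 - cosine similarity of two 1-D vectors."""
    return 1.0 - np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))


# Made-up vectors: one candidate points the same way as the query but is far
# away, the other is close by but points in a different direction.
query = np.array([1.0, 0.0])
candidates = {
    'long_same_direction': np.array([10.0, 0.0]),
    'short_other_direction': np.array([1.0, 1.0]),
}

euclid = {name: float(np.linalg.norm(query - vec)) for name, vec in candidates.items()}
cosine = {name: float(cosine_distance(query, vec)) for name, vec in candidates.items()}

# Pick the candidate with the smallest distance under each metric.
closest_euclidean = min(euclid, key=euclid.get)
closest_cosine = min(cosine, key=cosine.get)
print(closest_euclidean, closest_cosine)
```

Euclidean distance favours the nearby vector, while cosine distance ignores magnitude and favours the vector with the same direction.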

Returns:

neighbors (np.array of size (len(entities), n_neighbors)) – Each row contains the n_neighbors nearest neighbours of the corresponding entity in entities.
distance (np.array of size (len(entities), n_neighbors)) – Each row contains the distances to the corresponding neighbours.
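Because the two returned arrays are aligned row by row, they can be zipped into a per-entity lookup. The sketch below does so with literal arrays copied from the example output on this page; the variable name `nearest` is just an illustrative choice.

```python
import numpy as np

# Values copied from the example output on this page (query entity 'b').
entities = ['b']
neighbors = np.array([['e', 'd', 'c']])
dist = np.array([[0.97474706, 0.979108, 1.2323136]])

# Row i of both arrays belongs to entities[i]; column j pairs a neighbor
# with its distance.
nearest = {
    entity: list(zip(row_neighbors.tolist(), row_dist.tolist()))
    for entity, row_neighbors, row_dist in zip(entities, neighbors, dist)
}

for entity, pairs in nearest.items():
    for neighbor, d in pairs:
        print(f"{entity} -> {neighbor}: {d:.4f}")
```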

Examples

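>>> import numpy as np
>>> from ampligraph.latent_features import DistMult
>>> from ampligraph.discovery import find_nearest_neighbours
>>> # Train a small DistMult model on a toy graph
>>> model = DistMult(batches_count=2, seed=555, epochs=1, k=10,
...                  loss='pairwise', loss_params={'margin': 5},
...                  optimizer='adagrad', optimizer_params={'lr': 0.1})
>>> X = np.array([['a', 'y', 'b'],
...               ['b', 'y', 'a'],
...               ['e', 'y', 'c'],
...               ['c', 'z', 'a'],
...               ['a', 'z', 'd'],
...               ['f', 'z', 'g'],
...               ['c', 'z', 'g']])
>>> model.fit(X)
>>> # Find the 3 nearest neighbours of 'b' among a subset of entities
>>> neighbors, dist = find_nearest_neighbours(model,
...                                           entities=['b'],
...                                           n_neighbors=3,
...                                           entities_subset=['a', 'c', 'd', 'e', 'f'])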
>>> print(neighbors, dist)
[['e' 'd' 'c']] [[0.97474706 0.979108   1.2323136 ]]