TransE

class ampligraph.latent_features.TransE(k=100, eta=2, epochs=100, batches_count=100, seed=0, embedding_model_params={'negative_corruption_entities': 'all', 'norm': 1, 'normalize_ent_emb': False}, optimizer='adam', optimizer_params={'lr': 0.0005}, loss='nll', loss_params={}, regularizer=None, regularizer_params={}, verbose=False)

Translating Embeddings (TransE)

The model as described in [BUGD+13].

The scoring function of TransE computes a similarity between the embedding of the subject \(\mathbf{e}_{sub}\) translated by the embedding of the predicate \(\mathbf{e}_{pred}\) and the embedding of the object \(\mathbf{e}_{obj}\), using the \(L_1\) or \(L_2\) norm \(||\cdot||\):

\[f_{TransE}=-||\mathbf{e}_{sub} + \mathbf{e}_{pred} - \mathbf{e}_{obj}||_n\]

This scoring function is then used on positive and negative triples \(t^+, t^-\) in the loss function.
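For illustration, the scoring function can be reproduced in a few lines of NumPy. This is a sketch only: the random vectors below stand in for embeddings the model would learn during fit().

>>> import numpy as np
>>> # Hypothetical 10-dimensional embeddings standing in for trained ones.
>>> e_sub, e_pred, e_obj = np.random.randn(3, 10)
>>> # Negative L1 distance between translated subject and object (ord=2 for L2).
>>> -np.linalg.norm(e_sub + e_pred - e_obj, ord=1)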

Examples

>>> import numpy as np
>>> from ampligraph.latent_features import TransE
>>> model = TransE(batches_count=1, seed=555, epochs=20, k=10, loss='pairwise',
...                loss_params={'margin': 5})
>>> X = np.array([['a', 'y', 'b'],
...               ['b', 'y', 'a'],
...               ['a', 'y', 'c'],
...               ['c', 'y', 'a'],
...               ['a', 'y', 'd'],
...               ['c', 'y', 'd'],
...               ['b', 'y', 'c'],
...               ['f', 'y', 'e']])
>>> model.fit(X)
>>> model.predict(np.array([['f', 'y', 'e'], ['b', 'y', 'd']]))
[-2.219729, -3.9848995]
>>> model.get_embeddings(['f','e'], type='entity')
array([[-0.65229136, -0.50060457,  1.2316223 ,  0.23738968,  0.29145557,
        -0.20187911, -0.3053819 , -0.6947149 ,  0.9377473 ,  0.12985024],
       [-1.1272118 ,  0.10723944,  0.79431695,  0.6795645 , -0.14428931,
        -0.34959725, -0.60184777, -1.1885864 ,  1.0374763 , -0.36612505]],
      dtype=float32)

Methods

__init__([k, eta, epochs, batches_count, …]) Initialize an EmbeddingModel.
fit(X[, early_stopping, early_stopping_params]) Train a Translating Embeddings model.
get_embeddings(entities[, type]) Get the embeddings of entities or relations.
predict(X[, from_idx, get_ranks]) Predict the scores of triples using a trained embedding model.
__init__(k=100, eta=2, epochs=100, batches_count=100, seed=0, embedding_model_params={'negative_corruption_entities': 'all', 'norm': 1, 'normalize_ent_emb': False}, optimizer='adam', optimizer_params={'lr': 0.0005}, loss='nll', loss_params={}, regularizer=None, regularizer_params={}, verbose=False)

Initialize an EmbeddingModel

Also creates a new TensorFlow session for training.
Parameters:
  • k (int) – Embedding space dimensionality
  • eta (int) – The number of negatives that must be generated at runtime during training for each positive.
  • epochs (int) – The iterations of the training loop.
  • batches_count (int) – The number of batches in which the training set must be split during the training loop.
  • seed (int) – The seed used by the internal random numbers generator.
  • embedding_model_params (dict) –

    TransE-specific hyperparams, passed to the model as a dictionary.

    Supported keys:

    • ‘norm’ (int): the norm to be used in the scoring function (1 or 2-norm; default: 1).
    • ‘normalize_ent_emb’ (bool): flag to indicate whether to normalize entity embeddings after each batch update (default: False).
    • ‘negative_corruption_entities’: entities to use when generating corruptions during training. It can take the following values: ‘all’ (default; use all entities), ‘batch’ (entities present in each batch), a list of entities, or an int indicating how many entities should be used for corruption generation.

    Example: embedding_model_params={'norm': 1, 'normalize_ent_emb': False}
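For instance, a model scoring with the L2 norm and corrupting only against entities in the current batch could be configured as below (hyperparameter values are illustrative, not recommendations):

>>> from ampligraph.latent_features import TransE
>>> model = TransE(k=100, eta=5,
...                embedding_model_params={'norm': 2,
...                                        'normalize_ent_emb': False,
...                                        'negative_corruption_entities': 'batch'})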

optimizer : string
The optimizer used to minimize the loss function. Choose between ‘sgd’, ‘adagrad’, ‘adam’, ‘momentum’.
optimizer_params : dict

Arguments specific to the optimizer, passed as a dictionary.

Supported keys:

  • ‘lr’ (float): learning rate (used by all the optimizers). Default: 0.0005 (as in the signature above).
  • ‘momentum’ (float): learning momentum (only used when optimizer=momentum). Default: 0.9.

Example: optimizer_params={'lr': 0.01}
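As a sketch, a momentum-based configuration would look like this (values are examples only):

>>> from ampligraph.latent_features import TransE
>>> model = TransE(optimizer='momentum',
...                optimizer_params={'lr': 0.01, 'momentum': 0.9})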

loss : string

The type of loss function to use during training.

  • ‘pairwise’: the model will use the pairwise margin-based loss function.
  • ‘nll’: the model will use the negative log-likelihood loss.
  • ‘absolute_margin’: the model will use the absolute margin loss.
  • ‘self_adversarial’: the model will use the self-adversarial sampling loss function.
loss_params : dict

Dictionary of loss-specific hyperparameters. See loss functions documentation for additional details.

Example: loss_params={'margin': 5} if loss='pairwise'.
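For instance, a pairwise-loss model with the margin used in the Examples section above:

>>> from ampligraph.latent_features import TransE
>>> model = TransE(loss='pairwise', loss_params={'margin': 5})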

regularizer : string

The regularization strategy to use with the loss function.

  • None: the model will not use any regularizer (default)
  • ‘LP’: the model will use L1, L2 or L3 based on the value of regularizer_params['p'] (see below).
regularizer_params : dict

Dictionary of regularizer-specific hyperparameters. See the regularizers documentation for additional details.

Example: regularizer_params={'lambda': 1e-5, 'p': 2} if regularizer='LP'.
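Passed to the constructor, that configuration reads as follows (a sketch):

>>> from ampligraph.latent_features import TransE
>>> model = TransE(regularizer='LP',
...                regularizer_params={'lambda': 1e-5, 'p': 2})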

verbose : bool
Verbose mode
fit(X, early_stopping=False, early_stopping_params={})

Train a Translating Embeddings model.

The model is trained on a training set X using the training protocol described in [TWR+16].
Parameters:
  • X (ndarray, shape [n, 3]) – The training triples
  • early_stopping (bool) – Flag to enable early stopping (default:False)
  • early_stopping_params (dictionary) –

    Dictionary of hyperparameters for the early stopping heuristics.

    The following string keys are supported:

    • ‘x_valid’: ndarray, shape [n, 3] : Validation set to be used for early stopping.
    • ‘criteria’: string : Criterion for early stopping: ‘hits10’, ‘hits3’, ‘hits1’, or ‘mrr’ (default).
    • ‘x_filter’: ndarray, shape [n, 3] : Positive triples to use as a filter if a ‘filtered’ early stopping criterion is desired (i.e. filtered MRR if ‘criteria’: ‘mrr’). Note this will affect training time (no filter by default).
    • ‘burn_in’: int : Number of epochs to pass before kicking in early stopping (default: 100).
    • ‘check_interval’: int : Early stopping interval after burn-in (default: 10).
    • ‘stop_interval’: int : Stop if the criterion performs worse over n consecutive checks (default: 3).
    • ‘corruption_entities’: List of entities to be used for corruptions. If ‘all’, it uses all entities (default: ‘all’).
    • ‘corrupt_side’: Specifies which side to corrupt: ‘s’, ‘o’, or ‘s+o’ (default).

    Example: early_stopping_params={'x_valid': X['valid'], 'criteria': 'mrr'}
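A fuller sketch of training with early stopping, assuming X is a dict of splits such as the one returned by ampligraph.datasets.load_wn18 (the validation subsampling is only to keep evaluation fast):

>>> from ampligraph.datasets import load_wn18
>>> from ampligraph.latent_features import TransE
>>> X = load_wn18()
>>> model = TransE(batches_count=100, epochs=200, k=50, eta=5)
>>> model.fit(X['train'],
...           early_stopping=True,
...           early_stopping_params={'x_valid': X['valid'][:500],
...                                  'criteria': 'mrr',
...                                  'burn_in': 100,
...                                  'check_interval': 10})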

get_embeddings(entities, type='entity')

Get the embeddings of entities or relations.

Parameters:
  • entities (array-like, shape=[n]) – The entities (or relations) of interest. Elements of the vector must be the original string literals, not internal IDs.
  • type (string) – If ‘entity’, the entities argument will be considered as a list of knowledge graph entities (i.e. nodes). If set to ‘relation’, they will be treated as relation types instead (i.e. predicates).
Returns:

embeddings – An array of k-dimensional embeddings.

Return type:

ndarray, shape [n, k]
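Continuing the Examples section above (where k=10), relation embeddings are retrieved the same way; the shapes below follow from the documented return shape [n, k]:

>>> ent_emb = model.get_embeddings(['f', 'e'], type='entity')
>>> rel_emb = model.get_embeddings(['y'], type='relation')
>>> ent_emb.shape, rel_emb.shape
((2, 10), (1, 10))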

predict(X, from_idx=False, get_ranks=False)

Predict the scores of triples using a trained embedding model.

The function returns raw scores generated by the model.

Note

To obtain probability estimates, use a logistic sigmoid:

>>> model.fit(X)
>>> y_pred = model.predict(np.array([['f', 'y', 'e'], ['b', 'y', 'd']]))
>>> y_pred
array([1.2052395, 1.5818497], dtype=float32)
>>> from scipy.special import expit
>>> expit(y_pred)
array([0.7694556 , 0.82946634], dtype=float32)
Parameters:
  • X (ndarray, shape [n, 3]) – The triples to score.
  • from_idx (bool) – If True, will skip conversion to internal IDs (default: False).
  • get_ranks (bool) – Flag to compute ranks by scoring against corruptions (default: False).
Returns:

  • scores_predict (ndarray, shape [n]) – The predicted scores for input triples X.
  • rank (ndarray, shape [n]) – Ranks of the triples (only returned if get_ranks=True).
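A sketch of retrieving ranks along with scores, assuming the two-value return described above when get_ranks=True:

>>> scores, ranks = model.predict(np.array([['f', 'y', 'e'], ['b', 'y', 'd']]),
...                               get_ranks=True)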