# TransE

class ampligraph.latent_features.TransE(k=100, eta=2, epochs=100, batches_count=100, seed=0, embedding_model_params={'corrupt_sides': ['s+o'], 'negative_corruption_entities': 'all', 'norm': 1, 'normalize_ent_emb': False}, optimizer='adam', optimizer_params={'lr': 0.0005}, loss='nll', loss_params={}, regularizer=None, regularizer_params={}, initializer='xavier', initializer_params={'uniform': False}, verbose=False)

Translating Embeddings (TransE)

The model as described in [BUGD+13].

The scoring function of TransE computes a similarity between the embedding of the subject $$\mathbf{e}_{sub}$$ translated by the embedding of the predicate $$\mathbf{e}_{pred}$$ and the embedding of the object $$\mathbf{e}_{obj}$$, using the $$L_1$$ or $$L_2$$ norm $$||\cdot||$$:

$f_{TransE}=-||\mathbf{e}_{sub} + \mathbf{e}_{pred} - \mathbf{e}_{obj}||_n$

This scoring function is then applied to positive and negative triples $$t^+, t^-$$ in the loss function.
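As an illustration of the formula above, the score can be sketched in plain NumPy. The function name `transe_score` and the toy vectors are hypothetical and are not part of the AmpliGraph API; the library computes the score internally on its learned embeddings.

```python
import numpy as np

def transe_score(e_sub, e_pred, e_obj, norm=1):
    """Return -||e_sub + e_pred - e_obj||_norm for each row (illustrative sketch)."""
    diff = np.asarray(e_sub) + np.asarray(e_pred) - np.asarray(e_obj)
    # ord=1 gives the L1 norm, ord=2 the L2 norm, computed per triple (row).
    return -np.linalg.norm(diff, ord=norm, axis=-1)

# Two toy triples with 2-dimensional embeddings (made-up values).
e_sub = np.array([[1.0, 0.0], [0.0, 0.0]])
e_pred = np.array([[0.0, 1.0], [3.0, 4.0]])
e_obj = np.array([[1.0, 1.0], [0.0, 0.0]])

scores_l1 = transe_score(e_sub, e_pred, e_obj, norm=1)
scores_l2 = transe_score(e_sub, e_pred, e_obj, norm=2)
```

In the first triple the translated subject lands exactly on the object, so the score is 0, the highest possible value. The second triple is off by the vector (3, 4), giving -7 under the L1 norm and -5 under the L2 norm.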

Examples

>>> import numpy as np
>>> from ampligraph.latent_features import TransE
>>> model = TransE(batches_count=1, seed=555, epochs=20, k=10, loss='pairwise',
...                loss_params={'margin':5})
>>> X = np.array([['a', 'y', 'b'],
...               ['b', 'y', 'a'],
...               ['a', 'y', 'c'],
...               ['c', 'y', 'a'],
...               ['a', 'y', 'd'],
...               ['c', 'y', 'd'],
...               ['b', 'y', 'c'],
...               ['f', 'y', 'e']])
>>> model.fit(X)
>>> model.predict(np.array([['f', 'y', 'e'], ['b', 'y', 'd']]))
[[-3.557618], [-5.8582983]]
>>> model.get_embeddings(['f','e'], embedding_type='entity')
array([[ 0.20124815,  0.07667076,  0.13765174,  0.359908  ,  0.47391438,
         0.60537165, -0.1865169 ,  0.19727449,  0.05368415,  0.10683826],
       [-0.00791226, -0.02880736,  0.33046484,  0.4772845 ,  0.09900524,
        -0.07427583, -0.44486347,  0.25502214,  0.40891314, -0.02437211]],
      dtype=float32)


Methods

- __init__([k, eta, epochs, batches_count, …]) – Initialize an EmbeddingModel.
- fit(X[, early_stopping, early_stopping_params]) – Train a Translating Embeddings model.
- get_embeddings(entities[, embedding_type]) – Get the embeddings of entities or relations.
- predict(X[, from_idx]) – Predict the scores of triples using a trained embedding model.

__init__(k=100, eta=2, epochs=100, batches_count=100, seed=0, embedding_model_params={'corrupt_sides': ['s+o'], 'negative_corruption_entities': 'all', 'norm': 1, 'normalize_ent_emb': False}, optimizer='adam', optimizer_params={'lr': 0.0005}, loss='nll', loss_params={}, regularizer=None, regularizer_params={}, initializer='xavier', initializer_params={'uniform': False}, verbose=False)

Initialize an EmbeddingModel.

Also creates a new TensorFlow session for training.

Parameters:

- k (int) – Embedding space dimensionality.
- eta (int) – The number of negatives generated at runtime during training for each positive.
- epochs (int) – The number of iterations of the training loop.
- batches_count (int) – The number of batches into which the training set is split during the training loop.
- seed (int) – The seed used by the internal random number generator.
- embedding_model_params (dict) – TransE-specific hyperparameters, passed to the model as a dictionary. Supported keys:
  - 'norm' (int): the norm to be used in the scoring function (1 or 2; default: 1).
  - 'normalize_ent_emb' (bool): flag indicating whether to normalize entity embeddings after each batch update (default: False).
  - 'negative_corruption_entities': entities used to generate corruptions during training. Accepted values: 'all' (default; all entities), 'batch' (entities present in each batch), a list of entities, or an int (the number of entities to use for corruption generation).
  - 'corrupt_sides': specifies how to generate corruptions for training. Takes the values 's', 'o', 's+o', or any combination passed as a list.

  Example: embedding_model_params={'norm': 1, 'normalize_ent_emb': False}
- optimizer (string) – The optimizer used to minimize the loss function. Choose between 'sgd', 'adagrad', 'adam', 'momentum'.
- optimizer_params (dict) – Arguments specific to the optimizer, passed as a dictionary. Supported keys:
  - 'lr' (float): learning rate (used by all the optimizers). Default: 0.1.
  - 'momentum' (float): learning momentum (only used when optimizer='momentum'). Default: 0.9.

  Example: optimizer_params={'lr': 0.01}
- loss (string) – The type of loss function to use during training.
  - pairwise: the model will use the pairwise margin-based loss function.
  - nll: the model will use the negative log-likelihood loss.
  - absolute_margin: the model will use the absolute margin likelihood.
  - self_adversarial: the model will use the adversarial sampling loss function.
  - multiclass_nll: the model will use the multiclass NLL loss. Switch to the multiclass loss defined in [aC15] by passing 'corrupt_sides' as ['s','o'] to embedding_model_params. To use the loss defined in [KBK17], pass 'corrupt_sides' as 'o' to embedding_model_params.
- loss_params (dict) – Dictionary of loss-specific hyperparameters. See the loss functions documentation for additional details.

  Example: loss_params={'margin': 5} if loss='pairwise'.
- regularizer (string) – The regularization strategy to use with the loss function.
  - None: the model will not use any regularizer (default).
  - 'LP': the model will use L1, L2 or L3 regularization, based on the value of regularizer_params['p'] (see below).
- regularizer_params (dict) – Dictionary of regularizer-specific hyperparameters. See the regularizers documentation for additional details.

  Example: regularizer_params={'lambda': 1e-5, 'p': 2} if regularizer='LP'.
- initializer (string) – The type of initializer to use.
  - normal: the embeddings will be initialized from a normal distribution.
  - uniform: the embeddings will be initialized from a uniform distribution.
  - xavier: the embeddings will be initialized using the Xavier strategy (default).
- initializer_params (dict) – Dictionary of initializer-specific hyperparameters. See the initializer documentation for additional details.

  Example: initializer_params={'mean': 0, 'std': 0.001} if initializer='normal'.
- verbose (bool) – Verbose mode.

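To make the roles of eta and corrupt_sides more concrete, the following NumPy sketch generates eta negatives per positive by replacing the subject, the object, or a random choice of the two. The helper `generate_corruptions` is hypothetical and only approximates the idea; it is not the library's sampling routine, and it assumes 'negative_corruption_entities' is 'all'.

```python
import numpy as np

def generate_corruptions(X, entities, eta=2, corrupt_side='s+o', seed=0):
    """Return eta corrupted copies of each triple in X (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    # Repeat every positive eta times; object dtype avoids string truncation.
    negatives = np.repeat(np.asarray(X, dtype=object), eta, axis=0)
    n = negatives.shape[0]
    replacements = rng.choice(entities, size=n)

    if corrupt_side == 's':
        replace_subject = np.ones(n, dtype=bool)
    elif corrupt_side == 'o':
        replace_subject = np.zeros(n, dtype=bool)
    elif corrupt_side == 's+o':
        # Assumption: each negative corrupts one side, chosen at random.
        replace_subject = rng.random(n) < 0.5
    else:
        raise ValueError("corrupt_side must be 's', 'o' or 's+o'")

    negatives[replace_subject, 0] = replacements[replace_subject]
    negatives[~replace_subject, 2] = replacements[~replace_subject]
    # Note: a replacement may coincide with the original entity; no filtering here.
    return negatives

X = np.array([['a', 'y', 'b'],
              ['b', 'y', 'c']])
entities = np.unique(X[:, [0, 2]])  # all entities seen in the triples
neg = generate_corruptions(X, entities, eta=2, corrupt_side='o')
```

With corrupt_side='o' only the object column changes, so `neg` contains four triples whose subjects and predicates match the repeated positives.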
fit(X, early_stopping=False, early_stopping_params={})

Train a Translating Embeddings model.

The model is trained on a training set X using the training protocol described in [TWR+16].

Parameters:

- X (ndarray, shape [n, 3]) – The training triples.
- early_stopping (bool) – Flag to enable early stopping (default: False). If set to True, the training loop adopts the following early stopping heuristic:
  - The model is trained regardless of early stopping for burn_in epochs.
  - Every check_interval epochs, the method computes the metric specified in criteria.

  If this metric decreases for stop_interval checks, training stops early.

  Note that the metric is computed on x_valid, which is usually a held-out validation set. Because criteria is a ranking metric, it requires generating negatives. The entities used to generate corruptions can be specified, as well as the side(s) of a triple to corrupt. The method supports filtered metrics: pass an array of positives to x_filter, and it will be used to filter the negatives generated on the fly (i.e. the corruptions).

  Note: Keep in mind that the early stopping criteria may introduce a certain overhead (caused by the metric computation). The goal is to strike a good trade-off between this overhead and the training epochs saved. A common approach is to use unfiltered MRR: early_stopping_params={'x_valid': X['valid'], 'criteria': 'mrr'}

  Note that the size of the validation set also contributes to this overhead. In most cases a smaller validation set is enough.
- early_stopping_params (dictionary) – Dictionary of hyperparameters for the early stopping heuristics. The following string keys are supported:
  - 'x_valid': ndarray, shape [n, 3]: validation set to be used for early stopping.
  - 'criteria': string: criteria for early stopping: 'hits10', 'hits3', 'hits1' or 'mrr' (default).
  - 'x_filter': ndarray, shape [n, 3]: positive triples to use as a filter if a filtered early stopping criteria is desired (i.e. filtered MRR if 'criteria': 'mrr'). Note that this will affect training time (no filter by default).
  - 'burn_in': int: number of epochs to pass before early stopping kicks in (default: 100).
  - 'check_interval': int: early stopping interval after burn-in (default: 10).
  - 'stop_interval': int: stop if the criteria performs worse over n consecutive checks (default: 3).
  - 'corruption_entities': list of entities to be used for corruptions. If 'all', all entities are used (default: 'all').
  - 'corrupt_side': specifies which side to corrupt: 's', 'o' or 's+o' (default).

  Example: early_stopping_params={'x_valid': X['valid'], 'criteria': 'mrr'}

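The interplay of burn_in, check_interval and stop_interval can be sketched in plain Python. The function `early_stopping_epoch` and the callback `metric_at_epoch` are hypothetical names used for illustration only. The sketch assumes that a higher metric is better, that the first check happens at the burn_in epoch, and that "worse" means not improving on the best value seen so far; the library's exact bookkeeping may differ.

```python
def early_stopping_epoch(metric_at_epoch, epochs, burn_in=100,
                         check_interval=10, stop_interval=3):
    """Return the epoch at which training would stop (illustrative sketch)."""
    best = None
    worse_checks = 0
    for epoch in range(1, epochs + 1):
        # One training epoch would run here.
        if epoch < burn_in or (epoch - burn_in) % check_interval != 0:
            continue  # no validation check at this epoch
        current = metric_at_epoch(epoch)  # e.g. MRR computed on x_valid
        if best is None or current > best:
            best = current
            worse_checks = 0
        else:
            worse_checks += 1
            if worse_checks >= stop_interval:
                return epoch  # stop early
    return epochs  # early stopping never triggered

# Toy metric that peaks at epoch 20 and degrades afterwards.
def peaked(epoch):
    return 0.5 - abs(epoch - 20) / 100

stopped_at = early_stopping_epoch(peaked, epochs=100, burn_in=10,
                                  check_interval=10, stop_interval=3)
```

With these toy settings, checks run at epochs 10, 20, 30, 40 and 50. The metric worsens at the last three checks, so training stops at epoch 50 instead of running all 100 epochs.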
get_embeddings(entities, embedding_type='entity')

Get the embeddings of entities or relations.

Note

Use ampligraph.utils.create_tensorboard_visualizations() to visualize the embeddings with TensorBoard.

Parameters:

- entities (array-like, dtype=int, shape=[n]) – The entities (or relations) of interest. Elements of the vector must be the original string literals, and not internal IDs.
- embedding_type (string) – If 'entity', the entities argument is treated as a list of knowledge graph entities (i.e. nodes). If set to 'relation', its elements are treated as relation types instead (i.e. predicates).

Returns:

- embeddings (ndarray, shape [n, k]) – An array of k-dimensional embeddings.

predict(X, from_idx=False)

Predict the scores of triples using a trained embedding model.

The function returns raw scores generated by the model.

Note

To obtain probability estimates, use a logistic sigmoid:

>>> model.fit(X)
>>> y_pred = model.predict(np.array([['f', 'y', 'e'], ['b', 'y', 'd']]))
>>> print(y_pred)
[-4.6903257, -3.9047198]
>>> from scipy.special import expit
>>> expit(y_pred)
array([0.00910012, 0.01974873], dtype=float32)

Parameters:

- X (ndarray, shape [n, 3]) – The triples to score.
- from_idx (bool) – If True, skips the conversion to internal IDs (default: False).

Returns:

- scores_predict (ndarray, shape [n]) – The predicted scores for the input triples X.