EmbeddingModel

class ampligraph.latent_features.EmbeddingModel(k=100, eta=2, epochs=100, batches_count=100, seed=0, embedding_model_params={}, optimizer='adam', optimizer_params={'lr': 0.0005}, loss='nll', loss_params={}, regularizer=None, regularizer_params={}, initializer='xavier', initializer_params={'uniform': False}, large_graphs=False, verbose=False)
Abstract class for embedding models.
AmpliGraph neural knowledge graph embedding models extend this class and its core methods.
Methods
__init__([k, eta, epochs, batches_count, …]) – Initialize an EmbeddingModel.
fit(X[, early_stopping, early_stopping_params]) – Train an EmbeddingModel (with optional early stopping).
get_embeddings(entities[, embedding_type]) – Get the embeddings of entities or relations.
get_hyperparameter_dict() – Returns hyperparameters of the model.
predict(X[, from_idx]) – Predict the scores of triples using a trained embedding model.
calibrate(X_pos[, X_neg, …]) – Calibrate predictions.
predict_proba(X) – Predicts probabilities using the Platt scaling model (after calibration).
_fn(e_s, e_p, e_o) – The scoring function of the model.
_initialize_parameters() – Initialize parameters of the model.
_get_model_loss(dataset_iterator) – Get the current loss including loss due to regularization.
get_embedding_model_params(output_dict) – Save the model parameters in the dictionary.
restore_model_params(in_dict) – Load the model parameters from the input dictionary.
_save_trained_params() – After model fitting, save all the trained parameters in trained_model_params in a specific order.
_load_model_from_trained_params() – Load the model from trained params.
_initialize_early_stopping() – Initializes and creates the evaluation graph for early stopping.
_perform_early_stopping_test(epoch) – Performs regular validation checks and stops early if the criterion is met.
configure_evaluation_protocol([config]) – Set the configuration for evaluation.
set_filter_for_eval() – Configures the model to use the filter for evaluation.
_initialize_eval_graph([mode]) – Initialize the evaluation graph.
end_evaluation() – End the evaluation and close the TensorFlow session.

__init__(k=100, eta=2, epochs=100, batches_count=100, seed=0, embedding_model_params={}, optimizer='adam', optimizer_params={'lr': 0.0005}, loss='nll', loss_params={}, regularizer=None, regularizer_params={}, initializer='xavier', initializer_params={'uniform': False}, large_graphs=False, verbose=False)
Initialize an EmbeddingModel.
Also creates a new TensorFlow session for training.
Parameters
k (int) – Embedding space dimensionality.
eta (int) – The number of negatives that must be generated at runtime during training for each positive.
epochs (int) – The number of iterations (epochs) of the training loop.
batches_count (int) – The number of batches into which the training set is split during the training loop.
seed (int) – The seed used by the internal random number generator.
embedding_model_params (dict) – Model-specific hyperparameters, passed to the model as a dictionary. Refer to the model-specific documentation for details.
optimizer (string) – The optimizer used to minimize the loss function. Choose from 'sgd', 'adagrad', 'adam', 'momentum'.
optimizer_params (dict) –
Arguments specific to the optimizer, passed as a dictionary.
Supported keys:
'lr' (float): learning rate (used by all the optimizers). Default: 0.1.
'momentum' (float): learning momentum (only used when optimizer='momentum'). Default: 0.9.
Example: optimizer_params={'lr': 0.01}
loss (string) –
The type of loss function to use during training.
pairwise: the model will use the pairwise margin-based loss function.
nll: the model will use the negative log-likelihood loss.
absolute_margin: the model will use the absolute margin loss.
self_adversarial: the model will use the adversarial sampling loss function.
multiclass_nll: the model will use the multiclass NLL loss. To use the multiclass loss defined in [aC15], pass 'corrupt_side' as ['s', 'o'] to embedding_model_params. To use the loss defined in [KBK17], pass 'corrupt_side' as 'o' to embedding_model_params.
loss_params (dict) –
Dictionary of loss-specific hyperparameters. See the loss functions documentation for additional details.
Example: loss_params={'margin': 1} if loss='pairwise'.
regularizer (string) –
The regularization strategy to use with the loss function.
None: the model will not use any regularizer (default).
LP: the model will use L1, L2 or L3 based on the value of regularizer_params['p'] (see below).
regularizer_params (dict) –
Dictionary of regularizer-specific hyperparameters. See the regularizers documentation for additional details.
Example: regularizer_params={'lambda': 1e-5, 'p': 2} if regularizer='LP'.
initializer (string) –
The type of initializer to use.
normal: the embeddings will be initialized from a normal distribution.
uniform: the embeddings will be initialized from a uniform distribution.
xavier: the embeddings will be initialized using the Xavier strategy (default).
initializer_params (dict) –
Dictionary of initializer-specific hyperparameters. See the initializer documentation for additional details.
Example: initializer_params={'mean': 0, 'std': 0.001} if initializer='normal'.
large_graphs (bool) – Avoid loading the entire dataset onto the GPU when dealing with large graphs.
verbose (bool) – Verbose mode.
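As a rough illustration of two of the loss options listed above, the following NumPy sketch computes a pairwise margin-based loss and a negative log-likelihood loss from precomputed triple scores. It assumes that higher scores mean more plausible triples and that each positive comes with eta corruptions; the function names are hypothetical and this is not AmpliGraph's TensorFlow implementation.

```python
import numpy as np

# Hypothetical helpers illustrating the textbook loss definitions (assumption:
# higher score = more plausible triple). Not AmpliGraph's implementation.

def pairwise_loss(scores_pos, scores_neg, margin=1.0):
    """Margin-based ranking loss: sum of max(0, margin + f(neg) - f(pos)).

    scores_pos: shape [n]; scores_neg: shape [n, eta] (eta corruptions per positive).
    """
    scores_pos = np.asarray(scores_pos, dtype=float)[:, None]  # broadcast over eta
    scores_neg = np.asarray(scores_neg, dtype=float)
    return float(np.sum(np.maximum(0.0, margin + scores_neg - scores_pos)))

def nll_loss(scores_pos, scores_neg):
    """Negative log-likelihood: sum of log(1 + exp(-y * f)), y=+1 for positives, -1 for negatives."""
    scores_pos = np.asarray(scores_pos, dtype=float)
    scores_neg = np.asarray(scores_neg, dtype=float)
    # np.logaddexp(0, z) == log(1 + exp(z)), computed in a numerically stable way
    return float(np.sum(np.logaddexp(0.0, -scores_pos)) + np.sum(np.logaddexp(0.0, scores_neg)))

# One positive with two corruptions (eta=2).
print(pairwise_loss([2.0], [[1.5, 3.0]], margin=1.0))  # 2.5
print(round(nll_loss([0.0], [[0.0, 0.0]]), 4))          # 2.0794 (= 3 * log(2))
```

The pairwise loss is zero once every positive outscores each of its corruptions by at least the margin, whereas the NLL loss keeps pushing scores apart smoothly.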

fit(X, early_stopping=False, early_stopping_params={})
Train an EmbeddingModel (with optional early stopping).
The model is trained on a training set X using the training protocol described in [TWR+16].
Parameters
X (ndarray (shape [n, 3]) or object of AmpligraphDatasetAdapter) – NumPy array of training triples, or a handle to a dataset adapter used to retrieve the data.
early_stopping (bool) – Flag to enable early stopping (default: False).
early_stopping_params (dictionary) –
Dictionary of hyperparameters for the early stopping heuristics.
The following string keys are supported:
'x_valid': ndarray (shape [n, 3]) or object of AmpligraphDatasetAdapter: NumPy array of validation triples, or a handle to a dataset adapter used to retrieve the data.
'criteria': string: criterion for early stopping, one of 'hits10', 'hits3', 'hits1' or 'mrr' (default).
'x_filter': ndarray, shape [n, 3]: positive triples to use as filter if a 'filtered' early stopping criterion is desired (i.e. filtered MRR if 'criteria': 'mrr'). Note that this will affect training time (no filter by default). If the filter has already been set in the adapter, pass True.
'burn_in': int: number of epochs to pass before early stopping kicks in (default: 100).
'check_interval': int: early stopping interval after burn-in (default: 10).
'stop_interval': int: stop if the criterion performs worse over n consecutive checks (default: 3).
'corruption_entities': list of entities to be used for corruptions. If 'all', all entities are used (default: 'all').
'corrupt_side': specifies which side to corrupt: 's', 'o', 's+o', 's,o' (default).
Example:
early_stopping_params={'x_valid': X['valid'], 'criteria': 'mrr'}
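The interplay of 'burn_in', 'check_interval' and 'stop_interval' can be sketched in plain Python. This is a simplified model under stated assumptions, not AmpliGraph's code: the first check is assumed to happen at epoch burn_in, a check counts as "worse" when the metric does not beat the best value seen so far, and an improvement resets the counter. `early_stopping_epoch` and `metric_at` are hypothetical names.

```python
def early_stopping_epoch(metric_at, burn_in=100, check_interval=10,
                         stop_interval=3, epochs=1000):
    """Return the epoch at which training would stop (or `epochs` if it never does).

    metric_at: hypothetical callable mapping an epoch to a validation metric
    where higher is better (e.g. MRR or hits@10).
    """
    best = float("-inf")
    worse_checks = 0
    for epoch in range(1, epochs + 1):
        # Assumption: no checks during burn-in, then one every `check_interval` epochs.
        if epoch < burn_in or (epoch - burn_in) % check_interval != 0:
            continue
        current = metric_at(epoch)
        if current > best:
            best = current
            worse_checks = 0  # assumption: an improvement resets the counter
        else:
            worse_checks += 1
            if worse_checks >= stop_interval:
                return epoch
    return epochs

# MRR improves until epoch 130, then plateaus: checks at 140, 150, 160 are not better.
print(early_stopping_epoch(lambda epoch: min(epoch, 130)))  # 160
```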

get_embeddings(entities, embedding_type='entity')
Get the embeddings of entities or relations.
Note
Use ampligraph.utils.create_tensorboard_visualizations() to visualize the embeddings with TensorBoard.
Parameters
entities (array-like, shape=[n]) – The entities (or relations) of interest. Elements of the vector must be the original string literals, not internal IDs.
embedding_type (string) – If 'entity', the entities argument is considered as a list of knowledge graph entities (i.e. nodes). If set to 'relation', they are treated as relation types instead (i.e. predicates).
Returns
embeddings – An array of k-dimensional embeddings.
Return type
ndarray, shape [n, k]
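The requirement that entities be passed as original string labels rather than internal IDs can be pictured as a two-step lookup: label to row index, then row of the embedding matrix. The NumPy sketch below assumes a hypothetical label-to-index dictionary and a [num_entities, k] matrix; it only illustrates the idea and is not how get_embeddings is implemented.

```python
import numpy as np

def lookup_embeddings(labels, label_to_idx, embedding_matrix):
    """Map string labels to internal row indices, then gather rows of shape [n, k]."""
    # A KeyError here means an internal ID (or unknown label) was passed instead of a label.
    idx = np.array([label_to_idx[label] for label in labels], dtype=int)
    return embedding_matrix[idx]

k = 4
label_to_idx = {"/m/cat": 0, "/m/dog": 1, "/m/fish": 2}  # hypothetical mapping
embedding_matrix = np.arange(len(label_to_idx) * k, dtype=float).reshape(-1, k)

emb = lookup_embeddings(["/m/fish", "/m/cat"], label_to_idx, embedding_matrix)
print(emb.shape)  # (2, 4)
```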

get_hyperparameter_dict()
Returns hyperparameters of the model.
Returns
hyperparam_dict – Dictionary of hyperparameters that were used for training.
Return type
dict

predict(X, from_idx=False)
Predict the scores of triples using a trained embedding model. The function returns raw scores generated by the model.
Note
To obtain probability estimates, calibrate the model with calibrate(), then call predict_proba().
Parameters
X (ndarray, shape [n, 3]) – The triples to score.
from_idx (bool) – If True, will skip conversion to internal IDs (default: False).
Returns
scores_predict – The predicted scores for input triples X.
Return type
ndarray, shape [n]

calibrate(X_pos, X_neg=None, positive_base_rate=None, batches_count=100, epochs=50)
Calibrate predictions.
The method implements the heuristics described in [TC20], using Platt scaling [P+99].
The calibrated predictions can be obtained with predict_proba() after calibration is done. Ideally, calibration should be performed on a validation set that was not used to train the embeddings.
There are two modes of operation, depending on the availability of negative triples:
1. Both positive and negative triples are provided via X_pos and X_neg respectively. The optimization is done using a second-order method (limited-memory BFGS), therefore no hyperparameter needs to be specified.
2. Only positive triples are provided, and the negative triples are generated by corruptions, just as in training or evaluation. The optimization is done using a first-order method (ADAM), therefore batches_count and epochs must be specified.
Calibration is highly dependent on the base rate of positive triples. Therefore, for mode (2) of operation, the user is required to provide the positive_base_rate argument. For mode (1), it can be inferred automatically from the relative sizes of the positive and negative sets, but the user can override it by providing a value to positive_base_rate.
Defining the positive base rate is the biggest challenge when calibrating without negatives. It depends on which triples the user will evaluate at test time. Take WN11 as an example: it has around 50% positive triples in both the validation set and the test set, so naturally the positive base rate is 50%. However, if the user resamples it to have 75% positives and 25% negatives, the previous calibration will be degraded, and the model must be recalibrated with a 75% positive base rate. Therefore, this parameter depends on how the user handles the dataset and cannot be determined automatically or a priori.
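Platt scaling fits a sigmoid p = 1 / (1 + exp(-(a*s + b))) that maps a raw score s to a probability. The NumPy sketch below fits a and b by plain gradient descent on a weighted logistic loss. The sample weights give positives a total weight equal to the base rate, which is one simple way to impose positive_base_rate and may differ from the heuristic in [TC20]; gradient descent also stands in for the L-BFGS and ADAM optimizers mentioned above. `platt_scaling` and `predict_calibrated` are hypothetical names.

```python
import numpy as np

def platt_scaling(scores, labels, base_rate=None, lr=1.0, steps=5000):
    """Fit p(true | s) = sigmoid(a * s + b) with class weights set by `base_rate`.

    Assumption: base-rate weighting and plain gradient descent are illustrative
    choices, not the calibration procedure used by the library.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=float)
    pos = labels == 1.0
    if base_rate is None:
        # As in mode (1): infer the base rate from the relative sizes of the two sets.
        base_rate = pos.mean()
    # Positives share a total weight of base_rate, negatives share 1 - base_rate.
    weights = np.where(pos, base_rate / pos.sum(), (1.0 - base_rate) / (~pos).sum())
    a, b = 1.0, 0.0
    for _ in range(steps):
        probs = 1.0 / (1.0 + np.exp(-(a * scores + b)))
        residual = weights * (probs - labels)  # gradient of weighted log loss w.r.t. the logit
        a -= lr * np.sum(residual * scores)
        b -= lr * np.sum(residual)
    return a, b

def predict_calibrated(scores, a, b):
    return 1.0 / (1.0 + np.exp(-(a * np.asarray(scores, dtype=float) + b)))

# Overlapping (non-separable) toy scores: 4 positives, 4 negatives.
scores = [2.0, 1.0, 0.5, -0.5, -2.0, -1.0, -0.5, 0.5]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

a, b = platt_scaling(scores, labels)                         # base rate inferred: 0.5
a_low, b_low = platt_scaling(scores, labels, base_rate=0.2)  # fewer positives expected
print(predict_calibrated([2.0, -2.0], a, b))
```

Lowering the base rate shifts the fitted intercept b downward, so the same raw score maps to a lower probability.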
Note
Incompatible with large graph mode (i.e. if self.dealing_with_large_graphs=True).
Parameters
X_pos (ndarray (shape [n, 3])) – Numpy array of positive triples.
X_neg (ndarray (shape [n, 3])) –
Numpy array of negative triples.
If None, the negative triples are generated via corruptions and the user must provide a positive base rate instead.
positive_base_rate (float) –
Base rate of positive statements.
For example, if we assume there is a fifty-fifty chance of any query being true, the base rate would be 50%.
If X_neg is provided and this is None, the relative sizes of X_pos and X_neg will be used to determine the base rate. For example, if we have 50 positive triples and 200 negative triples, the positive base rate will be assumed to be 50/(50+200) = 1/5 = 0.2.
This must be a value between 0 and 1.
batches_count (int) – Number of batches to complete one epoch of the Platt scaling training. Only applies when X_neg is None.
epochs (int) – Number of epochs used to train the Platt scaling model. Only applies when X_neg is None.
Examples
>>> import numpy as np
>>> from sklearn.metrics import brier_score_loss, log_loss
>>> from scipy.special import expit
>>>
>>> from ampligraph.datasets import load_wn11
>>> from ampligraph.latent_features.models import TransE
>>>
>>> X = load_wn11()
>>> X_valid_pos = X['valid'][X['valid_labels']]
>>> X_valid_neg = X['valid'][~X['valid_labels']]
>>>
>>> model = TransE(batches_count=64, seed=0, epochs=500, k=100, eta=20,
...                optimizer='adam', optimizer_params={'lr': 0.0001},
...                loss='pairwise', verbose=True)
>>>
>>> model.fit(X['train'])
>>>
>>> # Raw scores
>>> scores = model.predict(X['test'])
>>>
>>> # Calibrate with positives and negatives
>>> model.calibrate(X_valid_pos, X_valid_neg, positive_base_rate=None)
>>> probas_pos_neg = model.predict_proba(X['test'])
>>>
>>> # Calibrate with just positives and base rate of 50%
>>> model.calibrate(X_valid_pos, positive_base_rate=0.5)
>>> probas_pos = model.predict_proba(X['test'])
>>>
>>> # Calibration evaluation with the Brier score loss (the smaller, the better)
>>> print("Brier scores")
>>> print("Raw scores:", brier_score_loss(X['test_labels'], expit(scores)))
>>> print("Positive and negative calibration:", brier_score_loss(X['test_labels'], probas_pos_neg))
>>> print("Positive only calibration:", brier_score_loss(X['test_labels'], probas_pos))
Brier scores
Raw scores: 0.4925058891371126
Positive and negative calibration: 0.20434617882733366
Positive only calibration: 0.22597599585144656
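The Brier score used in the example above is the mean squared difference between predicted probabilities and binary labels, so 0 is perfect and lower is better. A minimal NumPy version follows, given only to make the metric explicit; the example itself relies on scikit-learn's brier_score_loss, and `brier_score` is a hypothetical helper name.

```python
import numpy as np

def brier_score(y_true, y_prob):
    """Mean of (p - y)^2 over all triples; labels are 0/1 (or False/True)."""
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.asarray(y_prob, dtype=float)
    return float(np.mean((y_prob - y_true) ** 2))

print(brier_score([1, 0], [0.8, 0.3]))  # ≈ 0.065, i.e. (0.2**2 + 0.3**2) / 2
```

A constant prediction of 0.5 always scores 0.25, which is a useful reference point when reading the numbers above.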

predict_proba(X)
Predicts probabilities using the Platt scaling model (after calibration).
The model must be calibrated beforehand with the calibrate method.
Parameters
X (ndarray (shape [n, 3])) – NumPy array of triples to be evaluated.
Returns
probas – Probability of each triple being true according to the Platt scaling calibration.
Return type
ndarray (shape [n])

abstract _fn(e_s, e_p, e_o)
The scoring function of the model.
Assigns a score to a list of triples, with a model-specific strategy. Triples are passed as lists of subject, predicate and object embeddings. This function must be overridden by every model to return the corresponding score.
Parameters
e_s (Tensor, shape [n]) – The embeddings of a list of subjects.
e_p (Tensor, shape [n]) – The embeddings of a list of predicates.
e_o (Tensor, shape [n]) – The embeddings of a list of objects.
Returns
score – The operation corresponding to the scoring function.
Return type
TensorFlow operation
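To show what overriding _fn involves, the sketch below mimics the pattern with NumPy arrays instead of TensorFlow tensors: an abstract base class declares the scoring function and a subclass implements a TransE-style score, the negative L1 distance -||e_s + e_p - e_o||. The class names are hypothetical; this is not AmpliGraph's class hierarchy, and the choice of the L1 norm is an assumption (TransE is also commonly used with L2).

```python
from abc import ABC, abstractmethod

import numpy as np

class ToyEmbeddingModel(ABC):
    """Hypothetical stand-in for an abstract embedding model (NumPy, not TensorFlow)."""

    @abstractmethod
    def _fn(self, e_s, e_p, e_o):
        """Return one score per triple; higher means more plausible."""

    def score(self, e_s, e_p, e_o):
        return self._fn(np.asarray(e_s, float), np.asarray(e_p, float), np.asarray(e_o, float))

class ToyTransE(ToyEmbeddingModel):
    def _fn(self, e_s, e_p, e_o):
        # Translation-based score: negative L1 norm of (subject + predicate - object), per row.
        return -np.sum(np.abs(e_s + e_p - e_o), axis=1)

model = ToyTransE()
e_s = [[0.0, 1.0], [1.0, 1.0]]
e_p = [[1.0, 0.0], [1.0, 0.0]]
e_o = [[1.0, 1.0], [0.0, 0.0]]
scores = model.score(e_s, e_p, e_o)
print(scores)  # first triple fits the translation exactly (score 0), the second scores -3
```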

_initialize_parameters()
Initialize parameters of the model.
This function creates and initializes entity and relation embeddings (with size k). If the graph is large, it loads only the required entity embeddings (at most batch_size*2) and all relation embeddings. Override this function if the parameters need to be initialized differently.
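The default 'xavier' initializer draws embeddings with a scale set by the matrix dimensions. Below is a NumPy sketch of Glorot/Xavier initialization for an entity matrix of shape [num_entities, k] and a relation matrix of shape [num_relations, k]. Treating the two dimensions as fan-in and fan-out, and using a plain (not truncated) normal for the non-uniform variant, are assumptions that may not match the TensorFlow initializer AmpliGraph uses; `xavier_init` is a hypothetical helper.

```python
import numpy as np

def xavier_init(shape, uniform=False, rng=None):
    """Glorot/Xavier initialization for a 2-D matrix of the given shape.

    Assumption: fan_in, fan_out = shape, and a plain normal (not truncated).
    """
    rng = np.random.default_rng(0) if rng is None else rng
    fan_in, fan_out = shape
    if uniform:
        limit = np.sqrt(6.0 / (fan_in + fan_out))  # U(-limit, limit)
        return rng.uniform(-limit, limit, size=shape)
    std = np.sqrt(2.0 / (fan_in + fan_out))        # N(0, std^2)
    return rng.normal(0.0, std, size=shape)

k, num_entities, num_relations = 100, 5000, 20
ent_emb = xavier_init((num_entities, k))                 # like initializer_params={'uniform': False}
rel_emb = xavier_init((num_relations, k), uniform=True)  # like initializer_params={'uniform': True}
print(ent_emb.shape, rel_emb.shape)  # (5000, 100) (20, 100)
```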

_get_model_loss(dataset_iterator)
Get the current loss including loss due to regularization. This function must be overridden if the model uses a combination of different losses (e.g., VAE).
Parameters
dataset_iterator (tf.data.Iterator) – Dataset iterator.
Returns
loss – The loss value that must be minimized.
Return type
tf.Tensor
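Adding the regularization term to the data loss, as this method does when regularizer='LP', can be sketched as follows. The penalty form lambda * sum(|w|^p) over all entity and relation embedding entries is an assumption made for illustration; the helper names are hypothetical and the arrays are NumPy, not TensorFlow tensors.

```python
import numpy as np

def lp_penalty(params, lam=1e-5, p=2):
    """Assumed LP regularization: lam * sum over all parameter arrays of sum(|w| ** p)."""
    return lam * sum(float(np.sum(np.abs(w) ** p)) for w in params)

def total_loss(data_loss, params, lam=1e-5, p=2):
    """Loss to minimize: data loss plus the regularization penalty."""
    return data_loss + lp_penalty(params, lam, p)

ent_emb = np.array([[1.0, -2.0], [0.0, 3.0]])
rel_emb = np.array([[0.5, 0.5]])
print(total_loss(10.0, [ent_emb, rel_emb], lam=0.1, p=2))  # ≈ 10 + 0.1 * 14.5 = 11.45
```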

get_embedding_model_params(output_dict)
Save the model parameters in the dictionary.
Parameters
output_dict (dictionary) – Dictionary of saved params. It is the model's responsibility to save all the variables correctly, so that they can be used for restoring later.

restore_model_params(in_dict)
Load the model parameters from the input dictionary.
Parameters
in_dict (dictionary) – Dictionary of saved params. It is the model's responsibility to load the variables correctly.

_save_trained_params()
After model fitting, save all the trained parameters in trained_model_params in a specific order. The order is needed when loading the model. This method must be overridden if the model has any other parameters (apart from entity/relation embeddings).

_load_model_from_trained_params()
Load the model from trained params. While restoring, make sure that the order of the loaded parameters matches the saved order. It is the duty of the embedding model to load the variables correctly. This method must be overridden if the model has any other parameters (apart from entity/relation embeddings). This function also sets the evaluation mode to do lazy loading of variables based on the number of distinct entities present in the graph.

_initialize_early_stopping()
Initializes and creates the evaluation graph for early stopping.

_perform_early_stopping_test(epoch)
Performs regular validation checks and stops early if the criterion is met.
Parameters
epoch (int) – Current training epoch.
Returns
stopped – Flag to indicate if the early stopping criterion is met.
Return type
bool

configure_evaluation_protocol(config=None)
Set the configuration for evaluation.
Parameters
config (dictionary) –
Dictionary of parameters for evaluation configuration. Can contain the following keys:
corruption_entities: list of entities to be used for corruptions. If 'all', all entities are used (default: 'all').
corrupt_side: specifies which side to corrupt: 's', 'o', 's+o', 's,o' (default). In 's,o' mode, subject and object corruptions are generated at once but ranked separately, for speed-up (default: False).
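The corrupt_side options can be illustrated with a small pure-Python generator: 's' replaces the subject of a test triple with each entity in corruption_entities, 'o' replaces the object, and 's+o' produces both sets. This sketch assumes that the candidate identical to the original triple is skipped and does not model the 's,o' speed-up; `generate_corruptions` is a hypothetical name, not an AmpliGraph function.

```python
def generate_corruptions(triple, corruption_entities, corrupt_side="s+o"):
    """Return corrupted copies of (s, p, o) for the requested side(s)."""
    if corrupt_side not in ("s", "o", "s+o"):
        raise ValueError("this sketch only supports 's', 'o' and 's+o'")
    s, p, o = triple
    corruptions = []
    if corrupt_side in ("s", "s+o"):
        # Assumption: skip the entity that would reproduce the original triple.
        corruptions += [(e, p, o) for e in corruption_entities if e != s]
    if corrupt_side in ("o", "s+o"):
        corruptions += [(s, p, e) for e in corruption_entities if e != o]
    return corruptions

entities = ["a", "b", "c", "d"]
print(generate_corruptions(("a", "likes", "b"), entities, corrupt_side="o"))
# [('a', 'likes', 'a'), ('a', 'likes', 'c'), ('a', 'likes', 'd')]
```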

set_filter_for_eval()
Configures the model to use the filter for evaluation.
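Filtered evaluation, which this method enables and which the 'x_filter' early stopping option refers to, removes corruptions that are themselves known positive triples before ranking the test triple. The sketch below computes a filtered rank from precomputed scores and then MRR and hits@n. The pessimistic tie handling (corruptions with an equal score rank above the test triple) and all names are assumptions for illustration, not AmpliGraph's evaluation code.

```python
def filtered_rank(test_score, corruption_scores, corruptions, known_positives):
    """Rank of the test triple among corruptions that are not known positives (1 = best)."""
    kept = [score for score, triple in zip(corruption_scores, corruptions)
            if triple not in known_positives]
    # Assumption: corruptions scoring >= the test triple are ranked above it.
    return 1 + sum(score >= test_score for score in kept)

def mrr_and_hits(ranks, n=10):
    """Mean reciprocal rank and the fraction of ranks <= n."""
    mrr = sum(1.0 / r for r in ranks) / len(ranks)
    hits = sum(r <= n for r in ranks) / len(ranks)
    return mrr, hits

corruptions = [("a", "likes", "c"), ("a", "likes", "d"), ("a", "likes", "e")]
scores = [0.9, 0.4, 0.7]
known = {("a", "likes", "c")}  # a true triple that should not count as a corruption

raw = filtered_rank(0.8, scores, corruptions, known_positives=set())
filtered = filtered_rank(0.8, scores, corruptions, known_positives=known)
print(raw, filtered)                       # 2 1
print(mrr_and_hits([raw, filtered], n=1))  # (0.75, 0.5)
```

Without the filter, the true triple ("a", "likes", "c") pushes the test triple down one position even though the model ranked two correct answers on top, which is why filtered metrics are usually reported.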

_initialize_eval_graph(mode='test')
Initialize the evaluation graph.
Parameters
mode (string) – Indicates which data generator to use.

end_evaluation()
End the evaluation and close the TensorFlow session.
