EmbeddingModel

class ampligraph.latent_features.EmbeddingModel(k=100, eta=2, epochs=100, batches_count=100, seed=0, embedding_model_params={}, optimizer='adam', optimizer_params={'lr': 0.0005}, loss='nll', loss_params={}, regularizer=None, regularizer_params={}, verbose=False)

Abstract class for embedding models

AmpliGraph neural knowledge graph embedding models extend this class and its core methods.

Methods

__init__([k, eta, epochs, batches_count, …]) Initialize an EmbeddingModel
fit(X[, early_stopping, early_stopping_params]) Train an EmbeddingModel (with optional early stopping).
get_embeddings(entities[, embedding_type]) Get the embeddings of entities or relations.
predict(X[, from_idx, get_ranks]) Predict the scores of triples using a trained embedding model.
_fn(e_s, e_p, e_o) The scoring function of the model.
_initialize_parameters() Initialize parameters of the model.
_get_model_loss(dataset_iterator) Get the current loss including loss due to regularization.
get_embedding_model_params(output_dict) Save the model parameters in the dictionary.
restore_model_params(in_dict) Load the model parameters from the input dictionary.
_save_trained_params() After model fitting, save all the trained parameters in trained_model_params in some order.
_load_model_from_trained_params() Load the model from trained params.
_initialize_early_stopping() Initialize and create the evaluation graph for early stopping.
_perform_early_stopping_test(epoch) Perform regular validation checks and stop early if the criterion is met.
configure_evaluation_protocol([config]) Set the configuration for evaluation
set_filter_for_eval(x_filter) Set the filter to be used during evaluation (filtered_corruption = corruptions - filter).
_initialize_eval_graph() Initialize the evaluation graph.
end_evaluation() End the evaluation and close the TensorFlow session.
__init__(k=100, eta=2, epochs=100, batches_count=100, seed=0, embedding_model_params={}, optimizer='adam', optimizer_params={'lr': 0.0005}, loss='nll', loss_params={}, regularizer=None, regularizer_params={}, verbose=False)

Initialize an EmbeddingModel

Also creates a new TensorFlow session for training.
Parameters:
  • k (int) – Embedding space dimensionality
  • eta (int) – The number of negatives that must be generated at runtime during training for each positive.
  • epochs (int) – The iterations of the training loop.
  • batches_count (int) – The number of batches in which the training set must be split during the training loop.
  • seed (int) – The seed used by the internal random numbers generator.
  • embedding_model_params (dict) – Model-specific hyperparams, passed to the model as a dictionary. Refer to model-specific documentation for details.
  • optimizer (string) – The optimizer used to minimize the loss function. Choose between ‘sgd’, ‘adagrad’, ‘adam’, ‘momentum’.
  • optimizer_params (dict) –

    Arguments specific to the optimizer, passed as a dictionary.

    Supported keys:

    • ’lr’ (float): learning rate (used by all the optimizers). Default: 0.1.
    • ’momentum’ (float): learning momentum (only used when optimizer=momentum). Default: 0.9.

    Example: optimizer_params={'lr': 0.01}

  • loss (string) –

    The type of loss function to use during training.

    • pairwise the model will use pairwise margin-based loss function.
    • nll the model will use negative log-likelihood.
    • absolute_margin the model will use absolute margin-based loss function.
    • self_adversarial the model will use adversarial sampling loss function.
    • multiclass_nll the model will use multiclass nll loss. Switch to the multiclass loss defined in [aC15] by passing ‘corrupt_sides’ as [‘s’,’o’] to embedding_model_params. To use the loss defined in [KBK17], pass ‘corrupt_sides’ as ‘o’ to embedding_model_params.
  • loss_params (dict) –

    Dictionary of loss-specific hyperparameters. See loss functions documentation for additional details.

    Example: loss_params={'margin': 1} if loss='pairwise'.

  • regularizer (string) –

    The regularization strategy to use with the loss function.

    • None: the model will not use any regularizer (default)
    • ’LP’: the model will use L1, L2 or L3 based on the value of regularizer_params['p'] (see below).
  • regularizer_params (dict) –

    Dictionary of regularizer-specific hyperparameters. See the regularizers documentation for additional details.

    Example: regularizer_params={'lambda': 1e-5, 'p': 2} if regularizer='LP'.

  • verbose (bool) – Verbose mode
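
To make the roles of eta, batches_count and seed concrete, the sketch below splits a toy training set into batches and draws eta corruptions for each positive triple. It is a plain-Python illustration, not AmpliGraph's implementation: the helper names make_batches and corrupt, the uniform sampling of replacement entities, and the 50/50 choice between subject and object are all assumptions made for the example.

```python
import random

def make_batches(triples, batches_count):
    """Split triples into exactly batches_count contiguous batches."""
    n = len(triples)
    return [triples[i * n // batches_count:(i + 1) * n // batches_count]
            for i in range(batches_count)]

def corrupt(triple, entities, eta, rng):
    """Return eta negatives, each replacing either the subject or the object.

    Negatives are not filtered, so a corruption may occasionally coincide
    with a true triple.
    """
    s, p, o = triple
    negatives = []
    for _ in range(eta):
        e = rng.choice(entities)
        if rng.random() < 0.5:
            negatives.append((e, p, o))  # corrupt the subject
        else:
            negatives.append((s, p, e))  # corrupt the object
    return negatives

rng = random.Random(0)  # plays the role of the seed parameter
triples = [("a", "y", "b"), ("b", "y", "c"), ("c", "y", "d"), ("d", "y", "a")]
entities = sorted({e for s, _, o in triples for e in (s, o)})

batches = make_batches(triples, batches_count=2)
negatives_per_batch = [
    [neg for t in batch for neg in corrupt(t, entities, eta=2, rng=rng)]
    for batch in batches
]
```

With eta=2, every batch ends up with twice as many negatives as positives, which is the ratio the loss function sees during training.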
fit(X, early_stopping=False, early_stopping_params={})

Train an EmbeddingModel (with optional early stopping).

The model is trained on a training set X using the training protocol described in [TWR+16].
Parameters:
  • X (ndarray, shape [n, 3]) – The training triples
  • early_stopping (bool) – Flag to enable early stopping (default: False).
  • early_stopping_params (dictionary) –

    Dictionary of hyperparameters for the early stopping heuristics.

    The following string keys are supported:

    • ’x_valid’: ndarray, shape [n, 3] : Validation set to be used for early stopping.
    • ’criteria’: string : criteria for early stopping: ‘hits10’, ‘hits3’, ‘hits1’ or ‘mrr’ (default).
    • ’x_filter’: ndarray, shape [n, 3] : Positive triples to use as filter if a ‘filtered’ early
      stopping criteria is desired (i.e. filtered-MRR if ‘criteria’:’mrr’). Note this will affect training time (no filter by default).
    • ’burn_in’: int : Number of epochs to pass before kicking in early stopping (default: 100).
    • ’check_interval’: int : Early stopping interval after burn-in (default: 10).
    • ’stop_interval’: int : Stop if the criteria performs worse over n consecutive checks (default: 3).
    • ’corruption_entities’: List of entities to be used for corruptions. If ‘all’,
      it uses all entities (default: ‘all’)
    • ’corrupt_side’: Specifies which side to corrupt. ‘s’, ‘o’, ‘s+o’ (default)

    Example: early_stopping_params={'x_valid': X['valid'], 'criteria': 'mrr'}
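
The interplay of ’burn_in’, ’check_interval’ and ’stop_interval’ can be sketched in plain Python. This is only an illustration under stated assumptions, not the library's code: the function name early_stopping_epoch is hypothetical, the first check is assumed to happen exactly at the burn-in epoch, and "performing worse" is taken to mean "not better than the best value seen so far".

```python
def early_stopping_epoch(metric_by_epoch, burn_in=100, check_interval=10,
                         stop_interval=3):
    """Return the epoch at which training would stop, or None.

    metric_by_epoch maps an epoch number to the validation metric
    (for example MRR, where higher is better).
    """
    best = float("-inf")
    worse_checks = 0
    for epoch in sorted(metric_by_epoch):
        # Only check after burn-in, and then every check_interval epochs.
        if epoch < burn_in or (epoch - burn_in) % check_interval != 0:
            continue
        value = metric_by_epoch[epoch]
        if value > best:
            best = value
            worse_checks = 0  # an improvement resets the counter
        else:
            worse_checks += 1
            if worse_checks == stop_interval:
                return epoch
    return None

# Toy validation curve: improves until epoch 15, then degrades.
mrr = {10: 0.30, 15: 0.35, 20: 0.34, 25: 0.33, 30: 0.36}
stopped_at = early_stopping_epoch(mrr, burn_in=10, check_interval=5,
                                  stop_interval=2)
```

In this toy run the checks at epochs 20 and 25 both fail to beat the best value from epoch 15, so training stops at epoch 25 and the later recovery at epoch 30 is never seen.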

get_embeddings(entities, embedding_type='entity')

Get the embeddings of entities or relations.

Note

Use ampligraph.utils.create_tensorboard_visualizations() to visualize the embeddings with TensorBoard.

Parameters:
  • entities (array-like, dtype=int, shape=[n]) – The entities (or relations) of interest. Elements of the vector must be the original string literals, and not internal IDs.
  • embedding_type (string) – If ‘entity’, the entities argument will be considered as a list of knowledge graph entities (i.e. nodes). If set to ‘relation’, they will be treated as relation types instead (i.e. predicates).
Returns:

embeddings – An array of k-dimensional embeddings.

Return type:

ndarray, shape [n, k]
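
Conceptually, the lookup maps each string literal to an internal ID and then to a row of the embedding matrix. The sketch below mimics that with plain lists and dictionaries. The names ent_to_idx, rel_to_idx, ent_emb and rel_emb are hypothetical stand-ins for the model's internal state, and the function is a toy re-implementation for illustration only.

```python
# Hypothetical literal -> internal ID mappings and k=2 embedding tables.
ent_to_idx = {"a": 0, "b": 1, "c": 2}
rel_to_idx = {"y": 0}
ent_emb = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
rel_emb = [[0.7, 0.8]]

def get_embeddings(items, embedding_type="entity"):
    """Return one k-dimensional row per requested literal."""
    if embedding_type == "entity":
        lookup, table = ent_to_idx, ent_emb
    elif embedding_type == "relation":
        lookup, table = rel_to_idx, rel_emb
    else:
        raise ValueError("embedding_type must be 'entity' or 'relation'")
    return [table[lookup[item]] for item in items]

entity_rows = get_embeddings(["c", "a"])
relation_rows = get_embeddings(["y"], embedding_type="relation")
```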

predict(X, from_idx=False, get_ranks=False)

Predict the scores of triples using a trained embedding model.

The function returns raw scores generated by the model.

Note

To obtain probability estimates, use a logistic sigmoid:

>>> import numpy as np
>>> model.fit(X)
>>> y_pred = model.predict(np.array([['f', 'y', 'e'], ['b', 'y', 'd']]))
>>> y_pred
array([1.2052395, 1.5818497], dtype=float32)
>>> from scipy.special import expit
>>> expit(y_pred)
array([0.7694556 , 0.82946634], dtype=float32)
Parameters:
  • X (ndarray, shape [n, 3]) – The triples to score.
  • from_idx (bool) – If True, will skip conversion to internal IDs. (default: False).
  • get_ranks (bool) – Flag to compute ranks by scoring against corruptions (default: False).
Returns:

  • scores_predict (ndarray, shape [n]) – The predicted scores for input triples X.
  • rank (ndarray, shape [n]) – Ranks of the triples (only returned if get_ranks=True).
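
As a rough illustration of what ranking against corruptions means, the sketch below ranks a positive triple's score among the scores of its corruptions. The function name and the tie-breaking rule (only strictly higher scores push the rank down) are assumptions for the example and may differ from what the library does.

```python
def rank_against_corruptions(positive_score, corruption_scores):
    """Rank 1 is best: 1 + number of corruptions scoring strictly higher."""
    return 1 + sum(1 for s in corruption_scores if s > positive_score)

# Two of the four corruptions score higher than the positive triple.
rank = rank_against_corruptions(1.2, [0.5, 1.5, 2.0, 1.1])
```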

_fn(e_s, e_p, e_o)

The scoring function of the model.

Assigns a score to a list of triples, with a model-specific strategy. Triples are passed as lists of subject, predicate, object embeddings. This function must be overridden by every model to return the corresponding score.
Parameters:
  • e_s (Tensor, shape [n]) – The embeddings of a list of subjects.
  • e_p (Tensor, shape [n]) – The embeddings of a list of predicates.
  • e_o (Tensor, shape [n]) – The embeddings of a list of objects.
Returns:

score – The operation corresponding to the scoring function.

Return type:

TensorFlow operation
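
To show what a model-specific scoring strategy can look like, the sketch below implements two well-known scoring functions, DistMult and TransE with the L1 norm, for a single triple. It uses plain Python lists instead of TensorFlow tensors, so it illustrates the arithmetic only. The function names are hypothetical and this is not how a subclass would literally write its _fn override.

```python
def distmult_score(e_s, e_p, e_o):
    """DistMult: trilinear dot product of the three embeddings."""
    return sum(s * p * o for s, p, o in zip(e_s, e_p, e_o))

def transe_score(e_s, e_p, e_o):
    """TransE (L1 norm): negative distance between e_s + e_p and e_o."""
    return -sum(abs(s + p - o) for s, p, o in zip(e_s, e_p, e_o))

dm = distmult_score([1, 2], [3, 4], [5, 6])  # 1*3*5 + 2*4*6
te = transe_score([1, 2], [1, 1], [2, 4])    # -(|1+1-2| + |2+1-4|)
```

In both cases a higher score means a more plausible triple, which is why the TransE distance is negated.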

_initialize_parameters()

Initialize parameters of the model.

This function creates and initializes entity and relation embeddings (with size k). Override this function if the parameters need to be initialized differently.

_get_model_loss(dataset_iterator)

Get the current loss including loss due to regularization.

This function must be overridden if the model uses a combination of different losses (e.g. VAE).
Parameters:
  • dataset_iterator (tf.data.Iterator) – Dataset iterator.
Returns:

loss – The loss value that must be minimized.

Return type:

tf.Tensor

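
The idea of "loss including loss due to regularization" can be sketched as a data loss plus a penalty term. The example below combines a pairwise margin-based loss with an LP-style penalty in plain Python. The function names, the exact form of the penalty (lambda times the sum of |w|^p) and the toy numbers are assumptions for illustration. The library computes the equivalent quantity as a TensorFlow tensor.

```python
def pairwise_margin_loss(pos_scores, neg_scores, margin=1.0):
    """Sum of hinge terms max(0, margin + negative - positive)."""
    return sum(max(0.0, margin + n - p)
               for p, n in zip(pos_scores, neg_scores))

def lp_penalty(params, lam=1e-5, p=2):
    """LP-style penalty: lam * sum(|w| ** p) over all parameters."""
    return lam * sum(abs(w) ** p for row in params for w in row)

def model_loss(pos_scores, neg_scores, params, margin=1.0, lam=1e-5, p=2):
    """Data loss plus regularization penalty."""
    return (pairwise_margin_loss(pos_scores, neg_scores, margin)
            + lp_penalty(params, lam, p))

# First pair is already separated by the margin; the second is not.
total = model_loss(pos_scores=[2.0, 0.5], neg_scores=[0.5, 1.0],
                   params=[[1.0, 2.0]], margin=1.0, lam=0.1, p=2)
```

Here the data loss is 1.5 (only the second pair violates the margin) and the penalty is 0.1 * (1 + 4) = 0.5.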
get_embedding_model_params(output_dict)

Save the model parameters in the dictionary.

Parameters:
  • output_dict (dictionary) – Dictionary of saved params. It’s the duty of the model to save all the variables correctly, so that it can be used for restoring later.
restore_model_params(in_dict)

Load the model parameters from the input dictionary.

Parameters:
  • in_dict (dictionary) – Dictionary of saved params. It’s the duty of the model to load the variables correctly.

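
The save and restore pair is meant to round-trip any model-specific state through a dictionary. A minimal sketch of that contract follows. ToyModel and its extra_scale attribute are hypothetical and exist only for this example. A real subclass would store whatever hyperparameters or variables it needs beyond the embeddings.

```python
class ToyModel:
    """Hypothetical model with one extra parameter besides the embeddings."""

    def __init__(self):
        self.extra_scale = 1.0

    def get_embedding_model_params(self, output_dict):
        # Write everything needed for a later restore into the dictionary.
        output_dict["extra_scale"] = self.extra_scale

    def restore_model_params(self, in_dict):
        # Read back exactly the keys written by get_embedding_model_params.
        self.extra_scale = in_dict["extra_scale"]

trained = ToyModel()
trained.extra_scale = 2.5

saved = {}
trained.get_embedding_model_params(saved)

restored = ToyModel()
restored.restore_model_params(saved)
```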
_save_trained_params()

After model fitting, save all the trained parameters in trained_model_params in a fixed order, which is relied on later when loading the model. This method must be overridden if the model has any parameters other than the entity and relation embeddings.

_load_model_from_trained_params()

Load the model from the trained parameters. When restoring, make sure that the order of the loaded parameters matches the saved order. It’s the duty of the embedding model to load the variables correctly. This method must be overridden if the model has any parameters other than the entity and relation embeddings.

_initialize_early_stopping()

Initialize and create the evaluation graph for early stopping.

_perform_early_stopping_test(epoch)

Perform regular validation checks and stop early if the criterion is met.

Parameters:
  • epoch (int) – Current training epoch.
Returns:

stopped – Flag indicating whether the early stopping criterion has been met.

Return type:

bool

configure_evaluation_protocol(config={'corrupt_side': 's+o', 'corruption_entities': 'all', 'default_protocol': False})

Set the configuration for evaluation

Parameters:
  • config (dictionary) –

    Dictionary of parameters for the evaluation configuration. Can contain the following keys:

    • corruption_entities: List of entities to be used for corruptions. If all, it uses all entities (default: all).
    • corrupt_side: Specifies which side to corrupt: s, o, or s+o (default).
    • default_protocol: Boolean flag indicating whether to use the default evaluation protocol, which computes scores for corruptions of subjects and objects and ranks them separately. Evaluating s and o in two separate passes and then ranking would give the same result but is slower, so in this mode the s+o corruptions are generated at once and ranked separately for speed (default: False).
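
The effect of corrupt_side and corruption_entities can be illustrated by enumerating the corruptions of a single triple. The function below is a plain-Python sketch with a hypothetical name. Skipping the replacement that would reproduce the original triple is an assumption of the example, not a documented behaviour.

```python
def generate_corruptions(triple, corruption_entities, corrupt_side="s+o"):
    """List the corruptions of one triple for the chosen side(s)."""
    if corrupt_side not in ("s", "o", "s+o"):
        raise ValueError("corrupt_side must be 's', 'o' or 's+o'")
    s, p, o = triple
    corruptions = []
    if corrupt_side in ("s", "s+o"):
        # Replace the subject with every other candidate entity.
        corruptions += [(e, p, o) for e in corruption_entities if e != s]
    if corrupt_side in ("o", "s+o"):
        # Replace the object with every other candidate entity.
        corruptions += [(s, p, e) for e in corruption_entities if e != o]
    return corruptions

candidates = ["a", "b", "c"]
subject_side = generate_corruptions(("a", "y", "b"), candidates, "s")
both_sides = generate_corruptions(("a", "y", "b"), candidates, "s+o")
```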
set_filter_for_eval(x_filter)

Set the filter to be used during evaluation (filtered_corruption = corruptions - filter).

Filtering uses a prime-number-based assignment and product. A unique prime number is associated with each subject entity, each object entity and each relation. The only primes that divide the product of a triple's three primes are those three primes, so the product identifies the triple. This product is computed for every filter triple and stored in a hash map. When corruptions are generated for a triple during evaluation, the product of each corruption is computed in the same way and looked up in the hash map. If it is found there, the corrupted triple is in the filter list.

Parameters:
  • x_filter (ndarray, shape [n, 3]) – Filter triples. If the generated corruptions are present in this, they will be removed.

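
The prime-product scheme described above can be sketched in a few lines of plain Python. This is an illustration rather than the library's TensorFlow implementation. The helper names are hypothetical, a Python set stands in for the hash map, and the example assumes that the primes used for the subject role, the relations and the object role are all distinct, so that swapping subject and object yields a different product.

```python
from itertools import count

def primes():
    """Yield primes in increasing order (simple trial division)."""
    found = []
    for n in count(2):
        if all(n % p for p in found):
            found.append(n)
            yield n

def build_prime_maps(entities, relations):
    """Assign distinct primes to subject roles, relations and object roles."""
    gen = primes()
    subj = {e: next(gen) for e in entities}
    rel = {r: next(gen) for r in relations}
    obj = {e: next(gen) for e in entities}
    return subj, rel, obj

def triple_key(triple, subj, rel, obj):
    """Product of the three primes; unique per triple by factorization."""
    s, p, o = triple
    return subj[s] * rel[p] * obj[o]

subj, rel, obj = build_prime_maps(["a", "b", "c"], ["y"])

# Store the products of the filter triples (the "hash map" of the text).
x_filter = [("a", "y", "b"), ("b", "y", "c")]
filter_keys = {triple_key(t, subj, rel, obj) for t in x_filter}

# Drop every corruption whose product is found among the filter products.
corruptions = [("a", "y", "c"), ("b", "y", "c"), ("c", "y", "b")]
filtered = [t for t in corruptions
            if triple_key(t, subj, rel, obj) not in filter_keys]
```

Only ("b", "y", "c") is removed, because it is the only corruption that also appears in x_filter.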
_initialize_eval_graph()

Initialize the evaluation graph.

Uses the prime-number-based filtering strategy (see set_filter_for_eval()) if a filter is set.

end_evaluation()

End the evaluation and close the TensorFlow session.