ScoringBasedEmbeddingModel

class ampligraph.latent_features.ScoringBasedEmbeddingModel(*args, **kwargs)

Class for handling KGE models that follow the ranking-based protocol.

Example

>>> # create model and compile using user defined optimizer settings and
>>> # user defined settings of an existing loss
>>> from ampligraph.datasets import load_fb15k_237
>>> from ampligraph.latent_features import ScoringBasedEmbeddingModel
>>> from ampligraph.latent_features.loss_functions import SelfAdversarialLoss
>>> import tensorflow as tf
>>> X = load_fb15k_237()
>>> loss = SelfAdversarialLoss({'margin': 0.1, 'alpha': 5, 'reduction': 'sum'})
>>> model = ScoringBasedEmbeddingModel(eta=5,
>>>                                    k=300,
>>>                                    scoring_type='ComplEx',
>>>                                    seed=0)
>>> model.compile(optimizer='adam', loss=loss)
>>> model.fit(X['train'],
>>>           batch_size=10000,
>>>           epochs=5)
Epoch 1/5
29/29 [==============================] - 3s 87ms/step - loss: 13496.5752
Epoch 2/5
29/29 [==============================] - 1s 36ms/step - loss: 13488.8682
Epoch 3/5
29/29 [==============================] - 1s 35ms/step - loss: 13436.2725
Epoch 4/5
29/29 [==============================] - 1s 35ms/step - loss: 13259.0840
Epoch 5/5
29/29 [==============================] - 1s 34ms/step - loss: 12977.0117

Attributes

metrics

Returns all the metrics that will be computed during training.

Methods

__init__(eta, k[, scoring_type, seed, ...])

Initializes the scoring based embedding model using the user specified scoring function.

build(input_shape)

Override the build function of the Model class.

build_full_model([batch_size])

This method is called while loading the weights to build the model.

calibrate(X_pos[, X_neg, ...])

Calibrate predictions.

call(inputs[, training])

Computes the scores of the triples and returns the corruption scores as well.

compile([optimizer, loss, ...])

Compile the model.

compute_focusE_weights(weights, structure_weight)

Compute positive and negative weights to scale scores if use_focusE=True.

compute_output_shape(inputShape)

Returns the output shape of the outputs of the call function.

evaluate([x, batch_size, verbose, ...])

Evaluate the inputs against corruptions and return ranks.

fit([x, batch_size, epochs, verbose, ...])

Fit the model on the provided data.

from_config(config)

Creates a layer from its config.

get_config()

Get the configuration hyper-parameters of the scoring based embedding model.

get_count([concept_type])

Returns the count of entities or relations that were present during training.

get_emb_matrix_test([part_number, ...])

Get the embedding matrix during evaluation.

get_embeddings(entities[, embedding_type])

Get the embeddings of entities or relations.

get_focusE_params([dict_params])

Get parameters for focusE.

get_indexes(X[, type_of, order])

Converts given data to indexes or to raw data (according to order).

get_train_embedding_matrix_size()

Returns the size of the embedding matrix used for training.

is_fit()

Check whether the model has been fitted already.

load_metadata([filepath, filedir])

load_weights(filepath)

Loads the model weights.

make_calibrate_function()

As in Keras, this function returns the handle to the calibrate step function.

make_predict_function()

As in Keras, this function returns the handle to the predict step function.

make_test_function()

As in Keras, this function returns the handle to the test step function.

make_train_function()

As in Keras, this function returns the handle to the training step function.

partition_change_updates(num_ents, ent_emb, ...)

Perform the changes that are required when the partition is modified during training.

predict(x[, batch_size, verbose, callbacks])

Compute scores of the input triples.

predict_proba(x[, batch_size, verbose, ...])

Compute calibrated scores (\(0 ≤ score ≤ 1\)) for the input triples.

predict_step(inputs)

Returns the output of predict step on a batch of data.

predict_step_partitioning(inputs)

Returns the output of predict step on a batch of data.

process_model_inputs_for_test(triples)

Return the processed triples.

save(filepath[, overwrite, ...])

Save the model.

save_metadata([filepath, filedir])

Save metadata.

save_weights(filepath[, overwrite])

Save the trainable weights.

train_step(data)

Training step.

update_focusE_params()

Update the structural weight after decay.

__init__(eta, k, scoring_type='DistMult', seed=0, max_ent_size=None, max_rel_size=None)

Initializes the scoring based embedding model using the user specified scoring function.

Parameters:
  • eta (int) – Number of negatives to use during training per triple.

  • k (int) – Embedding size.

  • scoring_type (str) –

    Name of the scoring layer to use.

    • TransE: the translating embeddings scoring function will be used.

    • DistMult: the DistMult scoring function will be used.

    • ComplEx: the ComplEx scoring function will be used.

    • HolE: the holographic embeddings scoring function will be used.

  • seed (int) – Random seed.

  • max_ent_size (int) – Maximum number of entities that can occur in any partition (default: None).

  • max_rel_size (int) – Maximum number of relations that can occur in any partition (default: None).
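To make the scoring_type options above more concrete, here is a minimal pure-Python sketch of two of the scoring functions as they are usually defined in the literature. The function names are hypothetical, and the use of the L1 norm for TransE is an assumption; the library's own scoring layers operate on batched tensors and may differ in detail.

```python
def distmult_score(e_s, r_p, e_o):
    """DistMult: trilinear product of subject, predicate and object embeddings."""
    return sum(s * p * o for s, p, o in zip(e_s, r_p, e_o))


def transe_score(e_s, r_p, e_o):
    """TransE: negative distance between (subject + predicate) and object.

    Assumption: the L1 norm is used here purely for illustration.
    """
    return -sum(abs(s + p - o) for s, p, o in zip(e_s, r_p, e_o))


# Toy 2-dimensional embeddings (k=2), made up for illustration.
subject, predicate, obj = [1.0, 2.0], [3.0, 4.0], [5.0, 6.0]
print(distmult_score(subject, predicate, obj))  # 1*3*5 + 2*4*6 = 63.0
print(transe_score(subject, predicate, obj))    # -(|1+3-5| + |2+4-6|) = -1.0
```

In both cases a higher score means the triple is considered more plausible.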

build(input_shape)

Override the build function of the Model class.

It is called on the first call to __call__. This function sets some internal parameters of the encoding layers (needed to build the layers themselves) based on the input data supplied by the user when calling the ScoringBasedEmbeddingModel.fit method.

build_full_model(batch_size=100)

This method is called while loading the weights to build the model.

calibrate(X_pos, X_neg=None, positive_base_rate=None, batch_size=32, epochs=50, verbose=0)

Calibrate predictions.

The method implements the heuristics described in [TC20], using Platt scaling [P+99].

The calibrated predictions can be obtained with predict_proba() after calibration is done.

Ideally, calibration should be performed on a validation set that was not used to train the embeddings.

There are two modes of operation, depending on the availability of negative triples:

  1. Both positive and negative triples are provided via X_pos and X_neg respectively. The optimization is done using a second-order method (limited-memory BFGS), therefore no hyperparameter needs to be specified.

  2. Only positive triples are provided, and the negative triples are generated by corruptions, as is done in training or evaluation. The optimization is done using a first-order method (Adam), therefore batch_size and epochs must be specified.

Calibration is highly dependent on the base rate of positive triples. Therefore, for mode (2) of operation, the user is required to provide the positive_base_rate argument. For mode (1), that can be inferred automatically by the relative sizes of the positive and negative sets, but the user can override this behaviour by providing a value to positive_base_rate.

Defining the positive base rate is the biggest challenge when calibrating without negatives. It depends on the user's choice of triples to be evaluated at test time. Take the WN11 dataset as an example: it has around 50% positive triples in both the validation set and the test set, so the positive base rate is 50%. However, should the user resample it to have 75% positives and 25% negatives, the previous calibration would be degraded, and the user must recalibrate the model with a 75% positive base rate. Therefore, this parameter depends on how the user handles the dataset and cannot be determined automatically or a priori.
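The base rate arithmetic for mode (1) is simple enough to sketch in a few lines. The helper below is hypothetical (it is not part of the library) and only shows how a base rate follows from the relative sizes of the positive and negative sets.

```python
def infer_positive_base_rate(n_pos, n_neg):
    """Fraction of positives among all calibration triples."""
    if n_pos <= 0 or n_neg < 0:
        raise ValueError("need at least one positive and a non-negative number of negatives")
    return n_pos / (n_pos + n_neg)


print(infer_positive_base_rate(50, 200))  # 0.2
print(infer_positive_base_rate(75, 25))   # 0.75, the resampled case discussed above
```

When only positives are available there is nothing to count, which is why the value has to be supplied explicitly in mode (2).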

Parameters:
  • X_pos (np.array, shape (n,3) or str or GraphDataLoader or AbstractGraphPartitioner) – Data OR Filename of the data file OR Data Handle to be used as positive triples.

  • X_neg (np.array, shape (n,3) or str or GraphDataLoader or AbstractGraphPartitioner) –

    Data OR Filename of the data file OR Data Handle to be used as negative triples.

    If None, the negative triples are generated via corruptions and the user must provide a positive base rate instead.

  • positive_base_rate (float) –

    Base rate of positive statements.

    For example, if we assume there is an even chance for any query to be true, the base rate would be 50%.

    If X_neg is provided and positive_base_rate=None, the relative sizes of X_pos and X_neg will be used to determine the base rate. Say we have 50 positive triples and 200 negative triples, the positive base rate will be assumed to be \(\frac{50}{(50+200)} = \frac{1}{5} = 0.2\).

    This value must be \(\in [0,1]\).

  • batch_size (int) – Batch size for positives.

  • epochs (int) – Number of epochs used to train the Platt scaling model. Only applies when X_neg=None.

  • verbose (bool) – Verbosity (default: False).

Example

>>> from ampligraph.datasets import load_fb15k_237
>>> from ampligraph.latent_features import ScoringBasedEmbeddingModel
>>> import numpy as np
>>> dataset = load_fb15k_237()
>>> model = ScoringBasedEmbeddingModel(eta=5,
>>>                                    k=300,
>>>                                    scoring_type='ComplEx')
>>> model.compile(optimizer='adam', loss='nll')
>>> model.fit(dataset['train'],
>>>           batch_size=10000,
>>>           epochs=5)
>>> print('Raw scores (sorted):', np.sort(model.predict(dataset['test'])))
>>> print('Indices obtained by sorting (scores):', np.argsort(model.predict(dataset['test'])))
Raw scores (sorted): [-1.0689778   -0.42082012  -0.39887887 ...  3.261838  3.2755773  3.2768354 ]
Indices obtained by sorting (scores): [ 3834 18634  4066 ...  6237 13633 10961]
>>> model.calibrate(dataset['test'],
>>>                 batch_size=10000,
>>>                 positive_base_rate=0.9,
>>>                 epochs=100)
>>> print('Calibrated scores (sorted):', np.sort(model.predict_proba(dataset['test'])))
>>> print('Indices obtained by sorting (Calibrated):', np.argsort(model.predict_proba(dataset['test'])))
Calibrated scores (sorted): [0.49547982 0.5396996  0.54118955 ... 0.7624245  0.7631044  0.76316655]
Indices obtained by sorting (Calibrated): [ 3834 18634  4066 ...  6237 13633 10961]
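The output above shows the same sort order before and after calibration. This is what one would expect from Platt scaling, which passes each raw score through a logistic function sigma(w * s + b): with a positive slope w the mapping is strictly increasing. A minimal sketch, using made-up values for w and b (the fitted parameters are not shown in the example) and hypothetical helper names:

```python
import math


def platt_scale(score, w, b):
    """Logistic transform used by Platt scaling: sigma(w * score + b)."""
    return 1.0 / (1.0 + math.exp(-(w * score + b)))


def argsort(values):
    """Indices that would sort `values` in ascending order."""
    return sorted(range(len(values)), key=values.__getitem__)


raw_scores = [3.26, -1.07, -0.42, 3.28, -0.40]  # toy raw scores
w, b = 0.3, 0.2                                 # hypothetical fitted parameters
calibrated = [platt_scale(s, w, b) for s in raw_scores]

print(argsort(raw_scores))  # [1, 2, 4, 0, 3]
print(argsort(calibrated))  # same ordering, values now lie strictly between 0 and 1
```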
call(inputs, training=False)

Computes the scores of the triples and returns the corruption scores as well.

Parameters:

inputs (ndarray, shape (n, 3)) – Batch of input triples.

Returns:

out – List of input scores along with their corruptions.

Return type:

list

compile(optimizer='adam', loss=None, entity_relation_initializer='glorot_uniform', entity_relation_regularizer=None, **kwargs)

Compile the model.

Parameters:
  • optimizer (str (name of optimizer) or optimizer instance) –

    The optimizer used to minimize the loss function. For pre-defined options, choose between “sgd”, “adagrad”, “adam”, “rmsprop”, etc. See tf.keras.optimizers for up-to-date details.

    If a string is passed, then the default parameters of the optimizer will be used.

    If you want to use custom hyperparameters, you need to create an instance of the optimizer and pass the instance to the compile function:

    import tensorflow as tf
    adam_opt = tf.keras.optimizers.Adam(learning_rate=0.003)
    model.compile(loss='pairwise', optimizer=adam_opt)
    

  • loss (str (name of objective function), objective function or ampligraph.latent_features.loss_functions.Loss) –

    If a string is passed, you can use one of the following losses which will be used with their default setting:

    • “pairwise”: the model will use the pairwise margin-based loss function.

    • “nll”: the model will use the negative log-likelihood loss.

    • “absolute_margin”: the model will use the absolute margin loss.

    • “self_adversarial”: the model will use the self-adversarial sampling loss function.

    • “multiclass_nll”: the model will use the multiclass NLL loss.

      model.compile(loss='absolute_margin', optimizer='adam')
      

    If you want to modify the default parameters of the loss function, you need to explicitly create an instance of the loss with the required hyperparameters and then pass this instance.

    from ampligraph.latent_features import AbsoluteMarginLoss
    ab_loss = AbsoluteMarginLoss(loss_params={'margin': 3})
    model.compile(loss=ab_loss, optimizer='adam')
    

    An objective function is any callable with the signature loss = fn(score_true, score_corr, eta)

    # Create a user defined loss function with the above signature
    def userLoss(scores_pos, scores_neg):
        # user defined loss - takes in 2 params and returns loss
        neg_exp = tf.exp(scores_neg)
        pos_exp = tf.exp(scores_pos)
        # Apply softmax to the scores
        score = pos_exp / (tf.reduce_sum(neg_exp, axis=0) + pos_exp)
        loss = -tf.math.log(score)
        return loss
    # Pass this loss while compiling the model
    model.compile(loss=userLoss, optimizer='adam')
    

  • entity_relation_initializer (str (name of initializer function), initializer function or tf.keras.initializers.Initializer or list.) –

    Initializer of the entity and relation embeddings. This is either a single value or a list of size 2. If a single value is passed, then both the entities and relations will be initialized based on the same initializer; if a list, the first initializer will be used for entities and the second for relations.

    If a string is passed, then the default parameters will be used. Choose between “random_normal”, “random_uniform”, “glorot_normal”, “he_normal”, etc.

    See tf.keras.initializers for up-to-date details.

    model.compile(loss='pairwise', optimizer='adam',
                  entity_relation_initializer='random_normal')
    

    If the user wants to use custom hyperparameters, then an instance of tf.keras.initializers.Initializer needs to be passed.

    import tensorflow as tf
    init = tf.keras.initializers.RandomNormal(stddev=0.00003)
    model.compile(loss='pairwise', optimizer='adam',
                  entity_relation_initializer=init)
    

    If the user wants to define a custom initializer, it can be any callable with the signature init = fn(shape).

    def my_init(shape):
        return tf.random.normal(shape)
    model.compile(loss='pairwise', optimizer='adam',
                  entity_relation_initializer=my_init)
    

  • entity_relation_regularizer (str (name of regularizer function) or regularizer function or tf.keras.regularizers.Regularizer instance or list) –

    Regularizer of entities and relations. If a single value is passed, then both the entities and relations will be regularized based on the same regularizer; if a list, the first regularizer will be used for entities and second for relations.

    If a string is passed, then the default parameters of the regularizers will be used. Choose between “l1”, “l2”, “l1_l2”, etc.

    See tf.keras.regularizers for up-to-date details.

    model.compile(loss='pairwise', optimizer='adam',
                  entity_relation_regularizer='l2')
    

    If the user wants to use custom hyperparameters, then an instance of tf.keras.regularizers.Regularizer needs to be passed.

    import tensorflow as tf
    reg = tf.keras.regularizers.L1L2(l1=0.001, l2=0.1)
    model.compile(loss='pairwise', optimizer='adam',
                  entity_relation_regularizer=reg)
    

    If the user wants to define a custom regularizer, it can be any callable with the signature reg = fn(weight_matrix).

    def my_reg(weight_mx):
        return 0.01 * tf.math.reduce_sum(tf.math.abs(weight_mx))
    model.compile(loss='pairwise', optimizer='adam',
                  entity_relation_regularizer=my_reg)
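To see what the user-defined loss shown for the loss parameter computes, the following pure-Python sketch evaluates the same softmax-style expression for one positive score and its corruption scores. The helper name is hypothetical, and working with plain floats instead of tensors is a simplification made for illustration.

```python
import math


def softmax_nll(score_pos, scores_neg):
    """-log( exp(pos) / (sum(exp(neg)) + exp(pos)) ) for a single positive triple."""
    pos_exp = math.exp(score_pos)
    neg_exp_sum = sum(math.exp(s) for s in scores_neg)
    return -math.log(pos_exp / (neg_exp_sum + pos_exp))


# Positive tied with three corruptions: probability 1/4, so the loss is log(4).
print(softmax_nll(0.0, [0.0, 0.0, 0.0]))
# Positive scored well above its corruptions: the loss approaches 0.
print(softmax_nll(5.0, [0.0, 0.0, 0.0]))
```

The loss shrinks as the positive triple is scored higher relative to its corruptions, which is the behaviour a ranking-based objective needs.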
    

Example

>>> from ampligraph.datasets import load_fb15k_237
>>> from ampligraph.latent_features import ScoringBasedEmbeddingModel
>>> X = load_fb15k_237()
>>> model = ScoringBasedEmbeddingModel(eta=5,
>>>                                    k=300,
>>>                                    scoring_type='ComplEx',
>>>                                    seed=0)
>>> model.compile(optimizer='adam', loss='nll')
>>> model.fit(X['train'],
>>>           batch_size=10000,
>>>           epochs=5)
Epoch 1/5
29/29 [==============================] - 2s 61ms/step - loss: 67361.3047
Epoch 2/5
29/29 [==============================] - 1s 35ms/step - loss: 67318.6094
Epoch 3/5
29/29 [==============================] - 1s 34ms/step - loss: 67020.0703
Epoch 4/5
29/29 [==============================] - 1s 34ms/step - loss: 65867.3750
Epoch 5/5
29/29 [==============================] - 1s 34ms/step - loss: 63517.9062
compute_focusE_weights(weights, structure_weight)

Compute positive and negative weights to scale scores if use_focusE=True.

Parameters:
  • weights (array-like, shape (n, m)) – Batch of weights associated with the triples.

  • structure_weight (float) – Structural influence assigned to the weights.

Returns:

out – Tuple where the first element is a tensor containing the positive weights and the second is a tensor containing the negative weights.

Return type:

tuple of two tf.Tensors, (tf.Tensor(shape=(n, 1)), tf.Tensor(shape=(n * self.eta, 1)))

compute_output_shape(inputShape)

Returns the output shape of the outputs of the call function.

Parameters:

inputShape (tuple) – Shape of inputs of call function.

Returns:

output_shape – List with the shape of outputs of call function for the input triples and the corruption scores.

Return type:

list of tuples

evaluate(x=None, batch_size=32, verbose=True, use_filter=False, corrupt_side='s,o', entities_subset=None, ranking_strategy='worst', callbacks=None, dataset_type='test')

Evaluate the inputs against corruptions and return ranks.

Parameters:
  • x (np.array, shape (n,3) or str or GraphDataLoader or AbstractGraphPartitioner) – Data OR Filename of the data file OR Data Handle to be used for evaluation.

  • batch_size (int) – Batch size to use during evaluation. May be overridden if x is a GraphDataLoader or AbstractGraphPartitioner instance.

  • verbose (bool) – Verbosity mode.

  • use_filter (bool or dict) – Whether to use a filter or not. If a dictionary is specified, the data in the dict is concatenated and used as filter.

  • corrupt_side (str) – Which side of a triple to corrupt. It can be the subject (corrupt_side="s"), the object (corrupt_side="o"), or both the subject and the object (corrupt_side="s+o" or corrupt_side="s,o") (default: “s,o”).

  • ranking_strategy (str) – Indicates how to break ties when a test triple gets the same score as a corruption. Can be one of three types: “best”, “middle”, “worst” (default: “worst”, i.e., the worst rank is assigned to the test triple).

  • entities_subset (list or np.array) – Subset of entities to be used for generating corruptions.

  • callbacks (list of keras.callbacks.Callback instances) – List of callbacks to apply during evaluation.

Returns:

rank – Ranking of test triples against subject corruptions and/or object corruptions.

Return type:

np.array, shape (n, number of corrupted sides)
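The effect of ranking_strategy is easiest to see on a toy case with tied scores. The sketch below is a hypothetical pure-Python helper, not the library's implementation. The “best” and “worst” cases follow the description above; the definition of “middle” as the midpoint between the two (rounded up) is an assumption.

```python
def rank_triple(score, corruption_scores, strategy="worst"):
    """Rank of a test triple among its corruptions (1 is the best rank)."""
    greater = sum(1 for c in corruption_scores if c > score)
    ties = sum(1 for c in corruption_scores if c == score)
    best = greater + 1          # ties resolved in favour of the test triple
    worst = greater + ties + 1  # ties resolved against the test triple
    if strategy == "best":
        return best
    if strategy == "worst":
        return worst
    if strategy == "middle":
        # Assumption: midpoint between best and worst, rounded up.
        return (best + worst + 1) // 2
    raise ValueError(f"unknown ranking strategy: {strategy!r}")


corruptions = [0.9, 0.5, 0.5, 0.1]  # two corruptions tie with the test triple
for strategy in ("best", "middle", "worst"):
    print(strategy, rank_triple(0.5, corruptions, strategy))
# best 2, middle 3, worst 4
```

Without ties all three strategies give the same rank, so the choice only matters when the model assigns identical scores to several triples.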

Example

>>> from ampligraph.datasets import load_fb15k_237
>>> from ampligraph.latent_features import ScoringBasedEmbeddingModel
>>> from ampligraph.evaluation.metrics import mrr_score, hits_at_n_score, mr_score
>>> X = load_fb15k_237()
>>> model = ScoringBasedEmbeddingModel(eta=5,
>>>                                    k=300,
>>>                                    scoring_type='ComplEx',
>>>                                    seed=0)
>>> model.compile(optimizer='adam', loss='nll')
>>> model.fit(X['train'],
>>>           batch_size=10000,
>>>           epochs=5)
Epoch 1/5
29/29 [==============================] - 2s 71ms/step - loss: 67361.3047
Epoch 2/5
29/29 [==============================] - 1s 35ms/step - loss: 67318.6094
Epoch 3/5
29/29 [==============================] - 1s 35ms/step - loss: 67020.0703
Epoch 4/5
29/29 [==============================] - 1s 33ms/step - loss: 65867.3750
Epoch 5/5
29/29 [==============================] - 1s 34ms/step - loss: 63517.9062
>>> ranks = model.evaluate(X['test'],
>>>                        batch_size=100,
>>>                        corrupt_side='s,o',
>>>                        use_filter={'train': X['train'],
>>>                                    'valid': X['valid'],
>>>                                    'test': X['test']})
>>> mr_score(ranks), mrr_score(ranks), hits_at_n_score(ranks, 1), hits_at_n_score(ranks, 10), len(ranks)
28 triples containing invalid keys skipped!
9 triples containing invalid keys skipped!
2045/2045 [==============================] - 149s 73ms/step
(428.44671689989235,
 0.25761041025282316,
 0.1898179861043155,
 0.391965945787259,
 20438)
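For readers unfamiliar with the metrics used above, the following sketch shows how mean rank, mean reciprocal rank and hits@n are conventionally computed from an array of ranks. The function names are hypothetical stand-ins, and the ranks are toy values. It assumes that ranks from both corrupted sides are pooled before averaging.

```python
def flatten(ranks):
    """Pool the per-side ranks of every test triple into one list."""
    return [r for row in ranks for r in row]


def mean_rank(ranks):
    flat = flatten(ranks)
    return sum(flat) / len(flat)


def mean_reciprocal_rank(ranks):
    flat = flatten(ranks)
    return sum(1.0 / r for r in flat) / len(flat)


def hits_at(ranks, n):
    """Fraction of ranks that are n or better."""
    flat = flatten(ranks)
    return sum(1 for r in flat if r <= n) / len(flat)


# Toy ranks for three test triples: one column per corrupted side (subject, object).
toy_ranks = [[1, 2], [4, 1], [10, 3]]
print(mean_rank(toy_ranks))             # 3.5
print(mean_reciprocal_rank(toy_ranks))  # about 0.53
print(hits_at(toy_ranks, 10))           # 1.0
```

Lower is better for mean rank; higher is better for mean reciprocal rank and hits@n.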
fit(x=None, batch_size=1, epochs=1, verbose=True, callbacks=None, validation_split=0.0, validation_data=None, shuffle=True, initial_epoch=0, validation_batch_size=100, validation_corrupt_side='s,o', validation_freq=50, validation_burn_in=100, validation_filter=False, validation_entities_subset=None, partitioning_k=1, focusE=False, focusE_params={})

Fit the model on the provided data.

Parameters:
  • x (np.array, shape (n, 3), or str or GraphDataLoader or AbstractGraphPartitioner) – Data OR Filename of the data file OR Data Handle to be used for training.

  • batch_size (int) – Batch size to use during training. May be overridden if x is a GraphDataLoader or AbstractGraphPartitioner instance.

  • epochs (int) – Number of epochs to train (default: 1).

  • verbose (bool) – Verbosity (default: True).

  • callbacks (list of tf.keras.callbacks.Callback) – List of callbacks to be used during training (default: None).

  • validation_split (float) – Validation split to carve out of x (default: 0.0) (currently supported only when x is a np.array).

  • validation_data (np.array, shape (n, 3) or str or GraphDataLoader or AbstractGraphPartitioner) – Data OR Filename of the data file OR Data Handle to be used for validation.

  • shuffle (bool) – Indicates whether to shuffle the data after every epoch during training (default: True).

  • initial_epoch (int) – Initial epoch number (default: 0).

  • validation_batch_size (int) – Batch size to use during validation (default: 100). May be overridden if validation_data is GraphDataLoader or AbstractGraphPartitioner instance.

  • validation_freq (int) – Indicates how often to validate (default: 50).

  • validation_burn_in (int) – The burn-in time after which the validation kicks in.

  • validation_filter (bool or dict) – Validation filter to be used.

  • validation_entities_subset (list or np.array) –

    Subset of entities to be used for generating corruptions.

    Note

    One can perform early stopping using the tensorflow callback tf.keras.callbacks.EarlyStopping as shown in the accompanying example below.

  • focusE (bool) –

    Specify whether to include the FocusE layer (default: False). The FocusE layer [PC21] makes it possible to inject numeric edge attributes into the scoring layer of a traditional knowledge graph embedding architecture. Semantically, the numeric value can signify the importance, uncertainty, significance, or confidence of a triple.

    Note

    In order to activate focusE, the training data must have shape (n, 4), where the first three columns store subject, predicate and object of triples, and the 4-th column stores the numerical edge value associated with each triple.

  • focusE_params (dict) –

    If FocusE layer is included, specify its hyper-parameters. The following hyper-params can be passed:

    • “non_linearity”: can be one of the following values: “linear”, “softplus”, “sigmoid”, “tanh”.

    • “stop_epoch”: specifies how long to decay (linearly) the structural influence hyperparameter from 1 until it reaches its original value.

    • “structural_wt”: structural influence hyperparameter \(\in [0, 1]\) that modulates the influence of graph topology.

    If focusE==True and focusE_params==dict(), then the default values are passed: non_linearity="linear", stop_epoch=251 and structural_wt=0.001.

  • partitioning_k (int) –

    Number of partitions to use while training (default: 1, i.e., the data is not partitioned). May be overridden if x is an AbstractGraphPartitioner instance.

    Note

    This option is useful when the dataset is too large to fit in memory. Setting it to a number strictly larger than 1 will automatically partition the data using BucketGraphPartitioner. See the tutorials for usage in advanced mode.
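As a rough illustration of the “stop_epoch” and “structural_wt” hyperparameters described for focusE_params, the sketch below linearly decays a structural weight from 1 down to its target value over stop_epoch epochs. The function name and the exact interpolation formula are assumptions made for illustration; the library may compute the schedule differently.

```python
def structural_weight(epoch, stop_epoch=251, structural_wt=0.001):
    """Linearly decay from 1.0 (epoch 0) to structural_wt (epoch >= stop_epoch)."""
    if stop_epoch <= 0 or epoch >= stop_epoch:
        return structural_wt
    fraction = epoch / stop_epoch
    # Assumed schedule: straight-line interpolation between 1.0 and structural_wt.
    return 1.0 - fraction * (1.0 - structural_wt)


for epoch in (0, 100, 251, 500):
    print(epoch, structural_weight(epoch))
```

Under this reading, the numeric edge values have little influence early in training and reach their full influence once stop_epoch is passed.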

Returns:

history – Its History.history attribute is a record of training loss values, as well as validation loss and validation metrics values.

Return type:

History object

Example

>>> from ampligraph.datasets import load_fb15k_237
>>> from ampligraph.latent_features import ScoringBasedEmbeddingModel
>>> X = load_fb15k_237()
>>> model = ScoringBasedEmbeddingModel(eta=5,
>>>                                    k=300,
>>>                                    scoring_type='ComplEx',
>>>                                    seed=0)
>>> model.compile(optimizer='adam', loss='nll')
>>> model.fit(X['train'],
>>>           batch_size=10000,
>>>           epochs=5)
Epoch 1/5
29/29 [==============================] - 2s 71ms/step - loss: 67361.3047
Epoch 2/5
29/29 [==============================] - 1s 35ms/step - loss: 67318.6094
Epoch 3/5
29/29 [==============================] - 1s 37ms/step - loss: 67020.0703
Epoch 4/5
29/29 [==============================] - 1s 35ms/step - loss: 65867.3750
Epoch 5/5
29/29 [==============================] - 1s 35ms/step - loss: 63517.9062
>>> # Early stopping example
>>> from ampligraph.latent_features import ScoringBasedEmbeddingModel
>>> from ampligraph.datasets import load_fb15k_237
>>> dataset = load_fb15k_237()
>>> model = ScoringBasedEmbeddingModel(eta=1,
>>>                                    k=10,
>>>                                    scoring_type='TransE')
>>> model.compile(optimizer='adam', loss='multiclass_nll')
>>> import tensorflow as tf
>>> early_stop = tf.keras.callbacks.EarlyStopping(monitor="val_mrr",            # which metrics to monitor
>>>                                               patience=3,                   # if the monitored metric doesn't improve for this many checks, training stops early
>>>                                               verbose=1,                    # verbosity
>>>                                               mode="max",                   # how to compare the monitored metrics; "max" means higher is better
>>>                                               restore_best_weights=True)    # restore the weights with best value
>>> # the early stopping instance needs to be passed as callback to fit function
>>> model.fit(dataset['train'],
>>>           batch_size=10000,
>>>           epochs=5,
>>>           validation_freq=1,                       # validation frequency
>>>           validation_batch_size=100,               # validation batch size
>>>           validation_burn_in=3,                    # burn in time
>>>           validation_corrupt_side='s,o',           # which side to corrupt
>>>           validation_data=dataset['valid'][::100], # Validation data
>>>           callbacks=[early_stop])                  # Pass the early stopping object as a callback
Epoch 1/5
29/29 [==============================] - 2s 82ms/step - loss: 6698.2188
Epoch 2/5
29/29 [==============================] - 1s 34ms/step - loss: 6648.8862
Epoch 3/5
3/3 [==============================] - 1s 446ms/step
29/29 [==============================] - 2s 84ms/step - loss: 6590.2842 - val_mrr: 0.0811 - val_mr: 1776.4545 - val_hits@1: 0.0000e+00 - val_hits@10: 0.2301 - val_hits@100: 0.4148
Epoch 4/5
3/3 [==============================] - 0s 102ms/step
29/29 [==============================] - 1s 47ms/step - loss: 6517.4517 - val_mrr: 0.0918 - val_mr: 1316.6335 - val_hits@1: 0.0000e+00 - val_hits@10: 0.2528 - val_hits@100: 0.4716
Epoch 5/5
3/3 [==============================] - 1s 177ms/step
29/29 [==============================] - 2s 62ms/step - loss: 6431.8696 - val_mrr: 0.0901 - val_mr: 1074.8920 - val_hits@1: 0.0000e+00 - val_hits@10: 0.2386 - val_hits@100: 0.4773
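The patience logic used by the early stopping callback above can be sketched in a few lines. This is a simplified, hypothetical re-implementation for a metric where higher is better (mode="max"), ignoring options such as a minimum improvement threshold. It is not the Keras code itself.

```python
def first_stop_check(metric_values, patience):
    """Index of the validation check at which training would stop, or None."""
    best = float("-inf")
    checks_without_improvement = 0
    for i, value in enumerate(metric_values):
        if value > best:  # mode="max": a higher value counts as an improvement
            best = value
            checks_without_improvement = 0
        else:
            checks_without_improvement += 1
            if checks_without_improvement >= patience:
                return i
    return None


val_mrr = [0.0811, 0.0918, 0.0901]            # val_mrr values from the log above
print(first_stop_check(val_mrr, patience=3))  # None: only one check without improvement
print(first_stop_check(val_mrr, patience=1))  # 2: would stop at the third check
```

With patience=3 and only three validation checks in five epochs, the run above finishes before early stopping can trigger.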
classmethod from_config(config)

Creates a layer from its config.

This method is the reverse of get_config, capable of instantiating the same layer from the config dictionary. It does not handle layer connectivity (handled by Network), nor weights (handled by set_weights).

Parameters:

config – A Python dictionary, typically the output of get_config.

Returns:

A layer instance.

get_config()

Get the configuration hyper-parameters of the scoring based embedding model.

get_count(concept_type='e')

Returns the count of entities or relations that were present during training.

Parameters:

concept_type (str) – Indicates whether to count entities (concept_type='e') or relations (concept_type='r') (default: ‘e’).

Returns:

count – Count of the entities or relations.

Return type:

int

Example

>>> from ampligraph.datasets import load_fb15k_237
>>> from ampligraph.latent_features import ScoringBasedEmbeddingModel
>>> X = load_fb15k_237()
>>> model = ScoringBasedEmbeddingModel(eta=5,
>>>                                    k=300,
>>>                                    scoring_type='ComplEx',
>>>                                    seed=0)
>>> model.compile(optimizer='adam', loss='nll')
>>> model.fit(X['train'],
>>>           batch_size=10000,
>>>           epochs=5,
>>>           verbose=False)
>>> print('Entities:', model.get_count('e'))
>>> print('Relations:', model.get_count('r'))
Entities: 14505
Relations: 237
get_emb_matrix_test(part_number=1, number_of_parts=1)

Get the embedding matrix during evaluation.

Parameters:
  • part_number (int) – Specifies which part to return from the number_of_parts in which the entire embedding matrix is split.

  • number_of_parts (int) – Total number of parts in which to split the embedding matrix.

Returns:

  • emb_matrix (np.array, shape (n,k)) – Part of the embedding matrix corresponding to part_number.

  • start_index (int) – Original entity index (data dict) of the first row of the emb_matrix.

  • end_index (int) – Original entity index (data dict) of the last row of the emb_matrix.
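The relationship between part_number, number_of_parts and the returned index range can be illustrated with a small sketch. It assumes 1-based part numbers (as the default part_number=1 with number_of_parts=1 suggests), contiguous chunks of roughly equal size, and an inclusive end index. The helper name and these details are assumptions, not the library's implementation.

```python
import math


def part_bounds(n_entities, part_number=1, number_of_parts=1):
    """Return (start_index, end_index) of the rows covered by one part."""
    if not 1 <= part_number <= number_of_parts:
        raise ValueError("part_number must be between 1 and number_of_parts")
    chunk = math.ceil(n_entities / number_of_parts)  # rows per part (last may be shorter)
    start = (part_number - 1) * chunk
    end = min(start + chunk, n_entities) - 1         # inclusive index of the last row
    return start, end


for part in (1, 2, 3):
    print(part, part_bounds(10, part_number=part, number_of_parts=3))
# 1 (0, 3)
# 2 (4, 7)
# 3 (8, 9)
```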

get_embeddings(entities, embedding_type='e')

Get the embeddings of entities or relations.

Note

Use ampligraph.utils.create_tensorboard_visualizations() to visualize the embeddings with TensorBoard.

Parameters:
  • entities (array-like, shape=(n)) – The entities (or relations) of interest. Elements of the vector must be the original string literals, not internal IDs.

  • embedding_type (str) – If ‘e’ is passed, entities argument will be considered as a list of knowledge graph entities (i.e., nodes). If set to ‘r’, entities will be treated as relations instead.

Returns:

embeddings – An array of k-dimensional embeddings.

Return type:

ndarray, shape (n, k)

Example

>>> from ampligraph.latent_features import ScoringBasedEmbeddingModel
>>> from ampligraph.datasets import load_fb15k_237
>>> X = load_fb15k_237()
>>> model = ScoringBasedEmbeddingModel(eta=5,
>>>                                    k=300,
>>>                                    scoring_type='ComplEx',
>>>                                    seed=0)
>>> model.compile(optimizer='adam', loss='nll')
>>> model.fit(X['train'],
>>>           batch_size=10000,
>>>           epochs=5,
>>>           verbose=False)
>>> model.get_embeddings(['/m/027rn', '/m/06v8s0'], 'e')
array([[ 0.04482496  0.11973907  0.01117733 ... -0.13391922  0.11103553  -0.08132861]
 [-0.10158381  0.08108605 -0.07608676 ...  0.0591407  0.02791426  0.07559016]], dtype=float32)
get_focusE_params(dict_params={})

Get parameters for focusE.

Parameters:

dict_params (dict) –

The following hyper-params can be passed:

  • “non_linearity”: can take one of the following values: “linear”, “softplus”, “sigmoid”, “tanh”.

  • “stop_epoch”: specifies for how many epochs the structural influence hyper-parameter is decayed (linearly) from 1 until it reaches its original value.

  • “structural_wt”: structural influence hyper-parameter in [0, 1] that modulates the influence of graph topology.

If a key is missing, its default is used: non_linearity="linear", stop_epoch=251 and structural_wt=0.001.

Returns:

focusE_params – A tuple containing three values: the non-linearity function (str), the stop_epoch (int) and the structure weight (float).

Return type:

tuple
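
The fallback to defaults described above can be pictured with a small stand-alone sketch. The helper name resolve_focuse_params is hypothetical and not part of AmpliGraph; it only mirrors the documented defaults and the documented order of the returned tuple. Rejecting unknown non-linearities with a ValueError is an assumption of this sketch.

```python
# Hypothetical helper, not AmpliGraph code: mirrors the documented defaults.
FOCUSE_DEFAULTS = {
    "non_linearity": "linear",
    "stop_epoch": 251,
    "structural_wt": 0.001,
}
ALLOWED_NON_LINEARITIES = ("linear", "softplus", "sigmoid", "tanh")


def resolve_focuse_params(dict_params=None):
    """Return (non_linearity, stop_epoch, structural_wt), filling in defaults."""
    params = dict(FOCUSE_DEFAULTS)
    params.update(dict_params or {})
    # Assumption of this sketch: unknown non-linearities are rejected.
    if params["non_linearity"] not in ALLOWED_NON_LINEARITIES:
        raise ValueError(f"Unsupported non_linearity: {params['non_linearity']!r}")
    return (
        params["non_linearity"],
        int(params["stop_epoch"]),
        float(params["structural_wt"]),
    )


# Only one key is supplied; the other two fall back to their defaults.
print(resolve_focuse_params({"non_linearity": "tanh"}))
```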

get_indexes(X, type_of='t', order='raw2ind')

Converts given data to indexes or to raw data (according to order).

It works for X containing triples, entities, or relations.

Parameters:
  • X (np.array or list) – Data to be indexed.

  • type_of (str) – Specifies whether to get indexes/raw data for triples (type_of='t'), entities (type_of='e'), or relations (type_of='r').

  • order (str) – Specifies whether to get indexes from raw data (order='raw2ind') or raw data from indexes (order='ind2raw').

Returns:

Y – Indexed data or raw data.

Return type:

np.array

Example

>>> from ampligraph.latent_features import ScoringBasedEmbeddingModel
>>> from ampligraph.datasets import load_fb15k_237
>>> X = load_fb15k_237()
>>> model = ScoringBasedEmbeddingModel(eta=5,
>>>                                    k=300,
>>>                                    scoring_type='ComplEx',
>>>                                    seed=0)
>>> model.compile(optimizer='adam', loss='nll')
>>> model.fit(X['train'],
>>>           batch_size=10000,
>>>           epochs=5,
>>>           verbose=False)
>>> print(model.get_indexes(['/m/027rn', '/m/06v8s0'], 'e', 'raw2ind'))
>>> print(model.get_indexes([3877, 0], 'e', 'ind2raw'))
[0, 3877]
['/m/06v8s0', '/m/027rn']
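
The two conversion directions can also be sketched without a trained model. The snippet below is a toy illustration for entities only: the dictionary reuses the two label/index pairs from the output above, and the convert function is a hypothetical stand-in, not the library implementation.

```python
# Toy mapping built from the two label/index pairs shown in the example output.
raw_to_index = {"/m/027rn": 0, "/m/06v8s0": 3877}
index_to_raw = {index: raw for raw, index in raw_to_index.items()}


def convert(values, order="raw2ind"):
    """Map raw labels to indexes ('raw2ind') or indexes back to labels ('ind2raw')."""
    if order == "raw2ind":
        return [raw_to_index[v] for v in values]
    if order == "ind2raw":
        return [index_to_raw[v] for v in values]
    raise ValueError(f"Unknown order: {order!r}")


labels = ["/m/027rn", "/m/06v8s0"]
indexes = convert(labels, "raw2ind")      # labels -> internal indexes
round_trip = convert(indexes, "ind2raw")  # indexes -> original labels
```
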
get_train_embedding_matrix_size()

Returns the size of the embedding matrix used for training.

This may not be the same as (n, k) during partitioned training (where n is the number of triples in the whole training set).

is_fit()

Check whether the model has been fitted already.

load_metadata(filepath=None, filedir=None)
load_weights(filepath)

Loads the model weights.

Use this function if save_weights was used to save the model.

Note

If you want to continue training, use ampligraph.utils.save_model() and ampligraph.utils.restore_model() instead. These functions save and restore the entire state of the graph, which allows training to continue from where it stopped.

Parameters:

filepath (str) – Path from which to load the model weights.

make_calibrate_function()

Similar to the Keras library, this function returns the handle to the calibrate step function.

It processes one batch of data by iterating over the dataset iterator and computes the calibration of predictions.

Returns:

out – Handle to the calibration function.

Return type:

Function handle

make_predict_function()

Similar to the Keras library, this function returns the handle to the predict step function.

It processes one batch of data by iterating over the dataset iterator and computes the prediction outputs.

Returns:

out – Handle to the predict function.

Return type:

Function handle

make_test_function()

Similar to the Keras library, this function returns the handle to the test step function.

It processes one batch of data by iterating over the dataset iterator and computes the test metrics.

Returns:

out – Handle to the test step function.

Return type:

Function handle

make_train_function()

Similar to the Keras library, this function returns the handle to the training step function.

It processes one batch of data by iterating over the dataset iterator, computes the loss and optimizes on it.

Returns:

out – Handle to the training step function.

Return type:

Function handle

partition_change_updates(num_ents, ent_emb, rel_emb)

Perform the changes that are required when the partition is modified during training.

Parameters:
  • num_ents (int) – Number of unique entities in the partition.

  • ent_emb (array-like) – Entity embeddings that need to be trained for the partition (all triples of the partition will have embeddings in this matrix).

  • rel_emb (array-like) – Relation embeddings that need to be trained for the partition (all triples of the partition will have embeddings in this matrix).

predict(x, batch_size=32, verbose=0, callbacks=None)

Compute scores of the input triples.

Parameters:
  • x (np.array, shape (n, 3) or str or GraphDataLoader or AbstractGraphPartitioner) – Data OR filename of the data file OR data handle to be used for prediction.

  • batch_size (int) – Batch size to use during prediction. May be overridden if x is a GraphDataLoader or AbstractGraphPartitioner instance.

  • verbose (bool) – Verbosity mode.

  • callbacks (list of keras.callbacks.Callback instances) – List of callbacks to apply during prediction.

Returns:

scores – Score of the input triples.

Return type:

np.array, shape (n, )

Example

>>> from ampligraph.latent_features import ScoringBasedEmbeddingModel
>>> import numpy as np
>>> from ampligraph.datasets import load_fb15k_237
>>> X = load_fb15k_237()
>>> model = ScoringBasedEmbeddingModel(eta=5,
>>>                                    k=300,
>>>                                    scoring_type='ComplEx',
>>>                                    seed=0)
>>> model.compile(optimizer='adam', loss='nll')
>>> model.fit(X['train'],
>>>           batch_size=10000,
>>>           epochs=5)
Epoch 1/5
29/29 [==============================] - 7s 228ms/step - loss: 67361.2734
Epoch 2/5
29/29 [==============================] - 5s 184ms/step - loss: 67318.8203
Epoch 3/5
29/29 [==============================] - 5s 187ms/step - loss: 67021.1641
Epoch 4/5
29/29 [==============================] - 5s 188ms/step - loss: 65865.5547
Epoch 5/5
29/29 [==============================] - 5s 188ms/step - loss: 63510.2773
>>> pred = model.predict(X['test'],
>>>                      batch_size=100)
>>> print(np.sort(pred))
[-1.0868168  -0.46582496 -0.44715863 ...  3.2484274   3.3147712  3.326     ]
predict_proba(x, batch_size=32, verbose=0, callbacks=None)

Compute calibrated scores (\(0 ≤ score ≤ 1\)) for the input triples.

Parameters:
  • x (np.array, shape (n, 3) or str or GraphDataLoader or AbstractGraphPartitioner) – Data OR filename of the data file OR data handle to be used for prediction.

  • batch_size (int) – Batch size to use during prediction. May be overridden if x is a GraphDataLoader or AbstractGraphPartitioner instance.

  • verbose (bool) – Verbosity mode (default: False).

  • callbacks (list of keras.callbacks.Callback instances) – List of callbacks to apply during evaluation.

Returns:

scores – Calibrated scores for the input triples.

Return type:

np.array, shape (n, )

Example

>>> from ampligraph.datasets import load_fb15k_237
>>> from ampligraph.latent_features import ScoringBasedEmbeddingModel
>>> import numpy as np
>>> dataset = load_fb15k_237()
>>> model = ScoringBasedEmbeddingModel(eta=5,
>>>                                    k=300,
>>>                                    scoring_type='ComplEx')
>>> model.compile(optimizer='adam', loss='nll')
>>> model.fit(dataset['train'],
>>>           batch_size=10000,
>>>           epochs=5)
>>> print('Raw scores (sorted):', np.sort(model.predict(dataset['test'])))
>>> print('Indices obtained by sorting (scores):', np.argsort(model.predict(dataset['test'])))
Raw scores (sorted): [-1.0384613  -0.46752608 -0.45149875 ...  3.2897844  3.3034315  3.3280635 ]
Indices obtained by sorting (scores): [ 3834 18634  4066 ...  1355 13633 10961]
>>> model.calibrate(dataset['test'],
>>>                 batch_size=10000,
>>>                 positive_base_rate=0.9,
>>>                 epochs=100)
>>> print('Calibrated scores (sorted):', np.sort(model.predict_proba(dataset['test'])))
>>> print('Indices obtained by sorting (Calibrated):', np.argsort(model.predict_proba(dataset['test'])))
Calibrated scores (sorted): [0.5553725  0.5556108  0.5568415  ... 0.6211011  0.62382233 0.6297585 ]
Indices obtained by sorting (Calibrated): [14573 11577  4404 ... 17817 17816   733]
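
To illustrate how a raw score can end up in the interval from 0 to 1, the sketch below applies a logistic mapping to a few raw scores. The logistic form and the scale and offset values are assumptions made for illustration; the parameters actually used are whatever calibrate() fits, and the function name calibrated_probability is hypothetical.

```python
import math


def calibrated_probability(score, scale, offset):
    """Logistic mapping of a raw score to the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-(scale * score + offset)))


raw_scores = [-1.04, 0.0, 3.33]
# Made-up parameters for illustration only; real values come from calibration.
probs = [calibrated_probability(s, scale=0.07, offset=0.27) for s in raw_scores]
print(probs)
```
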
predict_step(inputs)

Returns the output of the predict step on a batch of data.

predict_step_partitioning(inputs)

Returns the output of the predict step on a batch of data.

process_model_inputs_for_test(triples)

Return the processed triples.

Parameters:

triples (np.array) – Triples to be processed.

Returns:

out_triples – In regular (non-partitioned) mode, the triples are returned as given in input. In partitioned mode, the triple embeddings are returned as a list of size 3, where the elements are np.arrays of subject, predicate and object embeddings, respectively.

Return type:

np.array or list

save(filepath, overwrite=True, include_optimizer=True, save_format=None, signatures=None, options=None, save_traces=True)

Save the model.

save_metadata(filepath=None, filedir=None)

Save metadata.

save_weights(filepath, overwrite=True)

Save the trainable weights.

Use this function if the training process is complete and you want to use the model only for inference. Use load_weights() to load the model weights back.

Note

If you want to be able to continue training, use ampligraph.utils.save_model() and ampligraph.utils.restore_model() instead. These functions save and restore the entire state of the graph, which allows training to continue from where it was stopped.

Parameters:
  • filepath (str) – Path to which the model weights are saved.

  • overwrite (bool) – Flag which indicates whether the model, if present, needs to be overwritten or not (default: True).

train_step(data)

Training step.

Parameters:

data (array-like, shape (n, m)) – Batch of input triples (true positives) with weights associated if m>3.

Returns:

out – Dictionary of metrics computed on the outputs (e.g., loss).

Return type:

dict
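
The (n, m) input layout can be pictured as three triple columns followed by optional weight columns. The sketch below separates the two parts for a small batch. The helper split_batch is hypothetical and uses plain Python lists; it is not how the library handles tensors internally.

```python
def split_batch(batch):
    """Split rows into (subject, predicate, object) triples and extra weight columns."""
    triples = [tuple(row[:3]) for row in batch]
    has_weights = bool(batch) and len(batch[0]) > 3  # m > 3 means weights are present
    weights = [list(row[3:]) for row in batch] if has_weights else None
    return triples, weights


# m = 4: each triple carries one weight.
weighted_batch = [("a", "likes", "b", 0.9), ("b", "likes", "c", 0.4)]
triples, weights = split_batch(weighted_batch)

# m = 3: triples only, no weights.
plain_triples, no_weights = split_batch([("a", "likes", "b")])
```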

update_focusE_params()

Update the structural weight after decay.
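
The decay can be sketched from the description of stop_epoch under get_focusE_params: the structural weight starts at 1 and decreases linearly until it reaches its configured value. The function name and the exact interpolation below are assumptions for illustration; the library may step or clamp the value differently.

```python
def decayed_structural_wt(epoch, stop_epoch, structural_wt):
    """Linearly interpolate from 1.0 at epoch 0 down to structural_wt at stop_epoch."""
    if stop_epoch <= 0 or epoch >= stop_epoch:
        return structural_wt  # decay finished: keep the configured value
    fraction = epoch / stop_epoch
    return 1.0 - fraction * (1.0 - structural_wt)


# Using the documented defaults stop_epoch=251 and structural_wt=0.001.
for epoch in (0, 100, 251, 300):
    print(epoch, decayed_structural_wt(epoch, stop_epoch=251, structural_wt=0.001))
```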