ScoringBasedEmbeddingModel
- class ampligraph.latent_features.ScoringBasedEmbeddingModel(*args, **kwargs)
Class for handling KGE models that follow the ranking-based protocol.
Example
>>> # create a model and compile it with the 'adam' optimizer and
>>> # user-defined settings of an existing loss
>>> from ampligraph.datasets import load_fb15k_237
>>> from ampligraph.latent_features import ScoringBasedEmbeddingModel
>>> from ampligraph.latent_features.loss_functions import SelfAdversarialLoss
>>> import tensorflow as tf
>>> X = load_fb15k_237()
>>> loss = SelfAdversarialLoss({'margin': 0.1, 'alpha': 5, 'reduction': 'sum'})
>>> model = ScoringBasedEmbeddingModel(eta=5,
...                                    k=300,
...                                    scoring_type='ComplEx',
...                                    seed=0)
>>> model.compile(optimizer='adam', loss=loss)
>>> model.fit(X['train'],
...           batch_size=10000,
...           epochs=5)
Epoch 1/5
29/29 [==============================] - 3s 87ms/step - loss: 13496.5752
Epoch 2/5
29/29 [==============================] - 1s 36ms/step - loss: 13488.8682
Epoch 3/5
29/29 [==============================] - 1s 35ms/step - loss: 13436.2725
Epoch 4/5
29/29 [==============================] - 1s 35ms/step - loss: 13259.0840
Epoch 5/5
29/29 [==============================] - 1s 34ms/step - loss: 12977.0117
Attributes
metrics: Returns all the metrics that will be computed during training.
Methods
__init__(eta, k[, scoring_type, seed, ...]): Initializes the scoring based embedding model using the user specified scoring function.
build(input_shape): Override the build function of the Model class.
build_full_model([batch_size]): This method is called while loading the weights to build the model.
calibrate(X_pos[, X_neg, ...]): Calibrate predictions.
call(inputs[, training]): Computes the scores of the triples and returns the corruption scores as well.
compile([optimizer, loss, ...]): Compile the model.
compute_focusE_weights(weights, structure_weight): Compute positive and negative weights to scale scores if use_focusE=True.
compute_output_shape(inputShape): Returns the output shape of the outputs of the call function.
evaluate([x, batch_size, verbose, ...]): Evaluate the inputs against corruptions and return ranks.
fit([x, batch_size, epochs, verbose, ...]): Fit the model on the provided data.
from_config(config): Creates a layer from its config.
get_config(): Get the configuration hyper-parameters of the scoring based embedding model.
get_count([concept_type]): Returns the count of entities and relations that were present during training.
get_emb_matrix_test([part_number, ...]): Get the embedding matrix during evaluation.
get_embeddings(entities[, embedding_type]): Get the embeddings of entities or relations.
get_focusE_params([dict_params]): Get parameters for focusE.
get_indexes(X[, type_of, order]): Converts given data to indexes or to raw data (according to order).
get_train_embedding_matrix_size(): Returns the size of the embedding matrix used for training.
is_fit(): Check whether the model has been fitted already.
load_metadata([filepath, filedir])
load_weights(filepath): Loads the model weights.
make_calibrate_function(): Similar to keras lib, this function returns the handle to the calibrate step function.
make_predict_function(): Similar to keras lib, this function returns the handle to the predict step function.
make_test_function(): Similar to keras lib, this function returns the handle to the test step function.
make_train_function(): Similar to keras lib, this function returns the handle to the training step function.
partition_change_updates(num_ents, ent_emb, ...): Perform the changes that are required when the partition is modified during training.
predict(x[, batch_size, verbose, callbacks]): Compute scores of the input triples.
predict_proba(x[, batch_size, verbose, ...]): Compute calibrated scores (\(0 \le score \le 1\)) for the input triples.
predict_step(inputs): Returns the output of predict step on a batch of data.
predict_step_partitioning(inputs): Returns the output of predict step on a batch of data.
process_model_inputs_for_test(triples): Return the processed triples.
save(filepath[, overwrite, ...]): Save the model.
save_metadata([filepath, filedir]): Save metadata.
save_weights(filepath[, overwrite]): Save the trainable weights.
train_step(data): Training step.
Update the structural weight after decay.
- __init__(eta, k, scoring_type='DistMult', seed=0, max_ent_size=None, max_rel_size=None)
Initializes the scoring based embedding model using the user specified scoring function.
- Parameters:
eta (int) – Number of negatives (corruptions) to generate per positive triple during training.
k (int) – Embedding size.
scoring_type (str) – Name of the scoring layer to use.
TransE: the Translating embedding scoring function will be used.
DistMult: the DistMult embedding scoring function will be used.
ComplEx: the ComplEx embedding scoring function will be used.
HolE: the Holographic embedding scoring function will be used.
seed (int) – Random seed.
max_ent_size (int) – Maximum number of entities that can occur in any partition (default: None).
max_rel_size (int) – Maximum number of relations that can occur in any partition (default: None).
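The scoring_type options name well-known scoring functions from the knowledge graph embedding literature. As an illustration only, the pure-Python sketch below writes out the textbook score of a single triple (s, p, o) for each of them, using plain lists as embeddings and complex numbers for ComplEx. It is not AmpliGraph's implementation, which works on batched TensorFlow tensors and may differ in details such as the norm used by TransE or the scaling applied to HolE. The function names are hypothetical. In every function a higher score means a more plausible triple.

```python
import math


def transe(s, p, o, norm=1):
    """TransE: negative L1 (or L2) distance between s + p and o."""
    diff = [si + pi - oi for si, pi, oi in zip(s, p, o)]
    if norm == 1:
        return -sum(abs(d) for d in diff)
    return -math.sqrt(sum(d * d for d in diff))


def distmult(s, p, o):
    """DistMult: trilinear product sum_i s_i * p_i * o_i."""
    return sum(si * pi * oi for si, pi, oi in zip(s, p, o))


def complex_score(s, p, o):
    """ComplEx: real part of sum_i s_i * p_i * conj(o_i) (complex embeddings)."""
    return sum(si * pi * oi.conjugate() for si, pi, oi in zip(s, p, o)).real


def hole(s, p, o):
    """HolE: dot product of p with the circular correlation of s and o."""
    d = len(s)
    # [s * o]_k = sum_i s_i * o_{(i + k) mod d}
    corr = [sum(s[i] * o[(i + k) % d] for i in range(d)) for k in range(d)]
    return sum(pk * ck for pk, ck in zip(p, corr))
```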
- build(input_shape)
Override the build function of the Model class.
It is called on the first call to __call__. It sets some internal parameters of the encoding layers (needed to build the layers themselves) based on the input data supplied by the user when calling the fit() method.
- build_full_model(batch_size=100)
This method is called while loading the weights to build the model.
- calibrate(X_pos, X_neg=None, positive_base_rate=None, batch_size=32, epochs=50, verbose=0)
Calibrate predictions.
The method implements the heuristics described in [TC20], using Platt scaling [P+99].
The calibrated predictions can be obtained with predict_proba() after calibration is done.
Ideally, calibration should be performed on a validation set that was not used to train the embeddings.
There are two modes of operation, depending on the availability of negative triples:
(1) Both positive and negative triples are provided via X_pos and X_neg respectively. The optimization is done using a second-order method (limited-memory BFGS), therefore no hyperparameter needs to be specified.
(2) Only positive triples are provided, and the negative triples are generated by corruptions, just as in training or evaluation. The optimization is done using a first-order method (ADAM), therefore batch_size and epochs must be specified.
Calibration is highly dependent on the base rate of positive triples. Therefore, for mode (2) of operation, the user is required to provide the positive_base_rate argument. For mode (1), the base rate can be inferred automatically from the relative sizes of the positive and negative sets, but the user can override this behaviour by providing a value to positive_base_rate.
Defining the positive base rate is the biggest challenge when calibrating without negatives. It depends on the user's choice of triples to be evaluated at test time. Take the WN11 dataset as an example: it has around 50% positive triples in both the validation set and the test set, so the positive base rate is 50%. However, should the user resample it to have 75% positives and 25% negatives, the previous calibration would be degraded, and the user must recalibrate the model with a 75% positive base rate. Therefore, this parameter depends on how the user handles the dataset and cannot be determined automatically or a priori.
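To make the role of the base rate concrete, here is a minimal pure-Python sketch of Platt scaling, i.e., fitting \(P(\text{true} \mid s) = \sigma(a s + b)\) to model scores s. The reweighting scheme (positives share a total weight equal to the base rate, negatives share the rest) and the plain gradient-descent fit are assumptions chosen for illustration. They are not necessarily the exact heuristic of [TC20] or the optimizers used by calibrate(), and fit_platt is a hypothetical helper.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_platt(pos_scores, neg_scores, positive_base_rate=None, lr=0.1, epochs=5000):
    """Fit P(true | score) = sigmoid(a * score + b) by weighted logistic regression.

    Positives share a total weight of positive_base_rate and negatives share
    1 - positive_base_rate. If no base rate is given, it is inferred from the
    relative sizes of the two sets, e.g. 50 / (50 + 200) = 0.2.
    """
    n_pos, n_neg = len(pos_scores), len(neg_scores)
    if positive_base_rate is None:
        positive_base_rate = n_pos / (n_pos + n_neg)
    w_pos = positive_base_rate / n_pos
    w_neg = (1.0 - positive_base_rate) / n_neg
    data = [(s, 1.0, w_pos) for s in pos_scores] + [(s, 0.0, w_neg) for s in neg_scores]
    a, b = 1.0, 0.0
    for _ in range(epochs):
        grad_a = grad_b = 0.0
        for s, y, w in data:
            err = sigmoid(a * s + b) - y  # gradient of the log loss w.r.t. the logit
            grad_a += w * err * s
            grad_b += w * err
        a -= lr * grad_a
        b -= lr * grad_b
    return a, b
```

When positive_base_rate is omitted, both weights reduce to 1 / (n_pos + n_neg) and the fit is ordinary logistic regression. At convergence, the weighted average of the calibrated probabilities matches the chosen base rate, which is why a model calibrated at 50% must be recalibrated for a test distribution with 75% positives.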
- Parameters:
X_pos (np.array, shape (n,3) or str or GraphDataLoader or AbstractGraphPartitioner) – Data OR Filename of the data file OR Data Handle to be used as positive triples.
X_neg (np.array, shape (n,3) or str or GraphDataLoader or AbstractGraphPartitioner) –
Data OR Filename of the data file OR Data Handle to be used as negative triples.
If None, the negative triples are generated via corruptions and the user must provide a positive base rate instead.
positive_base_rate (float) –
Base rate of positive statements.
For example, if we assume there is an even chance for any query to be true, the base rate would be 50%.
If X_neg is provided and positive_base_rate=None, the relative sizes of X_pos and X_neg will be used to determine the base rate. For example, with 50 positive triples and 200 negative triples, the positive base rate will be assumed to be \(\frac{50}{50+200} = \frac{1}{5} = 0.2\).
This value must be \(\in [0,1]\).
batch_size (int) – Batch size for positives.
epochs (int) – Number of epochs used to train the Platt scaling model. Only applies when X_neg=None.
verbose (bool) – Verbosity (default: False).
Example
>>> from ampligraph.datasets import load_fb15k_237
>>> from ampligraph.latent_features import ScoringBasedEmbeddingModel
>>> import numpy as np
>>> dataset = load_fb15k_237()
>>> model = ScoringBasedEmbeddingModel(eta=5,
...                                    k=300,
...                                    scoring_type='ComplEx')
>>> model.compile(optimizer='adam', loss='nll')
>>> model.fit(dataset['train'],
...           batch_size=10000,
...           epochs=5)
>>> print('Raw scores (sorted):', np.sort(model.predict(dataset['test'])))
Raw scores (sorted): [-1.0689778 -0.42082012 -0.39887887 ... 3.261838 3.2755773 3.2768354 ]
>>> print('Indices obtained by sorting (scores):', np.argsort(model.predict(dataset['test'])))
Indices obtained by sorting (scores): [ 3834 18634 4066 ... 6237 13633 10961]
>>> model.calibrate(dataset['test'],
...                 batch_size=10000,
...                 positive_base_rate=0.9,
...                 epochs=100)
>>> print('Calibrated scores (sorted):', np.sort(model.predict_proba(dataset['test'])))
Calibrated scores (sorted): [0.49547982 0.5396996 0.54118955 ... 0.7624245 0.7631044 0.76316655]
>>> print('Indices obtained by sorting (Calibrated):', np.argsort(model.predict_proba(dataset['test'])))
Indices obtained by sorting (Calibrated): [ 3834 18634 4066 ... 6237 13633 10961]
- call(inputs, training=False)
Computes the scores of the triples and returns the corruption scores as well.
- Parameters:
inputs (ndarray, shape (n, 3)) – Batch of input triples.
- Returns:
out – List of input scores along with their corruptions.
- Return type:
list
- compile(optimizer='adam', loss=None, entity_relation_initializer='glorot_uniform', entity_relation_regularizer=None, **kwargs)
Compile the model.
- Parameters:
optimizer (str (name of optimizer) or optimizer instance) –
The optimizer used to minimize the loss function. For pre-defined options, choose between “sgd”, “adagrad”, “adam”, “rmsprop”, etc. See tf.keras.optimizers for up-to-date details.
If a string is passed, then the default parameters of the optimizer will be used.
If you want to use custom hyperparameters, you need to create an instance of the optimizer and pass the instance to the compile function:
import tensorflow as tf
adam_opt = tf.keras.optimizers.Adam(learning_rate=0.003)
model.compile(loss='pairwise', optimizer=adam_opt)
loss (str (name of objective function), objective function or ampligraph.latent_features.loss_functions.Loss) –
If a string is passed, you can use one of the following losses, which will be used with their default settings:
“pairwise”: the model will use the pairwise margin-based loss function.
“nll”: the model will use the negative log-likelihood loss.
“absolute_margin”: the model will use the absolute margin loss.
“self_adversarial”: the model will use the adversarial sampling loss function.
“multiclass_nll”: the model will use the multiclass negative log-likelihood loss.
model.compile(loss='absolute_margin', optimizer='adam')
If you want to modify the default parameters of the loss function, you need to explicitly create an instance of the loss with the required hyperparameters and then pass this instance.
from ampligraph.latent_features.loss_functions import AbsoluteMarginLoss
ab_loss = AbsoluteMarginLoss(loss_params={'margin': 3})
model.compile(loss=ab_loss, optimizer='adam')
An objective function is any callable with the signature loss = fn(score_true, score_corr, eta).
import tensorflow as tf

# Create a user defined loss function with the above signature
def userLoss(scores_pos, scores_neg):
    # user defined loss - takes in 2 params and returns loss
    neg_exp = tf.exp(scores_neg)
    pos_exp = tf.exp(scores_pos)
    # Apply softmax to the scores
    score = pos_exp / (tf.reduce_sum(neg_exp, axis=0) + pos_exp)
    loss = -tf.math.log(score)
    return loss

# Pass this loss while compiling the model
model.compile(loss=userLoss, optimizer='adam')
entity_relation_initializer (str (name of initializer function), initializer function or tf.keras.initializers.Initializer or list.) –
Initializer of the entity and relation embeddings. This is either a single value or a list of size 2. If a single value is passed, then both the entities and relations will be initialized based on the same initializer; if a list, the first initializer will be used for entities and the second for relations.
If a string is passed, then the default parameters will be used. Choose between “random_normal”, “random_uniform”, “glorot_normal”, “he_normal”, etc.
See tf.keras.initializers for up-to-date details.
model.compile(loss='pairwise', optimizer='adam', entity_relation_initializer='random_normal')
If the user wants to use custom hyperparameters, then an instance of tf.keras.initializers.Initializer needs to be passed.
import tensorflow as tf
init = tf.keras.initializers.RandomNormal(stddev=0.00003)
model.compile(loss='pairwise', optimizer='adam', entity_relation_initializer=init)
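If the user wants to define a custom initializer, it can be any callable with the signature init = fn(shape, dtype=None), since Keras may pass the dtype when it creates the weights.
import tensorflow as tf

def my_init(shape, dtype=None):
    # draw the initial values from a standard normal distribution
    return tf.random.normal(shape, dtype=dtype or tf.float32)

model.compile(loss='pairwise', optimizer='adam', entity_relation_initializer=my_init)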
entity_relation_regularizer (str (name of regularizer function) or regularizer function or tf.keras.regularizers.Regularizer instance or list) –
Regularizer of entities and relations. If a single value is passed, then both the entities and relations will be regularized based on the same regularizer; if a list, the first regularizer will be used for entities and the second for relations.
If a string is passed, then the default parameters of the regularizers will be used. Choose between “l1”, “l2”, “l1_l2”, etc.
See tf.keras.regularizers for up-to-date details.
model.compile(loss='pairwise', optimizer='adam', entity_relation_regularizer='l2')
If the user wants to use custom hyperparameters, then an instance of tf.keras.regularizers.Regularizer needs to be passed.
import tensorflow as tf
reg = tf.keras.regularizers.L1L2(l1=0.001, l2=0.1)
model.compile(loss='pairwise', optimizer='adam', entity_relation_regularizer=reg)
If the user wants to define a custom regularizer, it can be any callable with the signature reg = fn(weight_matrix).
import tensorflow as tf

def my_reg(weight_mx):
    # L1 penalty on the embeddings, scaled by 0.01
    return 0.01 * tf.math.reduce_sum(tf.math.abs(weight_mx))

model.compile(loss='pairwise', optimizer='adam', entity_relation_regularizer=my_reg)
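For readers less familiar with TensorFlow, the following pure-Python counterpart of userLoss spells out the same computation. For each positive triple, the loss is the negative log of its softmax probability against its own corruptions. The layout of scores_neg (eta rows of n scores, mirroring the reduce_sum over axis 0 in userLoss) is an assumption, and the function name is hypothetical.

```python
import math


def softmax_over_corruptions_loss(scores_pos, scores_neg):
    """Per-triple loss -log(exp(pos) / (sum(exp(neg)) + exp(pos))).

    scores_pos: list of n positive scores.
    scores_neg: list of eta lists, each holding n corruption scores, so that
    scores_neg[j][i] is the j-th corruption of the i-th positive (assumed layout).
    """
    losses = []
    for i, pos in enumerate(scores_pos):
        neg_sum = sum(math.exp(row[i]) for row in scores_neg)
        prob = math.exp(pos) / (neg_sum + math.exp(pos))
        losses.append(-math.log(prob))
    return losses
```

With a positive score of 0 and two corruptions also scoring 0, the softmax probability is 1/3 and the loss is log 3. A production version would subtract the maximum score before exponentiating to avoid overflow.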
Example
>>> from ampligraph.datasets import load_fb15k_237
>>> from ampligraph.latent_features import ScoringBasedEmbeddingModel
>>> X = load_fb15k_237()
>>> model = ScoringBasedEmbeddingModel(eta=5,
...                                    k=300,
...                                    scoring_type='ComplEx',
...                                    seed=0)
>>> model.compile(optimizer='adam', loss='nll')
>>> model.fit(X['train'],
...           batch_size=10000,
...           epochs=5)
Epoch 1/5
29/29 [==============================] - 2s 61ms/step - loss: 67361.3047
Epoch 2/5
29/29 [==============================] - 1s 35ms/step - loss: 67318.6094
Epoch 3/5
29/29 [==============================] - 1s 34ms/step - loss: 67020.0703
Epoch 4/5
29/29 [==============================] - 1s 34ms/step - loss: 65867.3750
Epoch 5/5
29/29 [==============================] - 1s 34ms/step - loss: 63517.9062
- compute_focusE_weights(weights, structure_weight)
Compute positive and negative weights to scale scores if use_focusE=True.
- Parameters:
weights (array-like, shape (n, m)) – Batch of weights associated with the triples.
structure_weight (float) – Structural influence assigned to the weights.
- Returns:
out – Tuple where the first element is a tensor containing the positive weights and the second is a tensor containing the negative weights.
- Return type:
tuple of two tf.Tensors, (tf.Tensor(shape=(n, 1)), tf.Tensor(shape=(n * self.eta, 1)))
- compute_output_shape(inputShape)
Returns the output shape of the outputs of the call function.
- Parameters:
inputShape (tuple) – Shape of the inputs of the call function.
- Returns:
output_shape – List with the shapes of the outputs of the call function, for the input triples and for the corruption scores.
- Return type:
list of tuples
- evaluate(x=None, batch_size=32, verbose=True, use_filter=False, corrupt_side='s,o', entities_subset=None, ranking_strategy='worst', callbacks=None, dataset_type='test')
Evaluate the inputs against corruptions and return ranks.
- Parameters:
x (np.array, shape (n,3) or str or GraphDataLoader or AbstractGraphPartitioner) – Data OR Filename of the data file OR Data Handle to be used for evaluation.
batch_size (int) – Batch size to use during evaluation. May be overridden if x is a GraphDataLoader or AbstractGraphPartitioner instance.
verbose (bool) – Verbosity mode.
use_filter (bool or dict) – Whether to use a filter or not. If a dictionary is specified, the data in the dict is concatenated and used as filter.
corrupt_side (str) – Which side of a triple to corrupt. It can be the subject (corrupt_side="s"), the object (corrupt_side="o"), or both the subject and the object (corrupt_side="s+o" or corrupt_side="s,o") (default: ”s,o”).
ranking_strategy (str) – Indicates how to break ties when a test triple gets the same score as a corruption. Can be one of three values: “best”, “middle”, “worst” (default: “worst”, i.e., the worst rank is assigned to the test triple).
entities_subset (list or np.array) – Subset of entities to be used for generating corruptions.
callbacks (list of keras.callbacks.Callback instances) – List of callbacks to apply during evaluation.
- Returns:
rank – Ranking of test triples against subject corruptions and/or object corruptions.
- Return type:
np.array, shape (n, number of corrupted sides)
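The ranking_strategy options differ only in how ties are counted. The following minimal sketch ranks a single test triple, assuming higher scores are better and that “middle” places the test triple halfway through its tied corruptions (AmpliGraph's exact formula for “middle” may differ). rank_with_ties is a hypothetical helper, not part of the library.

```python
def rank_with_ties(test_score, corruption_scores, strategy="worst"):
    """Rank of a test triple among its corruptions (higher score = more plausible).

    "best" puts the test triple ahead of every tied corruption, "worst" puts it
    behind all of them, and "middle" (assumed here) puts it halfway between.
    """
    greater = sum(1 for c in corruption_scores if c > test_score)
    ties = sum(1 for c in corruption_scores if c == test_score)
    if strategy == "best":
        return 1 + greater
    if strategy == "worst":
        return 1 + greater + ties
    if strategy == "middle":
        return 1 + greater + ties / 2
    raise ValueError(f"unknown strategy: {strategy}")
```

For a test score of 0.5 against corruptions [0.9, 0.5, 0.5, 0.1], one corruption scores higher and two tie, giving ranks 2, 3 and 4 for “best”, “middle” and “worst”. This is why “worst” is the most conservative default.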
Example
>>> from ampligraph.datasets import load_fb15k_237
>>> from ampligraph.latent_features import ScoringBasedEmbeddingModel
>>> from ampligraph.evaluation.metrics import mrr_score, hits_at_n_score, mr_score
>>> X = load_fb15k_237()
>>> model = ScoringBasedEmbeddingModel(eta=5,
...                                    k=300,
...                                    scoring_type='ComplEx',
...                                    seed=0)
>>> model.compile(optimizer='adam', loss='nll')
>>> model.fit(X['train'],
...           batch_size=10000,
...           epochs=5)
Epoch 1/5
29/29 [==============================] - 2s 71ms/step - loss: 67361.3047
Epoch 2/5
29/29 [==============================] - 1s 35ms/step - loss: 67318.6094
Epoch 3/5
29/29 [==============================] - 1s 35ms/step - loss: 67020.0703
Epoch 4/5
29/29 [==============================] - 1s 33ms/step - loss: 65867.3750
Epoch 5/5
29/29 [==============================] - 1s 34ms/step - loss: 63517.9062
>>> ranks = model.evaluate(X['test'],
...                        batch_size=100,
...                        corrupt_side='s,o',
...                        use_filter={'train': X['train'],
...                                    'valid': X['valid'],
...                                    'test': X['test']})
28 triples containing invalid keys skipped!
9 triples containing invalid keys skipped!
2045/2045 [==============================] - 149s 73ms/step
>>> mr_score(ranks), mrr_score(ranks), hits_at_n_score(ranks, 1), hits_at_n_score(ranks, 10), len(ranks)
(428.44671689989235, 0.25761041025282316, 0.1898179861043155, 0.391965945787259, 20438)
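The tuple printed above holds mean rank, mean reciprocal rank, Hits@1, Hits@10 and the number of rows in ranks. The sketch below spells out the standard definitions of these metrics in pure Python. The helper names are hypothetical, and averaging over every corrupted side (flattening the (n, 2) array returned for corrupt_side='s,o') is an assumption about how mr_score, mrr_score and hits_at_n_score aggregate.

```python
def flatten(ranks):
    """Flatten (n, sides) ranks into a single list, one entry per corrupted side."""
    return [r for row in ranks for r in (row if isinstance(row, (list, tuple)) else [row])]


def mean_rank(ranks):
    flat = flatten(ranks)
    return sum(flat) / len(flat)


def mean_reciprocal_rank(ranks):
    flat = flatten(ranks)
    return sum(1.0 / r for r in flat) / len(flat)


def hits_at_n(ranks, n):
    """Fraction of ranks that are less than or equal to n."""
    flat = flatten(ranks)
    return sum(1 for r in flat if r <= n) / len(flat)
```

For ranks [[1, 2], [4, 10]] the sketch gives MR = 4.25, MRR = (1 + 1/2 + 1/4 + 1/10) / 4 = 0.4625 and Hits@3 = 0.5. Lower MR and higher MRR and Hits@N indicate a better model.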
- fit(x=None, batch_size=1, epochs=1, verbose=True, callbacks=None, validation_split=0.0, validation_data=None, shuffle=True, initial_epoch=0, validation_batch_size=100, validation_corrupt_side='s,o', validation_freq=50, validation_burn_in=100, validation_filter=False, validation_entities_subset=None, partitioning_k=1, focusE=False, focusE_params={})
Fit the model on the provided data.
- Parameters:
x (np.array, shape (n, 3), or str or GraphDataLoader or AbstractGraphPartitioner) – Data OR Filename of the data file OR Data Handle to be used for training.
batch_size (int) – Batch size to use during training. May be overridden if x is a GraphDataLoader or AbstractGraphPartitioner instance.
epochs (int) – Number of epochs to train (default: 1).
verbose (bool) – Verbosity (default: True).
callbacks (list of tf.keras.callbacks.Callback) – List of callbacks to be used during training (default: None).
validation_split (float) – Validation split to carve out of x (default: 0.0) (currently supported only when x is a np.array).
validation_data (np.array, shape (n, 3) or str or GraphDataLoader or AbstractGraphPartitioner) – Data OR Filename of the data file OR Data Handle to be used for validation.
shuffle (bool) – Indicates whether to shuffle the data after every epoch during training (default: True).
initial_epoch (int) – Initial epoch number (default: 0).
validation_batch_size (int) – Batch size to use during validation (default: 100). May be overridden if validation_data is a GraphDataLoader or AbstractGraphPartitioner instance.
validation_freq (int) – Indicates how often to validate (default: 50).
validation_burn_in (int) – The burn-in time after which the validation kicks in (default: 100).
validation_filter (bool or dict) – Validation filter to be used (default: False).
validation_entities_subset (list or np.array) –
Subset of entities to be used for generating corruptions.
Note
One can perform early stopping using the TensorFlow callback tf.keras.callbacks.EarlyStopping, as shown in the accompanying example below.
focusE (bool) –
Specify whether to include the FocusE layer (default: False). The FocusE layer [PC21] makes it possible to inject numeric edge attributes into the scoring layer of a traditional knowledge graph embedding architecture. Semantically, the numeric value can signify the importance, uncertainty, significance or confidence of a triple.
Note
In order to activate focusE, the training data must have shape (n, 4), where the first three columns store the subject, predicate and object of the triples, and the fourth column stores the numerical edge value associated with each triple.
focusE_params (dict) –
If FocusE layer is included, specify its hyper-parameters. The following hyper-params can be passed:
“non_linearity”: can be one of the following values: “linear”, “softplus”, “sigmoid”, “tanh”.
“stop_epoch”: specifies how long to decay (linearly) the numeric values from 1 to their original value.
“structural_wt”: structural influence hyperparameter \(\in [0, 1]\) that modulates the influence of graph topology.
If focusE==True and focusE_params==dict(), then the default values are passed: non_linearity="linear", stop_epoch=251 and structural_wt=0.001.
partitioning_k (int) –
Number of partitions to use while training (default: 1, i.e., the data is not partitioned). May be overridden if x is an AbstractGraphPartitioner instance.
Note
This option is useful when the dataset is too large to fit in memory. Setting it to a number strictly larger than 1 will automatically partition the data using BucketGraphPartitioner. See the tutorials for usage in Advanced mode.
- Returns:
history – Its History.history attribute is a record of training loss values, as well as validation loss and validation metrics values.
- Return type:
History object
Example
>>> from ampligraph.datasets import load_fb15k_237
>>> from ampligraph.latent_features import ScoringBasedEmbeddingModel
>>> X = load_fb15k_237()
>>> model = ScoringBasedEmbeddingModel(eta=5,
...                                    k=300,
...                                    scoring_type='ComplEx',
...                                    seed=0)
>>> model.compile(optimizer='adam', loss='nll')
>>> model.fit(X['train'],
...           batch_size=10000,
...           epochs=5)
Epoch 1/5
29/29 [==============================] - 2s 71ms/step - loss: 67361.3047
Epoch 2/5
29/29 [==============================] - 1s 35ms/step - loss: 67318.6094
Epoch 3/5
29/29 [==============================] - 1s 37ms/step - loss: 67020.0703
Epoch 4/5
29/29 [==============================] - 1s 35ms/step - loss: 65867.3750
Epoch 5/5
29/29 [==============================] - 1s 35ms/step - loss: 63517.9062
>>> # Early stopping example
>>> from ampligraph.latent_features import ScoringBasedEmbeddingModel
>>> from ampligraph.datasets import load_fb15k_237
>>> dataset = load_fb15k_237()
>>> model = ScoringBasedEmbeddingModel(eta=1,
...                                    k=10,
...                                    scoring_type='TransE')
>>> model.compile(optimizer='adam', loss='multiclass_nll')
>>> import tensorflow as tf
>>> early_stop = tf.keras.callbacks.EarlyStopping(monitor="val_mrr",  # which metric to monitor
...                                               patience=3,  # stop if the monitored metric does not improve for this many checks
...                                               verbose=1,  # verbosity
...                                               mode="max",  # how to compare the monitored metric; "max" means higher is better
...                                               restore_best_weights=True)  # restore the weights with the best value
>>> # the early stopping instance needs to be passed as a callback to the fit function
>>> model.fit(dataset['train'],
...           batch_size=10000,
...           epochs=5,
...           validation_freq=1,  # validation frequency
...           validation_batch_size=100,  # validation batch size
...           validation_burn_in=3,  # burn-in time
...           validation_corrupt_side='s,o',  # which side to corrupt
...           validation_data=dataset['valid'][::100],  # validation data
...           callbacks=[early_stop])  # pass the early stopping object as a callback
Epoch 1/5
29/29 [==============================] - 2s 82ms/step - loss: 6698.2188
Epoch 2/5
29/29 [==============================] - 1s 34ms/step - loss: 6648.8862
Epoch 3/5
3/3 [==============================] - 1s 446ms/step
29/29 [==============================] - 2s 84ms/step - loss: 6590.2842 - val_mrr: 0.0811 - val_mr: 1776.4545 - val_hits@1: 0.0000e+00 - val_hits@10: 0.2301 - val_hits@100: 0.4148
Epoch 4/5
3/3 [==============================] - 0s 102ms/step
29/29 [==============================] - 1s 47ms/step - loss: 6517.4517 - val_mrr: 0.0918 - val_mr: 1316.6335 - val_hits@1: 0.0000e+00 - val_hits@10: 0.2528 - val_hits@100: 0.4716
Epoch 5/5
3/3 [==============================] - 1s 177ms/step
29/29 [==============================] - 2s 62ms/step - loss: 6431.8696 - val_mrr: 0.0901 - val_mr: 1074.8920 - val_hits@1: 0.0000e+00 - val_hits@10: 0.2386 - val_hits@100: 0.4773
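The patience logic configured in this example can be summarized as follows. This is a simplified sketch of the idea behind tf.keras.callbacks.EarlyStopping. It ignores options such as min_delta and baseline, and early_stopping is a hypothetical helper. Given the three val_mrr values logged above (0.0811, 0.0918, 0.0901), it reports the second validation check (epoch 4) as the best one and no stop, because patience=3 is never exhausted within five epochs.

```python
def early_stopping(metric_history, patience=3, mode="max"):
    """Return (best_check, stop_check) for a sequence of validation metric values.

    stop_check is None if training would run to completion.
    """
    better = (lambda a, b: a > b) if mode == "max" else (lambda a, b: a < b)
    best_value, best_check, wait = None, None, 0
    for check, value in enumerate(metric_history):
        if best_value is None or better(value, best_value):
            # new best value: remember it and reset the patience counter
            best_value, best_check, wait = value, check, 0
        else:
            wait += 1
            if wait >= patience:
                return best_check, check
    return best_check, None
```

With restore_best_weights=True, the weights from best_check are the ones kept at the end of training.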
- classmethod from_config(config)
Creates a layer from its config.
This method is the reverse of get_config, capable of instantiating the same layer from the config dictionary. It does not handle layer connectivity (handled by Network), nor weights (handled by set_weights).
- Parameters:
config – A Python dictionary, typically the output of get_config.
- Returns:
A layer instance.
- get_config()
Get the configuration hyper-parameters of the scoring based embedding model.
- get_count(concept_type='e')
Returns the count of entities and relations that were present during training.
- Parameters:
concept_type (str) – Indicates whether to count entities (concept_type='e') or relations (concept_type='r') (default: ‘e’).
- Returns:
count – Count of the entities or relations.
- Return type:
int
Example
>>> from ampligraph.datasets import load_fb15k_237
>>> from ampligraph.latent_features import ScoringBasedEmbeddingModel
>>> X = load_fb15k_237()
>>> model = ScoringBasedEmbeddingModel(eta=5,
...                                    k=300,
...                                    scoring_type='ComplEx',
...                                    seed=0)
>>> model.compile(optimizer='adam', loss='nll')
>>> model.fit(X['train'],
...           batch_size=10000,
...           epochs=5,
...           verbose=False)
>>> print('Entities:', model.get_count('e'))
Entities: 14505
>>> print('Relations:', model.get_count('r'))
Relations: 237
- get_emb_matrix_test(part_number=1, number_of_parts=1)
Get the embedding matrix during evaluation.
- Parameters:
part_number (int) – Specifies which part to return out of the number_of_parts parts into which the entire embedding matrix is split.
number_of_parts (int) – Total number of parts in which to split the embedding matrix.
- Returns:
emb_matrix (np.array, shape (n,k)) – Part of the embedding matrix corresponding to part_number.
start_index (int) – Original entity index (data dict) of the first row of the emb_matrix.
end_index (int) – Original entity index (data dict) of the last row of the emb_matrix.
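One natural way to split an embedding matrix is into contiguous blocks of equal size. The sketch below computes the inclusive start and end row indexes of one part under that assumption, with a 1-based part_number as the default value part_number=1 suggests. AmpliGraph's actual splitting may differ, and part_bounds is a hypothetical helper.

```python
import math


def part_bounds(num_rows, part_number=1, number_of_parts=1):
    """Return (start_index, end_index), both inclusive, of one contiguous part.

    part_number is 1-based. When number_of_parts exceeds num_rows, a trailing
    part may be empty (start_index > end_index).
    """
    if not 1 <= part_number <= number_of_parts:
        raise ValueError("part_number must be in [1, number_of_parts]")
    rows_per_part = math.ceil(num_rows / number_of_parts)
    start = (part_number - 1) * rows_per_part
    end = min(start + rows_per_part, num_rows) - 1
    return start, end
```

Splitting 10 rows into 3 parts gives blocks of at most 4 rows: (0, 3), (4, 7) and (8, 9). With the defaults, the whole matrix (0, 9) is a single part.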
- get_embeddings(entities, embedding_type='e')
Get the embeddings of entities or relations.
Note
Use ampligraph.utils.create_tensorboard_visualizations() to visualize the embeddings with TensorBoard.
- Parameters:
entities (array-like, shape=(n)) – The entities (or relations) of interest. Elements of the vector must be the original string literals, not internal IDs.
embedding_type (str) – If ‘e’ is passed, the entities argument will be considered as a list of knowledge graph entities (i.e., nodes). If set to ‘r’, entities will be treated as relations instead.
- Returns:
embeddings – An array of k-dimensional embeddings.
- Return type:
ndarray, shape (n, k)
Example
>>> from ampligraph.latent_features import ScoringBasedEmbeddingModel
>>> from ampligraph.datasets import load_fb15k_237
>>> X = load_fb15k_237()
>>> model = ScoringBasedEmbeddingModel(eta=5,
...                                    k=300,
...                                    scoring_type='ComplEx',
...                                    seed=0)
>>> model.compile(optimizer='adam', loss='nll')
>>> model.fit(X['train'],
...           batch_size=10000,
...           epochs=5,
...           verbose=False)
>>> model.get_embeddings(['/m/027rn', '/m/06v8s0'], 'e')
array([[ 0.04482496 0.11973907 0.01117733 ... -0.13391922 0.11103553 -0.08132861]
       [-0.10158381 0.08108605 -0.07608676 ... 0.0591407 0.02791426 0.07559016]], dtype=float32)
- get_focusE_params(dict_params={})
Get parameters for focusE.
- Parameters:
dict_params (dict) –
The following hyper-params can be passed:
“non_linearity”: can assume one of the following values: “linear”, “softplus”, “sigmoid”, “tanh”.
“stop_epoch”: specifies how long to decay (linearly) the structural influence hyper-parameter from 1 until it reaches its original value.
“structural_wt”: structural influence hyperparameter \(\in [0, 1]\) that modulates the influence of graph topology.
If the respective key is missing, the defaults are non_linearity="linear", stop_epoch=251 and structural_wt=0.001.
- Returns:
focusE_params – A tuple containing three values: the non-linearity function (str), the stop_epoch (int) and the structure weight (float).
- Return type:
tuple
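The stop_epoch and structural_wt descriptions imply a linear schedule. The following minimal sketch assumes that the structural influence is 1 at epoch 0, decreases linearly to structural_wt at stop_epoch, and stays constant afterwards. The helper name is hypothetical, and AmpliGraph's exact update may differ, for example in whether counting starts at epoch 0 or 1.

```python
def structural_weight_at(epoch, structural_wt=0.001, stop_epoch=251):
    """Linearly decay the structural influence from 1 to structural_wt.

    The value is 1 at epoch 0, reaches structural_wt at stop_epoch, and is
    held constant for every later epoch.
    """
    progress = min(epoch, stop_epoch) / stop_epoch
    return 1.0 - (1.0 - structural_wt) * progress
```

With the defaults (stop_epoch=251, structural_wt=0.001), the weight is about 0.60 at epoch 100 and settles at 0.001 from epoch 251 onward.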
- get_indexes(X, type_of='t', order='raw2ind')
Converts given data to indexes or to raw data (according to order).
It works for X containing triples, entities, or relations.
- Parameters:
X (np.array or list) – Data to be indexed.
type_of (str) – Specifies whether to get indexes/raw data for triples (type_of='t'), entities (type_of='e'), or relations (type_of='r').
order (str) – Specifies whether to get indexes from raw data (order='raw2ind') or raw data from indexes (order='ind2raw').
- Returns:
Y – Indexed data or raw data.
- Return type:
np.array
Example
>>> from ampligraph.latent_features import ScoringBasedEmbeddingModel
>>> from ampligraph.datasets import load_fb15k_237
>>> X = load_fb15k_237()
>>> model = ScoringBasedEmbeddingModel(eta=5,
...                                    k=300,
...                                    scoring_type='ComplEx',
...                                    seed=0)
>>> model.compile(optimizer='adam', loss='nll')
>>> model.fit(X['train'],
...           batch_size=10000,
...           epochs=5,
...           verbose=False)
>>> print(model.get_indexes(['/m/027rn', '/m/06v8s0'], 'e', 'raw2ind'))
[0, 3877]
>>> print(model.get_indexes([3877, 0], 'e', 'ind2raw'))
['/m/06v8s0', '/m/027rn']
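The raw2ind / ind2raw conversion amounts to a pair of lookup tables. The following minimal sketch handles entities only and assumes IDs are assigned in sorted order of the labels. AmpliGraph builds its own mapping when data is loaded, and its ID assignment need not follow this order. The helper names are hypothetical.

```python
def build_index(values):
    """Assign consecutive integer IDs to the unique values, in sorted order (assumption)."""
    raw2ind = {v: i for i, v in enumerate(sorted(set(values)))}
    ind2raw = {i: v for v, i in raw2ind.items()}
    return raw2ind, ind2raw


def get_indexes_sketch(X, raw2ind, ind2raw, order="raw2ind"):
    """Convert labels to IDs (order='raw2ind') or IDs back to labels (order='ind2raw')."""
    if order == "raw2ind":
        return [raw2ind[x] for x in X]
    if order == "ind2raw":
        return [ind2raw[x] for x in X]
    raise ValueError(f"unknown order: {order}")
```

As in the example above, converting labels to IDs and then back with the reversed list returns the original labels, because the two tables are exact inverses of each other.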
- get_train_embedding_matrix_size()¶
Returns the size of the embedding matrix used for training.
This may not be the same as (n, k) during partitioned training, where n is the number of entities in the whole training set.
- is_fit()¶
Check whether the model has been fitted already.
- load_metadata(filepath=None, filedir=None)¶
- load_weights(filepath)¶
Loads the model weights.
Use this function if save_weights was used to save the model.
Note
If you want to continue training, use ampligraph.utils.save_model() and ampligraph.utils.load_model() instead. These functions save and restore the entire state of the graph, which allows training to continue from where it stopped.
- Parameters:
filepath (str) – Path from which to load the model weights.
- make_calibrate_function()¶
Similar to keras lib, this function returns the handle to the calibrate step function.
It processes one batch of data by iterating over the dataset iterator and computes the calibration of predictions.
- Returns:
out – Handle to the calibration function.
- Return type:
Function handle
- make_predict_function()¶
Similar to keras lib, this function returns the handle to the predict step function.
It processes one batch of data by iterating over the dataset iterator and computes the prediction outputs.
- Returns:
out – Handle to the predict function.
- Return type:
Function handle
- make_test_function()¶
As in Keras, this function returns a handle to the test step function.
The step function processes one batch of data, taken from the dataset iterator, and computes the test metrics.
- Returns:
out – Handle to the test step function.
- Return type:
Function handle
- make_train_function()¶
As in Keras, this function returns a handle to the training step function. The step function processes one batch of data, taken from the dataset iterator, computes the loss, and optimizes the model on it.
- Returns:
out – Handle to the training step function.
- Return type:
Function handle
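The four make_*_function methods share one pattern: a factory returns a step function, and each call of that step function consumes one batch from a dataset iterator. The sketch below shows the pattern in plain Python; `make_step_function`, `batches`, and the averaging step are hypothetical and do not mirror AmpliGraph internals.

```python
# Toy sketch of the "make_*_function" pattern: the factory returns a step
# function that pulls exactly one batch from an iterator per call.

def make_step_function(process_batch):
    """Return a step function that consumes one batch per call."""
    def step_function(iterator):
        batch = next(iterator)
        return process_batch(batch)
    return step_function


def batches(data, batch_size):
    """Yield consecutive slices of data of length batch_size (last may be shorter)."""
    for start in range(0, len(data), batch_size):
        yield data[start:start + batch_size]


scores = [1, 3, 2, 6, 5]
mean_step = make_step_function(lambda batch: sum(batch) / len(batch))
iterator = batches(scores, batch_size=2)
results = [mean_step(iterator) for _ in range(3)]  # three batches: 2 + 2 + 1 items
print(results)  # [2.0, 4.0, 5.0]
```

Returning a closure lets the caller build the step once and then invoke it repeatedly, which is the same reason Keras exposes these factories.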
- partition_change_updates(num_ents, ent_emb, rel_emb)¶
Perform the changes that are required when the partition is modified during training.
- Parameters:
num_ents (int) – Number of unique entities in the partition.
ent_emb (array-like) – Entity embeddings that need to be trained for the partition (every entity in the partition's triples has an embedding in this matrix).
rel_emb (array-like) – Relation embeddings that need to be trained for the partition (every relation in the partition's triples has an embedding in this matrix).
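To make the role of the three parameters concrete, the hypothetical sketch below assumes a fixed-size trainable buffer, sized for the largest partition, that is refilled with the new partition's embeddings whenever the partition changes. `ToyPartitionedModel` and its fields are illustrative assumptions, not AmpliGraph's implementation.

```python
# Hypothetical sketch: refill fixed-size trainable buffers when the partition changes.

class ToyPartitionedModel:
    def __init__(self, max_ents, max_rels, k):
        self.k = k
        self.ent_buffer = [[0.0] * k for _ in range(max_ents)]
        self.rel_buffer = [[0.0] * k for _ in range(max_rels)]
        self.num_ents = 0

    def partition_change_updates(self, num_ents, ent_emb, rel_emb):
        """Copy the new partition's embeddings into the trainable buffers."""
        if num_ents > len(self.ent_buffer) or len(rel_emb) > len(self.rel_buffer):
            raise ValueError("partition does not fit in the preallocated buffers")
        for i in range(num_ents):
            self.ent_buffer[i] = list(ent_emb[i])
        for j, row in enumerate(rel_emb):
            self.rel_buffer[j] = list(row)
        self.num_ents = num_ents  # only the first num_ents rows are in use


model = ToyPartitionedModel(max_ents=4, max_rels=2, k=2)
model.partition_change_updates(2, [[0.1, 0.2], [0.3, 0.4]], [[1.0, 0.0]])
print(model.num_ents, model.ent_buffer[:2])  # 2 [[0.1, 0.2], [0.3, 0.4]]
```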
- predict(x, batch_size=32, verbose=0, callbacks=None)¶
Compute scores of the input triples.
- Parameters:
x (np.array, shape (n, 3) or str or GraphDataLoader or AbstractGraphPartitioner) – Data OR Filename of the data file OR Data Handle to be used for prediction.
batch_size (int) – Batch size to use during prediction. May be overridden if x is a GraphDataLoader or AbstractGraphPartitioner instance.
verbose (bool) – Verbosity mode.
callbacks (list of keras.callbacks.Callback instances) – List of callbacks to apply during prediction.
- Returns:
scores – Score of the input triples.
- Return type:
np.array, shape (n, )
Example
>>> from ampligraph.latent_features import ScoringBasedEmbeddingModel
>>> import numpy as np
>>> from ampligraph.datasets import load_fb15k_237
>>> X = load_fb15k_237()
>>> model = ScoringBasedEmbeddingModel(eta=5,
...                                    k=300,
...                                    scoring_type='ComplEx',
...                                    seed=0)
>>> model.compile(optimizer='adam', loss='nll')
>>> model.fit(X['train'],
...           batch_size=10000,
...           epochs=5)
Epoch 1/5
29/29 [==============================] - 7s 228ms/step - loss: 67361.2734
Epoch 2/5
29/29 [==============================] - 5s 184ms/step - loss: 67318.8203
Epoch 3/5
29/29 [==============================] - 5s 187ms/step - loss: 67021.1641
Epoch 4/5
29/29 [==============================] - 5s 188ms/step - loss: 65865.5547
Epoch 5/5
29/29 [==============================] - 5s 188ms/step - loss: 63510.2773
>>> pred = model.predict(X['test'],
...                      batch_size=100)
>>> print(np.sort(pred))
[-1.0868168 -0.46582496 -0.44715863 ... 3.2484274 3.3147712 3.326 ]
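The example above trains with scoring_type='ComplEx'. As a rough illustration of what scoring a batch of triples involves, the toy sketch below applies the standard ComplEx scoring function, score(s, p, o) = Re(sum_i e_s[i] * w_p[i] * conj(e_o[i])), to index triples in batches. The embeddings are made-up complex vectors, and `complex_score` / `toy_predict` are hypothetical helpers, not AmpliGraph's implementation.

```python
# Toy sketch of batched scoring with the ComplEx scoring function.

def complex_score(e_s, w_p, e_o):
    """Real part of the trilinear product of subject, relation and conjugated object."""
    return sum(s * p * o.conjugate() for s, p, o in zip(e_s, w_p, e_o)).real


def toy_predict(triples, ent_emb, rel_emb, batch_size=32):
    """Score (s, p, o) index triples, processing batch_size triples at a time."""
    scores = []
    for start in range(0, len(triples), batch_size):
        for s, p, o in triples[start:start + batch_size]:
            scores.append(complex_score(ent_emb[s], rel_emb[p], ent_emb[o]))
    return scores


ent_emb = [[1 + 1j, 0.5 + 0j], [1 - 1j, 2 + 0j], [0 + 1j, 1 + 0j]]
rel_emb = [[1 + 0j, 1 + 0j]]
triples = [(0, 0, 1), (1, 0, 2), (2, 0, 0)]
print(toy_predict(triples, ent_emb, rel_emb, batch_size=2))  # [1.0, 1.0, 1.5]
```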
- predict_proba(x, batch_size=32, verbose=0, callbacks=None)¶
Compute calibrated scores (\(0 ≤ score ≤ 1\)) for the input triples.
- Parameters:
x (np.array, shape (n, 3) or str or GraphDataLoader or AbstractGraphPartitioner) – Data OR Filename of the data file OR Data Handle to be used for prediction.
batch_size (int) – Batch size to use during prediction. May be overridden if x is a GraphDataLoader or AbstractGraphPartitioner instance.
verbose (bool) – Verbosity mode (default: False).
callbacks (list of keras.callbacks.Callback instances) – List of callbacks to apply during prediction.
- Returns:
scores – Calibrated scores for the input triples.
- Return type:
np.array, shape (n, )
Example
>>> from ampligraph.datasets import load_fb15k_237
>>> from ampligraph.latent_features import ScoringBasedEmbeddingModel
>>> import numpy as np
>>> dataset = load_fb15k_237()
>>> model = ScoringBasedEmbeddingModel(eta=5,
...                                    k=300,
...                                    scoring_type='ComplEx')
>>> model.compile(optimizer='adam', loss='nll')
>>> model.fit(dataset['train'],
...           batch_size=10000,
...           epochs=5)
>>> print('Raw scores (sorted):', np.sort(model.predict(dataset['test'])))
Raw scores (sorted): [-1.0384613 -0.46752608 -0.45149875 ... 3.2897844 3.3034315 3.3280635 ]
>>> print('Indices obtained by sorting (scores):', np.argsort(model.predict(dataset['test'])))
Indices obtained by sorting (scores): [ 3834 18634 4066 ... 1355 13633 10961]
>>> model.calibrate(dataset['test'],
...                 batch_size=10000,
...                 positive_base_rate=0.9,
...                 epochs=100)
>>> print('Calibrated scores (sorted):', np.sort(model.predict_proba(dataset['test'])))
Calibrated scores (sorted): [0.5553725 0.5556108 0.5568415 ... 0.6211011 0.62382233 0.6297585 ]
>>> print('Indices obtained by sorting (Calibrated):', np.argsort(model.predict_proba(dataset['test'])))
Indices obtained by sorting (Calibrated): [14573 11577 4404 ... 17817 17816 733]
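Calibration maps unbounded raw scores into [0, 1]. A common way to do this is Platt scaling, a sigmoid with two parameters a and b. The minimal sketch below assumes a Platt-style mapping; a and b are hand-picked rather than fitted, so it only illustrates the shape of the transformation, not the procedure that calibrate() runs.

```python
import math

# Minimal Platt-scaling sketch: p = 1 / (1 + exp(-(a * s + b))).
# a and b are hand-picked for illustration; calibration would fit them to data.

def platt_scale(scores, a, b):
    return [1.0 / (1.0 + math.exp(-(a * s + b))) for s in scores]


raw_scores = [-1.0, 0.0, 1.5, 3.3]
probs = platt_scale(raw_scores, a=0.8, b=-0.5)
print([round(p, 3) for p in probs])
```

With a > 0 the mapping is increasing, so a higher raw score yields a higher calibrated value.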
- predict_step(inputs)¶
Returns the output of the predict step on a batch of data.
- predict_step_partitioning(inputs)¶
Returns the output of the predict step on a batch of data when partitioned training is used.
- process_model_inputs_for_test(triples)¶
Return the processed triples.
- Parameters:
triples (np.array) – Triples to be processed.
- Returns:
out_triples – In regular (non-partitioned) mode, the triples are returned unchanged. With partitioning, the method returns the triple embeddings as a list of three np.arrays holding the subject, predicate, and object embeddings, respectively.
- Return type:
np.array or list
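The two return shapes can be illustrated with plain lists. In the toy sketch below, the embedding tables and the `partitioned` flag are hypothetical extra arguments added for illustration; the real method takes only the triples and reads this information from the model.

```python
# Toy sketch of the two return shapes: triples unchanged, or a list of
# [subject embeddings, predicate embeddings, object embeddings].

def toy_process_inputs(triples, ent_emb, rel_emb, partitioned):
    if not partitioned:
        return triples
    subjects = [ent_emb[s] for s, _, _ in triples]
    predicates = [rel_emb[p] for _, p, _ in triples]
    objects = [ent_emb[o] for _, _, o in triples]
    return [subjects, predicates, objects]


ent_emb = {0: [0.1, 0.2], 1: [0.3, 0.4]}
rel_emb = {0: [1.0, -1.0]}
triples = [(0, 0, 1), (1, 0, 0)]
out = toy_process_inputs(triples, ent_emb, rel_emb, partitioned=True)
print(len(out), out[0])  # 3 [[0.1, 0.2], [0.3, 0.4]]
```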
- save(filepath, overwrite=True, include_optimizer=True, save_format=None, signatures=None, options=None, save_traces=True)¶
Save the model.
- save_metadata(filepath=None, filedir=None)¶
Save metadata.
- save_weights(filepath, overwrite=True)¶
Save the trainable weights.
Use this function if the training process is complete and you want to use the model only for inference. Use load_weights() to load the model weights back.
Note
If you want to be able to continue training later, use ampligraph.utils.save_model() and ampligraph.utils.restore_model(). These functions save and restore the entire state of the graph, which allows training to resume from where it stopped.
- Parameters:
filepath (str) – Path to save the model.
overwrite (bool) – Whether to overwrite the model if it already exists at filepath (default: True).
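The note above distinguishes saving only the trainable weights from saving the entire state. The toy below, using only the standard library, contrasts the two kinds of checkpoint; the `optimizer` entry stands in for whatever extra state a full save keeps and is an assumption, not a description of AmpliGraph's file format.

```python
import json
import os
import tempfile

# Toy contrast between a weights-only checkpoint and a full-state checkpoint.
# The "optimizer" entry is a hypothetical stand-in for extra training state.

state = {
    "weights": [0.5, -1.2, 0.3],
    "optimizer": {"momentum": [0.01, 0.02, -0.03], "step": 120},
}

with tempfile.TemporaryDirectory() as tmp:
    weights_path = os.path.join(tmp, "weights.json")
    full_path = os.path.join(tmp, "full_state.json")
    with open(weights_path, "w") as f:
        json.dump(state["weights"], f)  # weights only: enough for inference
    with open(full_path, "w") as f:
        json.dump(state, f)             # full state: enough to resume training
    with open(weights_path) as f:
        weights_only = json.load(f)
    with open(full_path) as f:
        restored = json.load(f)

print(weights_only)                   # [0.5, -1.2, 0.3]
print(restored["optimizer"]["step"])  # 120
```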
- train_step(data)¶
Training step.
- Parameters:
data (array-like, shape (n, m)) – Batch of input triples (true positives). If m > 3, the extra columns hold the weights associated with each triple.
- Returns:
out – Dictionary of metrics computed on the outputs (e.g., loss).
- Return type:
dict
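A batch of shape (n, m) can be read as three triple columns plus, when m > 3, per-triple weight columns. The toy sketch below splits a batch that way and returns a metrics dictionary; the per-triple loss and the rule of scaling each loss by the first weight column are hypothetical placeholders, not AmpliGraph's loss or weighting scheme.

```python
# Toy sketch: split a (n, m) batch into triples and optional weights,
# then return a metrics dict as a training step would.

def split_batch(data):
    triples = [row[:3] for row in data]
    weights = [row[3:] for row in data] if len(data[0]) > 3 else None
    return triples, weights


def toy_train_step(data, per_triple_loss):
    triples, weights = split_batch(data)
    losses = [per_triple_loss(t) for t in triples]
    if weights is not None:
        # Hypothetical weighting: scale each triple's loss by its first weight column.
        losses = [loss * w[0] for loss, w in zip(losses, weights)]
    return {"loss": sum(losses)}


batch = [[0, 0, 1, 0.5], [1, 0, 2, 2.0]]
print(toy_train_step(batch, per_triple_loss=lambda t: 1.0))  # {'loss': 2.5}
```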
- update_focusE_params()¶
Update the structural weight after decay.
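As a rough illustration of decaying a structural weight, the sketch below assumes a linear schedule that starts from an initial value and reaches 0 at a hypothetical `stop_epoch`. Both the schedule and the parameter names are assumptions made for illustration, not a description of the decay that update_focusE_params() applies.

```python
# Hypothetical linear decay of a structural weight, reaching 0 at stop_epoch.

def decayed_structural_weight(initial_weight, epoch, stop_epoch):
    if epoch >= stop_epoch:
        return 0.0
    return initial_weight * (1.0 - epoch / stop_epoch)


print([decayed_structural_weight(1.0, e, 4) for e in range(6)])
# [1.0, 0.75, 0.5, 0.25, 0.0, 0.0]
```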