ConvKB

class ampligraph.latent_features.ConvKB(k=100, eta=2, epochs=100, batches_count=100, seed=0, embedding_model_params={'dropout': 0.1, 'filter_sizes': [1], 'num_filters': 32}, optimizer='adam', optimizer_params={'lr': 0.0005}, loss='nll', loss_params={}, regularizer=None, regularizer_params={}, initializer='xavier', initializer_params={'uniform': False}, large_graphs=False, verbose=False)

Convolution-based model

The ConvKB model [NNNP18]:

\[f_{ConvKB}= concat \,(g \, ([\mathbf{e}_s, \mathbf{r}_p, \mathbf{e}_o] * \Omega)) \cdot W\]

where \(g\) is a non-linear function, \(*\) is the convolution operator, \(\cdot\) is the dot product, \(concat\) is the concatenation operator and \(\Omega\) is a set of filters.
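The scoring function above can be sketched in NumPy. This is a minimal illustration rather than AmpliGraph's implementation: it assumes 1×3 kernels (one embedding dimension across the three columns), ReLU as the non-linearity \(g\), and randomly drawn weights. The name `convkb_score` and all variable names are hypothetical.

```python
import numpy as np

def convkb_score(e_s, r_p, e_o, filters, w, g=lambda x: np.maximum(x, 0.0)):
    """Score one triple with 1x3 convolution filters (illustrative sketch).

    e_s, r_p, e_o: embeddings of shape (k,)
    filters: array of shape (num_filters, 3), one 1x3 kernel per row
    w: weight vector of shape (num_filters * k,)
    g: non-linearity (ReLU here, an assumption of this sketch)
    """
    # Stack the three embeddings column-wise into a (k, 3) matrix.
    triple = np.stack([e_s, r_p, e_o], axis=1)
    # Sliding a 1x3 kernel over the k rows is a row-wise dot product,
    # giving one feature map of length k per filter: shape (num_filters, k).
    feature_maps = g(filters @ triple.T)
    # Concatenate the feature maps and take the dot product with w.
    return float(feature_maps.reshape(-1) @ w)

rng = np.random.default_rng(0)
k, num_filters = 4, 2
e_s, r_p, e_o = rng.normal(size=(3, k))
filters = rng.normal(size=(num_filters, 3))
w = rng.normal(size=num_filters * k)
score = convkb_score(e_s, r_p, e_o, filters, w)
```

With `num_filters` kernels and `k`-dimensional embeddings, the concatenated feature vector has `num_filters * k` entries, which is why `w` has that length in this sketch.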

Note

The evaluation protocol implemented in ampligraph.evaluation.evaluate_performance() assigns the worst rank to a positive test triple in case of a tie with negatives. This is the agreed-upon behaviour in the literature. The original ConvKB implementation [NNNP18] instead assigns the top rank, which leads to results that are not directly comparable with the literature. We report results obtained with the agreed-upon protocol (tie=worst rank). Note that under these conditions the model does not reach the state-of-the-art results claimed in the original paper.
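To see why the tie policy matters, the following sketch computes the rank of a positive triple under both conventions. It is an illustration only, not the code used by evaluate_performance(); the function `rank_of_positive` and its `tie` argument are hypothetical names.

```python
def rank_of_positive(pos_score, neg_scores, tie="worst"):
    """Return the 1-based rank of a positive among its negatives.

    Higher scores are better. With tie="worst" the positive is placed
    below every negative that has the same score; with tie="best" it is
    placed above them.
    """
    better = sum(1 for s in neg_scores if s > pos_score)
    ties = sum(1 for s in neg_scores if s == pos_score)
    if tie == "worst":
        return 1 + better + ties
    if tie == "best":
        return 1 + better
    raise ValueError("tie must be 'worst' or 'best'")

# A degenerate model that gives every triple the same score.
negatives = [0.0] * 9
worst = rank_of_positive(0.0, negatives, tie="worst")  # rank 10
best = rank_of_positive(0.0, negatives, tie="best")    # rank 1
```

A model that scores every triple identically obtains a perfect rank under the "best" policy and the last rank under the "worst" policy, so the two protocols can yield very different MRR and Hits@N values for the same model.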

Examples

>>> from ampligraph.latent_features import ConvKB
>>> from ampligraph.datasets import load_wn18
>>> model = ConvKB(batches_count=2, seed=22, epochs=1, k=10, eta=1,
...                embedding_model_params={'num_filters': 32, 'filter_sizes': [1],
...                                        'dropout': 0.1},
...                optimizer='adam', optimizer_params={'lr': 0.001},
...                loss='pairwise', loss_params={}, verbose=True)
>>>
>>> X = load_wn18()
>>>
>>> model.fit(X['train'])
>>>
>>> print(model.predict(X['test'][:5]))
[[0.2803744], [0.0866661], [0.012815937], [-0.004235901], [-0.010947697]]

Methods

`__init__`([k, eta, epochs, batches_count, …])	Initialize an EmbeddingModel
`fit`(X[, early_stopping, …])	Train a ConvKB model (with optional early stopping).
`get_embeddings`(entities[, embedding_type])	Get the embeddings of entities or relations.
`get_hyperparameter_dict`()	Returns hyperparameters of the model.
`predict`(X[, from_idx])	Predict the scores of triples using a trained embedding model.
`calibrate`(X_pos[, X_neg, …])	Calibrate predictions
`predict_proba`(X)	Predicts probabilities using the Platt scaling model (after calibration).

__init__(k=100, eta=2, epochs=100, batches_count=100, seed=0, embedding_model_params={'dropout': 0.1, 'filter_sizes': [1], 'num_filters': 32}, optimizer='adam', optimizer_params={'lr': 0.0005}, loss='nll', loss_params={}, regularizer=None, regularizer_params={}, initializer='xavier', initializer_params={'uniform': False}, large_graphs=False, verbose=False)

Initialize an EmbeddingModel

Parameters:

k (int) – Embedding space dimensionality.
eta (int) – The number of negatives that must be generated at runtime during training for each positive.
epochs (int) – The number of iterations of the training loop.
batches_count (int) – The number of batches in which the training set must be split during the training loop.
seed (int) – The seed used by the internal random number generator.
embedding_model_params (dict) –
ConvKB-specific hyperparams:
- ‘num_filters’: Number of feature maps per convolution kernel. Default: 32
- ‘filter_sizes’: List of convolution kernel sizes. Default: [1]
- ‘dropout’: Dropout on the embedding layer. Default: 0.0
- ‘non_linearity’: can be one of the following values: linear, softplus, sigmoid, tanh
- ‘stop_epoch’: specifies how long to decay (linearly) the numeric values from 1 to their original value, until the original value is reached.
- ‘structural_wt’: structural influence hyperparameter in [0, 1] that modulates the influence of graph topology.
- ‘normalize_numeric_values’: normalize the numeric values, such that they are scaled between [0, 1]

The last 4 parameters are related to FocusE layers.
optimizer (string) – The optimizer used to minimize the loss function. Choose between ‘sgd’, ‘adagrad’, ‘adam’, ‘momentum’.
optimizer_params (dict) –
Arguments specific to the optimizer, passed as a dictionary.

Supported keys:
- ’lr’ (float): learning rate (used by all the optimizers). Default: 0.1.
- ’momentum’ (float): learning momentum (only used when optimizer=momentum). Default: 0.9.
Example: optimizer_params={'lr': 0.01}
loss (string) – The type of loss function to use during training.
loss_params (dict) –
Dictionary of loss-specific hyperparameters. See loss functions documentation for additional details.

Example: loss_params={'margin': 1} if loss='pairwise'.
regularizer (string) –
The regularization strategy to use with the loss function.
- None: the model will not use any regularizer (default)
- LP: the model will use L1, L2 or L3 based on the value of regularizer_params['p'] (see below).
regularizer_params (dict) –
Dictionary of regularizer-specific hyperparameters. See the regularizers documentation for additional details.

Example: regularizer_params={'lambda': 1e-5, 'p': 2} if regularizer='LP'.
initializer (string) –
The type of initializer to use.
- normal: The embeddings will be initialized from a normal distribution
- uniform: The embeddings will be initialized from a uniform distribution
- xavier: The embeddings will be initialized using the Xavier strategy (default)
initializer_params (dict) –
Dictionary of initializer-specific hyperparameters. See the initializer documentation for additional details.

Example: initializer_params={'mean': 0, 'std': 0.001} if initializer='normal'.
large_graphs (bool) – Avoid loading entire dataset onto GPU when dealing with large graphs.
verbose (bool) – Verbose mode.
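As a rough illustration of what regularizer='LP' adds to the loss, the sketch below computes a penalty of the form lambda * sum(|w|^p). This form is an assumption made for the example (the exact definition is in the regularizers documentation), and `lp_penalty` is a hypothetical helper, not an AmpliGraph function.

```python
import numpy as np

def lp_penalty(weights, lam=1e-5, p=2):
    """Return lam * sum(|w|^p) over all entries of `weights`.

    p selects the norm family: 1, 2 or 3, mirroring regularizer_params['p'].
    """
    if p not in (1, 2, 3):
        raise ValueError("p must be 1, 2 or 3")
    return lam * float(np.sum(np.abs(weights) ** p))

embeddings = np.array([[1.0, -2.0], [0.5, 0.0]])
penalty_l1 = lp_penalty(embeddings, lam=0.1, p=1)  # 0.1 * (1 + 2 + 0.5)
penalty_l2 = lp_penalty(embeddings, lam=0.1, p=2)  # 0.1 * (1 + 4 + 0.25)
```

Larger values of p penalize large embedding entries more strongly relative to small ones, while lambda scales the overall weight of the penalty against the training loss.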

fit(X, early_stopping=False, early_stopping_params={}, focusE_numeric_edge_values=None, tensorboard_logs_path=None)

Train a ConvKB model (with optional early stopping).

The model is trained on a training set X using the training protocol described in [TWR+16].

Parameters:

X (ndarray, shape [n, 3]) – The training triples
early_stopping (bool) –
Flag to enable early stopping (default: False).

If set to True, the training loop adopts the following early stopping heuristic:
- The model will be trained regardless of early stopping for burn_in epochs.
- Every check_interval epochs the method will compute the metric specified in criteria. If the metric decreases for stop_interval consecutive checks, training stops early.

Note the metric is computed on x_valid. This is usually a validation set that you held out.

Also, because criteria is a ranking metric, it requires generating negatives. The entities used to generate corruptions can be specified, as well as the side(s) of a triple to corrupt. The method supports filtered metrics, by passing an array of positives to x_filter. This will be used to filter the negatives generated on the fly (i.e. the corruptions).
Note

Keep in mind the early stopping criteria may introduce a certain overhead (caused by the metric computation). The goal is to strike a good trade-off between such overhead and saving training epochs.

A common approach is to use MRR unfiltered:
```
early_stopping_params={'x_valid': X['valid'], 'criteria': 'mrr'}
```
Note the size of validation set also contributes to such overhead. In most cases a smaller validation set would be enough.
early_stopping_params (dictionary) –
Dictionary of hyperparameters for the early stopping heuristics.

The following string keys are supported:
- ’x_valid’: ndarray, shape [n, 3] : Validation set to be used for early stopping.
- ’criteria’: string : criteria for early stopping ‘hits10’, ‘hits3’, ‘hits1’ or ‘mrr’ (default).
- ’x_filter’: ndarray, shape [n, 3] : Positive triples to use as filter if a ‘filtered’ early stopping criteria is desired (i.e. filtered-MRR if ‘criteria’:’mrr’). Note this will affect training time (no filter by default).
- ’burn_in’: int : Number of epochs to pass before kicking in early stopping (default: 100).
- ’check_interval’: int : Early stopping interval after burn-in (default: 10).
- ’stop_interval’: int : Stop if criteria is performing worse over n consecutive checks (default: 3).
- ’corruption_entities’: List of entities to be used for corruptions. If ‘all’, it uses all entities (default: ‘all’).
- ’corrupt_side’: Specifies which side to corrupt. ‘s’, ‘o’, ‘s+o’ (default).
Example: early_stopping_params={'x_valid': X['valid'], 'criteria': 'mrr'}
focusE_numeric_edge_values (ndarray, shape [n, 1]) – Numeric values associated with links. Semantically, the numeric value can signify importance, uncertainty, significance, confidence, etc. If the numeric value of a link is unknown, pass NaN; the model will then assign it a value drawn uniformly at random. Alternatively, missing values can be assigned by looking at the distribution of known values for each predicate.
tensorboard_logs_path (str or None) – Path to store tensorboard logs, e.g. average training loss tracking per epoch (default: None, indicating no logs will be collected). When provided, a folder is created under this path and the tensorboard files are saved there. To view the logs, run in a terminal: tensorboard --logdir <tensorboard_logs_path>.
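The early stopping heuristic controlled by burn_in, check_interval and stop_interval can be sketched in plain Python. This is a simplified illustration, not the library's training loop: it assumes that the first check happens at epoch burn_in, that each check compares the metric against the best value seen so far, and that higher is better. The function `early_stopping_epoch` is a hypothetical name.

```python
def early_stopping_epoch(metric_per_epoch, burn_in=100, check_interval=10,
                         stop_interval=3):
    """Return the epoch at which training would stop, or None.

    metric_per_epoch: list where index i holds the validation metric
    (e.g. MRR) that would be measured after epoch i + 1.
    """
    best = None
    worse_checks = 0
    for epoch in range(1, len(metric_per_epoch) + 1):
        # No checks during burn-in, then one check every check_interval epochs.
        if epoch < burn_in or (epoch - burn_in) % check_interval != 0:
            continue
        value = metric_per_epoch[epoch - 1]
        if best is None or value > best:
            # Improvement: remember it and reset the counter.
            best = value
            worse_checks = 0
        else:
            # No improvement: stop after stop_interval consecutive bad checks.
            worse_checks += 1
            if worse_checks >= stop_interval:
                return epoch
    return None

history = [0.10, 0.20, 0.30, 0.29, 0.28, 0.27, 0.26]
stop = early_stopping_epoch(history, burn_in=2, check_interval=1, stop_interval=3)
```

In this toy history the metric peaks at epoch 3 and then degrades for three consecutive checks, so training stops at epoch 6. A larger check_interval reduces the overhead of computing the metric, at the cost of reacting later.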

get_embeddings(entities, embedding_type='entity')

Get the embeddings of entities or relations.

Note

Use ampligraph.utils.create_tensorboard_visualizations() to visualize the embeddings with TensorBoard.

Parameters:

entities (array-like, dtype=int, shape=[n]) – The entities (or relations) of interest. Elements of the vector must be the original string literals, and not internal IDs.
embedding_type (string) – If ‘entity’, the `entities` argument will be considered as a list of knowledge graph entities (i.e. nodes). If set to ‘relation’, they will be treated as relation types instead (i.e. predicates).

Returns:	embeddings – An array of k-dimensional embeddings.
Return type:	ndarray, shape [n, k]

get_hyperparameter_dict()

Returns hyperparameters of the model.

Returns:	hyperparam_dict – Dictionary of hyperparameters that were used for training.
Return type:	dict

predict(X, from_idx=False)

Predict the scores of triples using a trained embedding model. The function returns raw scores generated by the model.

Note

To obtain probability estimates, calibrate the model with calibrate(), then call predict_proba().

Parameters:

X (ndarray, shape [n, 3]) – The triples to score.
from_idx (bool) – If True, will skip conversion to internal IDs (default: False).

Returns:	scores_predict – The predicted scores for input triples X.
Return type:	ndarray, shape [n]

calibrate(X_pos, X_neg=None, positive_base_rate=None, batches_count=100, epochs=50)

Calibrate predictions

The method implements the heuristics described in [TC20], using Platt scaling [P+99].

The calibrated predictions can be obtained with predict_proba() after calibration is done.

Ideally, calibration should be performed on a validation set that was not used to train the embeddings.

There are two modes of operation, depending on the availability of negative triples:

1. Both positive and negative triples are provided via X_pos and X_neg respectively. The optimization is done using a second-order method (limited-memory BFGS), therefore no hyperparameter needs to be specified.
2. Only positive triples are provided, and the negative triples are generated by corruptions, as is done in training or evaluation. The optimization is done using a first-order method (ADAM), therefore batches_count and epochs must be specified.

Calibration is highly dependent on the base rate of positive triples. Therefore, for mode (2) of operation, the user is required to provide the positive_base_rate argument. For mode (1), that can be inferred automatically by the relative sizes of the positive and negative sets, but the user can override that by providing a value to positive_base_rate.

Defining the positive base rate is the biggest challenge when calibrating without negatives. It depends on the user's choice of which triples will be evaluated at test time. Let’s take WN11 as an example: it has around 50% positive triples in both the validation set and the test set, so naturally the positive base rate is 50%. However, should the user resample it to have 75% positives and 25% negatives, the previous calibration will be degraded. The user must then recalibrate the model with a 75% positive base rate. Therefore, this parameter depends on how the user handles the dataset and cannot be determined automatically or a priori.
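The role of the base rate can be illustrated with a small Platt scaling sketch in NumPy. This is not the calibrate() implementation: it fits p = sigmoid(a * score + b) with plain gradient descent instead of L-BFGS or Adam, and it assumes a simple weighting scheme in which positives carry a total weight equal to the base rate and negatives the remainder. The names `fit_platt` and `platt_proba` are hypothetical.

```python
import numpy as np

def fit_platt(scores_pos, scores_neg, positive_base_rate=None,
              lr=0.1, steps=2000):
    """Fit p = sigmoid(a * score + b) by weighted logistic regression."""
    s = np.concatenate([scores_pos, scores_neg]).astype(float)
    y = np.concatenate([np.ones(len(scores_pos)), np.zeros(len(scores_neg))])
    if positive_base_rate is None:
        # Infer the base rate from the relative sizes of the two sets.
        positive_base_rate = len(scores_pos) / len(s)
    # Weight each class so that positives carry a total mass equal to the
    # base rate and negatives carry the remainder (assumed scheme).
    weights = np.where(y == 1,
                       positive_base_rate / len(scores_pos),
                       (1.0 - positive_base_rate) / len(scores_neg))
    a, b = 1.0, 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(a * s + b)))
        grad = weights * (p - y)  # gradient of the weighted log loss
        a -= lr * np.sum(grad * s)
        b -= lr * np.sum(grad)
    return a, b

def platt_proba(scores, a, b):
    """Map raw scores to probabilities with the fitted scale and offset."""
    return 1.0 / (1.0 + np.exp(-(a * np.asarray(scores, dtype=float) + b)))

pos = np.array([2.0, 1.5, 1.0, 0.2])
neg = np.array([-1.0, -0.5, 0.1, 0.4])
a, b = fit_platt(pos, neg)
probs = platt_proba([2.0, -1.0], a, b)
```

Calling `fit_platt(pos, neg, positive_base_rate=0.9)` instead shifts the fitted offset upwards, so the same raw score maps to a higher probability. This is why a calibration fitted for one base rate degrades when the test set is resampled to a different one.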

Note

Incompatible with large graph mode (i.e. if self.dealing_with_large_graphs=True).

Note

Experiments for the ICLR-21 calibration paper are available in [TC20].

Parameters:

X_pos (ndarray (shape [n, 3])) – Numpy array of positive triples.
X_neg (ndarray (shape [n, 3])) –
Numpy array of negative triples.

If None, the negative triples are generated via corruptions and the user must provide a positive base rate instead.
positive_base_rate (float) –
Base rate of positive statements.

For example, if we assume there is a fifty-fifty chance of any query being true, the base rate would be 50% (i.e. 0.5).

If X_neg is provided and this is None, the relative sizes of X_pos and X_neg will be used to determine the base rate. For example, if we have 50 positive triples and 200 negative triples, the positive base rate will be assumed to be 50/(50+200) = 1/5 = 0.2.

This must be a value between 0 and 1.
batches_count (int) – Number of batches to complete one epoch of the Platt scaling training. Only applies when X_neg is None.
epochs (int) – Number of epochs used to train the Platt scaling model. Only applies when X_neg is None.

Examples

>>> import numpy as np
>>> from sklearn.metrics import brier_score_loss, log_loss
>>> from scipy.special import expit
>>>
>>> from ampligraph.datasets import load_wn11
>>> from ampligraph.latent_features.models import TransE
>>>
>>> X = load_wn11()
>>> X_valid_pos = X['valid'][X['valid_labels']]
>>> X_valid_neg = X['valid'][~X['valid_labels']]
>>>
>>> model = TransE(batches_count=64, seed=0, epochs=500, k=100, eta=20,
...                optimizer='adam', optimizer_params={'lr':0.0001},
...                loss='pairwise', verbose=True)
>>>
>>> model.fit(X['train'])
>>>
>>> # Raw scores
>>> scores = model.predict(X['test'])
>>>
>>> # Calibrate with positives and negatives
>>> model.calibrate(X_valid_pos, X_valid_neg, positive_base_rate=None)
>>> probas_pos_neg = model.predict_proba(X['test'])
>>>
>>> # Calibrate with just positives and base rate of 50%
>>> model.calibrate(X_valid_pos, positive_base_rate=0.5)
>>> probas_pos = model.predict_proba(X['test'])
>>>
>>> # Calibration evaluation with the Brier score loss (the smaller, the better)
>>> print("Brier scores")
>>> print("Raw scores:", brier_score_loss(X['test_labels'], expit(scores)))
>>> print("Positive and negative calibration:", brier_score_loss(X['test_labels'], probas_pos_neg))
>>> print("Positive only calibration:", brier_score_loss(X['test_labels'], probas_pos))
Brier scores
Raw scores: 0.4925058891371126
Positive and negative calibration: 0.20434617882733366
Positive only calibration: 0.22597599585144656
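For reference, the Brier score used in the example above is the mean squared difference between the predicted probabilities and the binary labels. The following plain-Python sketch shows the computation; `brier_score` is a hypothetical helper written for illustration, equivalent in spirit to sklearn.metrics.brier_score_loss for binary labels.

```python
def brier_score(labels, probabilities):
    """Mean squared error between binary labels and predicted probabilities."""
    if len(labels) != len(probabilities):
        raise ValueError("labels and probabilities must have the same length")
    squared_errors = [(p - y) ** 2 for y, p in zip(labels, probabilities)]
    return sum(squared_errors) / len(squared_errors)

labels = [1, 0, 1, 0]
confident = brier_score(labels, [0.9, 0.1, 0.8, 0.3])   # close to the labels
uninformed = brier_score(labels, [0.5, 0.5, 0.5, 0.5])  # always predicts 0.5
```

A predictor that always outputs 0.5 scores 0.25, which gives a useful yardstick: the raw-score result above is worse than this baseline, while both calibrated variants are better.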

predict_proba(X)

Predicts probabilities using the Platt scaling model (after calibration).

The model must be calibrated beforehand with calibrate().

Parameters:	X (ndarray (shape [n, 3])) – Numpy array of triples to be evaluated.
Returns:	probas – Probability of each triple being true according to the Platt scaling calibration.
Return type:	ndarray (shape [n])