HolE
class ampligraph.latent_features.HolE(k=100, eta=2, epochs=100, batches_count=100, seed=0, embedding_model_params={'negative_corruption_entities': 'all'}, optimizer='adam', optimizer_params={'lr': 0.0005}, loss='nll', loss_params={}, regularizer=None, regularizer_params={}, verbose=False)
Holographic Embeddings
The HolE model [NRP+16] as re-defined by Hayashi et al. [HS17]:
\[f_{HolE} = \frac{2}{n} \, f_{ComplEx}\]
Examples
>>> import numpy as np
>>> from ampligraph.latent_features import HolE
>>> model = HolE(batches_count=1, seed=555, epochs=20, k=10,
...              loss='pairwise', loss_params={'margin': 1},
...              regularizer='LP', regularizer_params={'lambda': 0.1})
>>>
>>> X = np.array([['a', 'y', 'b'],
...               ['b', 'y', 'a'],
...               ['a', 'y', 'c'],
...               ['c', 'y', 'a'],
...               ['a', 'y', 'd'],
...               ['c', 'y', 'd'],
...               ['b', 'y', 'c'],
...               ['f', 'y', 'e']])
>>> model.fit(X)
>>> model.predict(np.array([['f', 'y', 'e'], ['b', 'y', 'd']]))
[0.3046168, -0.0379385]
>>> model.get_embeddings(['f', 'e'], type='entity')
array([[-0.2704807 , -0.05434025,  0.13363852,  0.04879733,  0.00184516,
        -0.1149573 , -0.1177371 , -0.20798951,  0.01935115,  0.13033926,
        -0.81528974,  0.22864424,  0.2045117 ,  0.1145515 ,  0.248952  ,
         0.03513691, -0.08550065, -0.06037813,  0.23231442, -0.39326245],
       [ 0.204738  ,  0.10758886, -0.11931524,  0.14881928,  0.0929039 ,
         0.25577265,  0.05722341,  0.2549932 , -0.16462566,  0.43789816,
        -0.91011846,  0.3533137 ,  0.1144442 ,  0.00359709, -0.09599967,
        -0.03151475,  0.14198618,  0.16138661,  0.07511608, -0.2465882 ]],
      dtype=float32)
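The equivalence behind the HolE formula can be checked numerically. The following plain NumPy sketch (not AmpliGraph code; all function names are illustrative) computes the original HolE score r_p · (e_s ⋆ e_o), where ⋆ is circular correlation, and compares it with a ComplEx-style score Re(Σ r·s·conj(o)) of the DFT-transformed embeddings. With the full n-point spectrum the scaling is 1/n; the 2/n factor in the formula presumably reflects that the DFT of a real vector is conjugate-symmetric, so only about half of its components need to be kept, a step this sketch does not reproduce.

```python
import numpy as np

def circular_correlation(a, b):
    """Direct O(n^2) circular correlation: (a ⋆ b)_k = sum_i a_i * b_{(i + k) mod n}."""
    n = len(a)
    return np.array([sum(a[i] * b[(i + k) % n] for i in range(n)) for k in range(n)])

def circular_correlation_fft(a, b):
    """Same operation in O(n log n) via the DFT: F(a ⋆ b) = conj(F(a)) * F(b)."""
    return np.real(np.fft.ifft(np.conj(np.fft.fft(a)) * np.fft.fft(b)))

def hole_score(e_s, r_p, e_o):
    """Original HolE score r_p · (e_s ⋆ e_o) with real-valued embeddings."""
    return float(np.dot(r_p, circular_correlation(e_s, e_o)))

def complex_score(s, r, o):
    """ComplEx-style score Re(sum_k r_k * s_k * conj(o_k)) with complex embeddings."""
    return float(np.real(np.sum(r * s * np.conj(o))))

def hole_score_via_complex(e_s, r_p, e_o):
    """HolE score rewritten as a ComplEx-style score of the DFT-transformed embeddings."""
    n = len(e_s)
    s, r, o = np.fft.fft(e_s), np.fft.fft(r_p), np.fft.fft(e_o)
    # Full spectrum, hence 1/n (the 2/n above is assumed to use half the spectrum).
    return complex_score(s, r, o) / n

rng = np.random.default_rng(0)
e_s, r_p, e_o = rng.normal(size=(3, 8))  # three random real 8-dimensional vectors
print(hole_score(e_s, r_p, e_o), hole_score_via_complex(e_s, r_p, e_o))
```

Both printed values should agree up to floating-point error. circular_correlation_fft also illustrates why circular correlation is typically computed in O(n log n) rather than O(n²).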
Methods
- __init__([k, eta, epochs, batches_count, …]) – Initialize an EmbeddingModel.
- fit(X[, early_stopping, early_stopping_params]) – Train a HolE model.
- get_embeddings(entities[, type]) – Get the embeddings of entities or relations.
- predict(X[, from_idx, get_ranks]) – Predict the scores of triples using a trained embedding model.
__init__(k=100, eta=2, epochs=100, batches_count=100, seed=0, embedding_model_params={'negative_corruption_entities': 'all'}, optimizer='adam', optimizer_params={'lr': 0.0005}, loss='nll', loss_params={}, regularizer=None, regularizer_params={}, verbose=False)
Initialize an EmbeddingModel.
Also creates a new TensorFlow session for training.
Parameters:
- k (int) – Embedding space dimensionality.
- eta (int) – The number of negatives that must be generated at runtime during training for each positive.
- epochs (int) – The number of iterations of the training loop.
- batches_count (int) – The number of batches in which the training set must be split during the training loop.
- seed (int) – The seed used by the internal random number generator.
- embedding_model_params (dict) –
HolE-specific hyperparams:
- negative_corruption_entities – Entities to be used for generation of corruptions while training. It can take the following values:
  - 'all' (default): all entities.
  - 'batch': entities present in each batch.
  - a list of entities.
  - an int: the number of entities that should be used for corruption generation.
- optimizer (string) – The optimizer used to minimize the loss function. Choose between ‘sgd’, ‘adagrad’, ‘adam’, ‘momentum’.
- optimizer_params (dict) –
Arguments specific to the optimizer, passed as a dictionary.
Supported keys:
- 'lr' (float): learning rate (used by all the optimizers). Default: 0.1.
- 'momentum' (float): learning momentum (only used when optimizer='momentum'). Default: 0.9.
Example:
optimizer_params={'lr': 0.01}
- loss (string) –
The type of loss function to use during training.
- 'pairwise': the model will use a pairwise margin-based loss function.
- 'nll': the model will use the negative log-likelihood loss.
- 'absolute_margin': the model will use the absolute margin loss.
- 'self_adversarial': the model will use the self-adversarial sampling loss function.
- loss_params (dict) –
Dictionary of loss-specific hyperparameters. See the loss functions documentation for additional details.
Example:
loss_params={'margin': 1} if loss='pairwise'.
- regularizer (string) –
The regularization strategy to use with the loss function.
- None: the model will not use any regularizer (default).
- 'LP': the model will use L1, L2 or L3 regularization based on the value of regularizer_params['p'] (see below).
- regularizer_params (dict) –
Dictionary of regularizer-specific hyperparameters. See the regularizers documentation for additional details.
Example:
regularizer_params={'lambda': 1e-5, 'p': 2} if regularizer='LP'.
- verbose (bool) – Verbose mode.
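To make eta, negative_corruption_entities, the 'pairwise' loss and the 'LP' regularizer more concrete, here is a minimal NumPy sketch of how one batch of corruptions and its penalized loss could be computed. It is an illustration under stated assumptions, not AmpliGraph's implementation: triples are internal integer IDs, each corruption replaces the subject or the object with equal probability, an int value of negative_corruption_entities is assumed to select a random subset of that size, the pairwise loss is taken as Σ max(0, margin − f(pos) + f(neg)), and the LP penalty as λ Σ |w|^p over all parameter arrays. All function names are hypothetical.

```python
import numpy as np

def corruption_pool(option, all_entities, batch, rng):
    """Entity IDs eligible for corruption, following negative_corruption_entities."""
    all_entities = np.asarray(all_entities)
    if isinstance(option, str) and option == 'all':
        return all_entities
    if isinstance(option, str) and option == 'batch':
        # Entities appearing as subject or object in the current batch.
        return np.unique(np.concatenate([batch[:, 0], batch[:, 2]]))
    if isinstance(option, int):
        # Assumption: an int selects a random subset of that many entities.
        return rng.choice(all_entities, size=option, replace=False)
    return np.asarray(option)  # explicit list of entity IDs

def corrupt(batch, pool, eta, rng):
    """Generate eta corruptions per positive triple (subject or object replaced)."""
    negatives = np.repeat(batch, eta, axis=0)  # eta consecutive copies of each positive
    replace_subject = rng.random(len(negatives)) < 0.5
    new_entities = rng.choice(pool, size=len(negatives))
    negatives[replace_subject, 0] = new_entities[replace_subject]
    negatives[~replace_subject, 2] = new_entities[~replace_subject]
    # A corruption may coincide with a true triple; this sketch does no filtering.
    return negatives

def pairwise_loss(scores_pos, scores_neg, eta, margin=1.0):
    """Sum of max(0, margin - f(pos) + f(neg)) over each positive and its eta negatives."""
    scores_pos = np.repeat(scores_pos, eta)  # align with the layout produced by corrupt()
    return float(np.sum(np.maximum(0.0, margin - scores_pos + scores_neg)))

def lp_penalty(params, lam=1e-5, p=2):
    """LP regularization term: lam * sum of |w|**p over all parameter arrays."""
    return lam * sum(float(np.sum(np.abs(w) ** p)) for w in params)

rng = np.random.default_rng(0)
batch = np.array([[0, 0, 1], [1, 0, 2]])  # (subject, relation, object) IDs
pool = corruption_pool('all', np.arange(5), batch, rng)
negatives = corrupt(batch, pool, eta=2, rng=rng)
print(negatives)
```

With a real model, scores_pos and scores_neg would come from the scoring function applied to batch and negatives; in this sketch the quantity to minimize would then be pairwise_loss(...) + lp_penalty(...).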
fit(X, early_stopping=False, early_stopping_params={})
Train a HolE model.
The model is trained on a training set X using the training protocol described in [NRP+16].
Parameters:
- X (ndarray, shape [n, 3]) – The training triples.
- early_stopping (bool) – Flag to enable early stopping (default: False).
- early_stopping_params (dictionary) –
Dictionary of hyperparameters for the early stopping heuristics.
The following string keys are supported:
- 'x_valid': ndarray, shape [n, 3]: validation set to be used for early stopping.
- 'criteria': string: criterion for early stopping, one of 'hits10', 'hits3', 'hits1' or 'mrr' (default).
- 'x_filter': ndarray, shape [n, 3]: positive triples to use as a filter if a filtered early stopping criterion is desired (e.g. filtered MRR if 'criteria': 'mrr'). Note that this will affect training time (no filter by default).
- 'burn_in': int: number of epochs to pass before early stopping kicks in (default: 100).
- 'check_interval': int: early stopping interval after burn-in (default: 10).
- 'stop_interval': int: stop if the criterion performs worse over n consecutive checks (default: 3).
- 'corruption_entities': list of entities to be used for corruptions. If 'all', all entities are used (default: 'all').
- 'corrupt_side': specifies which side to corrupt: 's', 'o', or 's+o' (default).
Example:
early_stopping_params={'x_valid': X['valid'], 'criteria': 'mrr'}
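The interaction between burn_in, check_interval and stop_interval can be sketched as follows. This is a simplified reading of the parameter descriptions above, not the library's code: metric_at is a hypothetical callback returning the validation criterion (e.g. MRR, where higher is better) at a given epoch, and a check is assumed to count as "worse" when it does not improve on the best value seen so far.

```python
def early_stopping_epoch(metric_at, epochs, burn_in=100, check_interval=10, stop_interval=3):
    """Return the epoch at which training would stop under this early stopping rule."""
    best = None
    bad_checks = 0
    for epoch in range(1, epochs + 1):
        # No checks before burn_in; afterwards, check every check_interval epochs.
        if epoch < burn_in or (epoch - burn_in) % check_interval != 0:
            continue
        value = metric_at(epoch)
        if best is None or value > best:
            best, bad_checks = value, 0
        else:
            bad_checks += 1
            if bad_checks >= stop_interval:
                return epoch  # stop_interval consecutive checks without improvement
    return epochs  # early stopping never triggered

# Toy criterion that improves until epoch 130 and then plateaus.
print(early_stopping_epoch(lambda epoch: min(epoch, 130), epochs=1000))  # prints 160
```

Checks happen at epochs 100, 110, 120 and 130 (all improving), then 140, 150 and 160 fail to improve, so the third failed check stops training at epoch 160.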
get_embeddings(entities, type='entity')
Get the embeddings of entities or relations.
Parameters:
- entities (array-like, shape=[n]) – The entities (or relations) of interest. Elements of the vector must be the original string literals, not internal IDs.
- type (string) – If 'entity', the entities argument will be considered as a list of knowledge graph entities (i.e. nodes). If set to 'relation', they will be treated as relation types instead (i.e. predicates).
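Returns: embeddings – An array of embeddings, one row per requested entity or relation.
Return type: ndarray. The documented shape is [n, k], but for HolE each row holds 2k values, as the class example above shows (k=10 yields 20 values per entity).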
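The distinction between string literals and internal IDs can be pictured as a two-step lookup. In this hypothetical sketch, to_idx stands in for a mapping from entity names to row indices and matrix for a learned embedding table; neither name is part of the documented API.

```python
import numpy as np

def lookup_embeddings(names, to_idx, matrix):
    """Translate string literals to internal IDs, then gather the matching rows."""
    ids = np.array([to_idx[name] for name in names])  # raises KeyError for unknown names
    return matrix[ids]

to_idx = {'a': 0, 'e': 1, 'f': 2}        # hypothetical name -> ID mapping
matrix = np.arange(12.0).reshape(3, 4)   # hypothetical 3 x 4 embedding table
print(lookup_embeddings(['f', 'e'], to_idx, matrix))
```

Passing internal IDs directly, which predict allows with from_idx=True, corresponds to skipping the first step.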
predict(X, from_idx=False, get_ranks=False)
Predict the scores of triples using a trained embedding model.
The function returns raw scores generated by the model.
Note
To obtain probability estimates, use a logistic sigmoid:
>>> model.fit(X)
>>> y_pred = model.predict(np.array([['f', 'y', 'e'], ['b', 'y', 'd']]))
>>> y_pred
array([1.2052395, 1.5818497], dtype=float32)
>>> from scipy.special import expit
>>> expit(y_pred)
array([0.7694556 , 0.82946634], dtype=float32)
Parameters: - X (ndarray, shape [n, 3]) – The triples to score.
- from_idx (bool) – If True, will skip conversion to internal IDs. (default: False).
- get_ranks (bool) – Flag to compute ranks by scoring against corruptions (default: False).
Returns:
- scores_predict (ndarray, shape [n]) – The predicted scores for input triples X.
- rank (ndarray, shape [n]) – Ranks of the triples (only returned if get_ranks=True).
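Ranks returned with get_ranks=True feed the same metrics used as early stopping criteria ('mrr', 'hits10', ...). The sketch below assumes one common convention, rank = 1 + the number of corruptions scoring strictly higher than the positive triple; how the library breaks ties is not stated here, so treat that part as an assumption. Function names are illustrative.

```python
import numpy as np

def rank_against_corruptions(score_positive, corruption_scores):
    """1 + number of corruptions that score strictly higher than the positive triple."""
    return 1 + int(np.sum(np.asarray(corruption_scores) > score_positive))

def mrr(ranks):
    """Mean reciprocal rank."""
    return float(np.mean(1.0 / np.asarray(ranks, dtype=float)))

def hits_at_n(ranks, n):
    """Fraction of triples ranked within the top n."""
    return float(np.mean(np.asarray(ranks) <= n))

ranks = [rank_against_corruptions(0.5, [0.9, 0.1, 0.7]),  # two corruptions outscore it -> 3
         rank_against_corruptions(2.0, [1.0, 0.3])]       # none outscore it -> 1
print(ranks, mrr(ranks), hits_at_n(ranks, 1))
```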