Models¶
This module includes neural graph embedding models and support functions.
Knowledge graph embedding models are neural architectures that encode concepts from a knowledge graph (i.e. entities \(\mathcal{E}\) and relation types \(\mathcal{R}\)) into low-dimensional, continuous vectors \(\in \mathbb{R}^k\). Such knowledge graph embeddings have applications in knowledge graph completion, entity resolution, and link-based clustering, to name but a few [NMTG16].
Knowledge Graph Embedding Models¶
RandomBaseline([seed]) | Random baseline
TransE([k, eta, epochs, batches_count, …]) | Translating Embeddings (TransE)
DistMult([k, eta, epochs, batches_count, …]) | The DistMult model
ComplEx([k, eta, epochs, batches_count, …]) | Complex embeddings (ComplEx)
HolE([k, eta, epochs, batches_count, seed, …]) | Holographic Embeddings
Anatomy of a Model¶
Knowledge graph embeddings are learned by training a neural architecture over a graph. Although such architectures vary, the training phase always consists in minimizing a loss function \(\mathcal{L}\) that includes a scoring function \(f_{m}(t)\), i.e. a model-specific function that assigns a score to a triple \(t=(sub,pred,obj)\).
AmpliGraph models include the following components:
- Scoring function \(f(t)\)
- Loss function \(\mathcal{L}\)
- Optimization algorithm
- Negatives generation strategy
AmpliGraph comes with a number of such components. They can be used in any combination to come up with a model that performs sufficiently well for the dataset of choice.
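For instance, a model is assembled at initialization time by picking a scoring function (the model class itself), a loss, a regularizer, and an optimizer. The snippet below is a minimal sketch based on the hyperparameters listed in the tables on this page; the string identifiers and specific values are illustrative assumptions and should be checked against each class's signature.

    import numpy as np
    from ampligraph.latent_features import ComplEx

    # Toy knowledge graph: each row is a (subject, predicate, object) triple.
    X = np.array([['a', 'likes', 'b'],
                  ['b', 'likes', 'c'],
                  ['a', 'friendOf', 'c'],
                  ['c', 'likes', 'a']])

    # Combine a scoring function (ComplEx), a loss, a regularizer, and an optimizer.
    # Argument names follow the tables above; the values are illustrative assumptions.
    model = ComplEx(k=50,                       # embedding dimensionality
                    eta=5,                      # negatives generated per positive triple
                    epochs=100,
                    batches_count=1,
                    loss='pairwise',
                    loss_params={'margin': 0.5},
                    regularizer='LP',
                    regularizer_params={'p': 2, 'lambda': 1e-5},
                    optimizer='adam',
                    optimizer_params={'lr': 1e-3},
                    seed=0)

    model.fit(X)
    scores = model.predict(X)  # scores assigned by the scoring function f(t)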
AmpliGraph features a number of abstract classes that can be extended to design new models:
EmbeddingModel([k, eta, epochs, …]) | Abstract class for embedding models
Loss(eta, hyperparam_dict[, verbose]) | Abstract class for loss functions
Regularizer(hyperparam_dict[, verbose]) | Abstract class for regularizers
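As a sketch of how such an extension point might be used, the following outlines a custom loss derived from the Loss abstract class. The import path and the hook names (_init_hyperparams, _apply) are assumptions modelled on the built-in losses; consult the Loss API reference before relying on them.

    import tensorflow as tf
    from ampligraph.latent_features import Loss  # import path is an assumption

    class MarginOnlyLoss(Loss):
        """Hypothetical max-margin loss, shown only to illustrate the extension point."""

        def _init_hyperparams(self, hyperparam_dict):
            # Assumed hook for reading loss-specific hyperparameters.
            self._margin = hyperparam_dict.get('margin', 1.0)

        def _apply(self, scores_pos, scores_neg):
            # Assumed hook receiving the scores of positive and negative triples.
            return tf.reduce_sum(tf.maximum(self._margin - scores_pos + scores_neg, 0.))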
Scoring functions¶
Existing models propose scoring functions that combine the embeddings \(\mathbf{e}_{s},\mathbf{r}_{p}, \mathbf{e}_{o} \in \mathbb{R}^k\) of the subject, predicate, and object of a triple \(t=(s,p,o)\) according to different intuitions:
TransE [BUGD+13] relies on distances. The scoring function computes a similarity between the embedding of the subject translated by the embedding of the predicate and the embedding of the object, using the \(L_1\) or \(L_2\) norm \(||\cdot||\):

\(f_{TransE}=-||\mathbf{e}_{s} + \mathbf{r}_{p} - \mathbf{e}_{o}||_n\)
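To make the intuition concrete, the sketch below computes the TransE score of a single triple from its subject, predicate, and object embeddings; the embedding values are made up for illustration.

    import numpy as np

    # Toy embeddings of subject, predicate, and object (k=3); values are illustrative.
    e_s = np.array([0.1, 0.6, -0.2])
    r_p = np.array([0.4, -0.1, 0.3])
    e_o = np.array([0.5, 0.5, 0.1])

    # TransE score: negative L1 (or L2) distance between the translated subject
    # and the object. Higher (less negative) scores indicate more plausible triples.
    score_l1 = -np.linalg.norm(e_s + r_p - e_o, ord=1)
    score_l2 = -np.linalg.norm(e_s + r_p - e_o, ord=2)
    print(score_l1, score_l2)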
Other models, such as ConvE, include convolutional layers [DMSR18] (these will be available in future AmpliGraph releases).
Loss Functions¶
AmpliGraph includes a number of loss functions commonly used in the literature. Each function can be used with any of the implemented models. Loss functions are passed to models as a hyperparameter, so they can also be tuned during model selection.
PairwiseLoss(eta[, loss_params, verbose]) | Pairwise, max-margin loss.
NLLLoss(eta[, loss_params, verbose]) | Negative log-likelihood loss.
AbsoluteMarginLoss(eta[, loss_params, verbose]) | Absolute margin, max-margin loss.
SelfAdversarialLoss(eta[, loss_params, verbose]) | Self-adversarial sampling loss.
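Because the loss is just a hyperparameter, switching from one to another only requires changing the loss argument and its loss_params. The string identifiers ('pairwise', 'self_adversarial') and the parameter names below are assumptions; see each loss class for its actual parameters.

    from ampligraph.latent_features import TransE

    # Same scoring function, two different losses.
    model_pairwise = TransE(k=100, eta=10, epochs=50, batches_count=10,
                            loss='pairwise',
                            loss_params={'margin': 1.0})

    model_self_adv = TransE(k=100, eta=10, epochs=50, batches_count=10,
                            loss='self_adversarial',
                            loss_params={'margin': 3.0, 'alpha': 0.5})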
Regularizers¶
AmpliGraph includes a number of regularizers that can be used with the loss function.
LPRegularizer supports L1, L2, and L3 regularization.
LPRegularizer([regularizer_params, verbose]) | Performs LP regularization
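For example, an L3 penalty can be requested through regularizer_params, as sketched below; the parameter names ('p', 'lambda') and the value of the regularization weight are assumptions.

    from ampligraph.latent_features import DistMult

    # LP regularization with p=3 and a small weight; p=1 or p=2 select L1/L2 instead.
    model = DistMult(k=100, eta=5, epochs=50, batches_count=10,
                     regularizer='LP',
                     regularizer_params={'p': 3, 'lambda': 1e-5})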
Optimizers¶
The goal of the optimization procedure is learning optimal embeddings, such that the scoring function is able to assign high scores to positive statements and low scores to statements unlikely to be true.
We support the SGD-based optimizers provided by TensorFlow; the optimizer is selected by setting the optimizer argument in the model initializer.
Best results are currently obtained with Adam.
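A brief sketch of choosing the optimizer and its learning rate; the identifier 'adam' and the 'lr' parameter name are assumptions, and the learning rate shown is only illustrative.

    from ampligraph.latent_features import ComplEx

    # Adam usually gives the best results; other TensorFlow SGD-based optimizers
    # can be requested the same way.
    model = ComplEx(k=200, eta=10, epochs=100, batches_count=25,
                    optimizer='adam',
                    optimizer_params={'lr': 5e-4})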
Utils Functions¶
Models can be saved to and restored from disk. This is useful to avoid re-training a model.
save_model(model, loc) | Save a trained model to disk.
restore_model(loc) | Restore a saved model from disk.
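A minimal sketch of persisting a trained model and loading it back, following the signatures in the table above; the import location and the file path are assumptions.

    from ampligraph.latent_features import ComplEx, save_model, restore_model

    model = ComplEx(k=50, eta=5, epochs=10, batches_count=1)
    # ... model.fit(X) on your training triples ...

    # Persist the trained model, then load it back without re-training.
    save_model(model, 'complex_model.pkl')
    restored = restore_model('complex_model.pkl')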