# RandomBaseline

class ampligraph.latent_features.RandomBaseline(seed=0, verbose=False)

Random baseline

A dummy model that assigns each triple a pseudo-random score between 0 and 1, drawn from a uniform distribution.

The model is useful when you need a reference point for the performance of another model on a custom knowledge graph and no other baseline is available.
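To give a feel for what such a reference point looks like, the sketch below estimates the MRR a purely random scorer would reach. It assumes one true triple ranked against a fixed number of corruptions, with no ties, so that its rank is uniform over all candidates. The helper names `expected_mrr` and `simulated_mrr` are hypothetical and not part of AmpliGraph.

```python
import random


def expected_mrr(n_candidates):
    """Expected reciprocal rank when the rank is uniform on 1..n_candidates.

    This is the harmonic number H_n divided by n.
    """
    return sum(1.0 / k for k in range(1, n_candidates + 1)) / n_candidates


def simulated_mrr(n_candidates, n_trials=20000, seed=0):
    """Estimate the same quantity by scoring triples with uniform random scores."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_trials):
        true_score = rng.random()
        # Rank = 1 + number of corruptions that outscore the true triple.
        rank = 1 + sum(rng.random() > true_score
                       for _ in range(n_candidates - 1))
        total += 1.0 / rank
    return total / n_trials
```

Under these assumptions, a trained model whose MRR is not clearly above `expected_mrr(n)` for the relevant number of candidates is doing no better than chance.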

Note

Although the model still requires invoking the fit() method, no actual training will be carried out.

Examples

>>> import numpy as np
>>> from ampligraph.latent_features import RandomBaseline
>>> model = RandomBaseline()
>>> X = np.array([['a', 'y', 'b'],
...               ['b', 'y', 'a'],
...               ['a', 'y', 'c'],
...               ['c', 'y', 'a'],
...               ['a', 'y', 'd'],
...               ['c', 'y', 'd'],
...               ['b', 'y', 'c'],
...               ['f', 'y', 'e']])
>>> model.fit(X)
>>> model.predict(np.array([['f', 'y', 'e'], ['b', 'y', 'd']]))
[0.5488135039273248, 0.7151893663724195]
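The behaviour described above (a fit() step that learns nothing, followed by uniformly random scores) can be illustrated with a minimal standalone sketch. The class `UniformRandomScorer` is hypothetical and uses Python's standard `random` module, so it is not AmpliGraph's implementation and will not reproduce the exact numbers shown in the example.

```python
import random


class UniformRandomScorer:
    """Hypothetical stand-in for a random baseline (not AmpliGraph code)."""

    def __init__(self, seed=0):
        self.seed = seed
        self.rng = None
        self.is_fitted = False

    def fit(self, X):
        # Nothing is learned from X: fit() only seeds the generator
        # and records that the model is ready to score triples.
        self.rng = random.Random(self.seed)
        self.is_fitted = True
        return self

    def predict(self, X):
        if not self.is_fitted:
            raise RuntimeError("call fit() before predict()")
        # One score in [0, 1) per input triple, independent of its content.
        return [self.rng.random() for _ in X]


train = [("a", "y", "b"), ("f", "y", "e")]
scorer = UniformRandomScorer(seed=0).fit(train)
scores = scorer.predict([("f", "y", "e"), ("b", "y", "d")])
```

Because the generator is seeded, two scorers built with the same seed return the same sequence of scores, which keeps baseline results reproducible across runs.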


Methods

• __init__([seed, verbose]) – Initialize the model.

• fit(X[, early_stopping, early_stopping_params]) – Train the random model.

• predict(X[, from_idx]) – Predict the scores of triples using a trained embedding model.

• get_hyperparameter_dict() – Returns hyperparameters of the model.

__init__(seed=0, verbose=False)

Initialize the model

Parameters
• seed (int) – The seed used by the internal random number generator.

• verbose (bool) – Verbose mode.

fit(X, early_stopping=False, early_stopping_params={})

Train the random model.

There is no actual training involved in practice and the early stopping parameters won’t have any effect.

Parameters
• X (ndarray, shape [n, 3]) – The training triples

• early_stopping (bool) –

Flag to enable early stopping (default: False).

If set to True, the training loop adopts the following early stopping heuristic:

• The model will be trained regardless of early stopping for burn_in epochs.

• Every check_interval epochs the method will compute the metric specified in criteria.

If the metric decreases for stop_interval consecutive checks, training stops early.

Note that the metric is computed on x_valid. This is usually a validation set that you held out.

Also, because criteria is a ranking metric, it requires generating negatives. You can specify the entities used to generate corruptions, as well as the side(s) of a triple to corrupt. The method supports filtered metrics: pass an array of positives to x_filter, and it will be used to filter the negatives generated on the fly (i.e. the corruptions).

Note

Keep in mind the early stopping criteria may introduce a certain overhead (caused by the metric computation). The goal is to strike a good trade-off between such overhead and saving training epochs.

A common approach is to use unfiltered MRR:

early_stopping_params={'x_valid': X['valid'], 'criteria': 'mrr'}


Note that the size of the validation set also contributes to this overhead. In most cases a smaller validation set is enough.

• early_stopping_params (dictionary) –

Dictionary of hyperparameters for the early stopping heuristics.

The following string keys are supported:

• 'x_valid': ndarray, shape [n, 3] : Validation set to be used for early stopping.

• 'criteria': string : Criteria for early stopping: 'hits10', 'hits3', 'hits1' or 'mrr' (default).

• 'x_filter': ndarray, shape [n, 3] : Positive triples to use as filter if a 'filtered' early stopping criteria is desired (i.e. filtered MRR if 'criteria': 'mrr'). Note this will affect training time (no filter by default).

• 'burn_in': int : Number of epochs to pass before kicking in early stopping (default: 100).

• 'check_interval': int : Early stopping interval after burn-in (default: 10).

• 'stop_interval': int : Stop if criteria is performing worse over n consecutive checks (default: 3).

• 'corruption_entities': List of entities to be used for corruptions. If 'all', it uses all entities (default: 'all').

• 'corrupt_side': Specifies which side to corrupt: 's', 'o' or 's+o' (default).

Example: early_stopping_params={'x_valid': X['valid'], 'criteria': 'mrr'}
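The early stopping heuristic described for these parameters can be sketched as a plain loop. This is an illustration under stated assumptions, not the library's code: the first check is assumed to happen at epoch burn_in, "performing worse" is interpreted as "not improving on the best value seen so far", and `metric_at_epoch` is a hypothetical callable standing in for the metric computed on x_valid.

```python
def early_stopping_epoch(metric_at_epoch, max_epochs, burn_in=100,
                         check_interval=10, stop_interval=3):
    """Return the epoch at which training would stop under the heuristic.

    metric_at_epoch: hypothetical callable mapping an epoch number to the
    validation metric (higher is better, as for MRR or hits@N).
    """
    best = float("-inf")
    bad_checks = 0
    for epoch in range(1, max_epochs + 1):
        # ... one training epoch would run here ...

        # No checks during burn-in; afterwards check every check_interval epochs
        # (assumption: the first check falls exactly on epoch == burn_in).
        if epoch < burn_in or (epoch - burn_in) % check_interval != 0:
            continue

        value = metric_at_epoch(epoch)
        if value > best:
            best = value
            bad_checks = 0  # improvement resets the counter
        else:
            bad_checks += 1
            if bad_checks >= stop_interval:
                return epoch  # stop early
    return max_epochs


# Toy metric that peaks at epoch 120 and declines afterwards.
stopped_at = early_stopping_epoch(lambda e: -abs(e - 120), max_epochs=1000)
```

Each check costs one metric computation on the validation set, which is why a larger check_interval or a smaller x_valid reduces the overhead mentioned above.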

predict(X, from_idx=False)

Predict the scores of triples using a trained embedding model. The function returns raw scores generated by the model.

Note

To obtain probability estimates, calibrate the model with calibrate(), then call predict_proba().

Parameters
• X (ndarray, shape [n, 3]) – The triples to score.

• from_idx (bool) – If True, skips conversion to internal IDs (default: False).

Returns

scores_predict – The predicted scores for input triples X.

Return type

ndarray, shape [n]
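As a rough picture of what the from_idx flag refers to, the sketch below converts string-labelled triples into integer IDs. The helpers `build_index` and `to_idx` are hypothetical, and the alphabetical ID assignment is an assumption made for illustration; the library's internal mapping may differ.

```python
def build_index(triples):
    """Assign an integer ID to every entity and relation label (sorted order)."""
    entities = sorted({t[0] for t in triples} | {t[2] for t in triples})
    relations = sorted({t[1] for t in triples})
    ent_to_idx = {label: i for i, label in enumerate(entities)}
    rel_to_idx = {label: i for i, label in enumerate(relations)}
    return ent_to_idx, rel_to_idx


def to_idx(triples, ent_to_idx, rel_to_idx):
    """Translate (subject, predicate, object) labels into ID triples."""
    return [(ent_to_idx[s], rel_to_idx[p], ent_to_idx[o])
            for s, p, o in triples]


train = [("a", "y", "b"), ("b", "y", "a"), ("a", "y", "c"), ("c", "y", "a"),
         ("a", "y", "d"), ("c", "y", "d"), ("b", "y", "c"), ("f", "y", "e")]
ent_to_idx, rel_to_idx = build_index(train)
ids = to_idx([("f", "y", "e"), ("b", "y", "d")], ent_to_idx, rel_to_idx)
```

With from_idx=False, predict() is given the original labels and performs a conversion of this kind itself; with from_idx=True, the caller is expected to pass triples that are already expressed as internal IDs.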

get_hyperparameter_dict()

Returns hyperparameters of the model.

Returns

hyperparam_dict – Dictionary of hyperparameters that were used for training.

Return type

dict