load_fb15k_237

ampligraph.datasets.load_fb15k_237(check_md5hash=False, clean_unseen=True)

Load the FB15k-237 dataset

FB15k-237 is a reduced version of FB15K in which inverse relations were removed to avoid test leakage. It was first proposed by [TCP+15].

The FB15k-237 dataset is loaded from file if it exists at the AMPLIGRAPH_DATA_HOME location. If AMPLIGRAPH_DATA_HOME is not set, the default ~/ampligraph_datasets is checked.

If the dataset is not found at either location, it is downloaded and placed in AMPLIGRAPH_DATA_HOME or, if that variable is not set, in ~/ampligraph_datasets.
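The lookup order described above can be sketched roughly as follows. This is only an illustration: `resolve_data_home` is a hypothetical helper written for this example, not a function exposed by AmpliGraph, and the library's internal logic may differ in detail.

```python
import os
from pathlib import Path


def resolve_data_home(env=None):
    """Return the directory that would be searched for datasets.

    Hypothetical helper for illustration; not part of the AmpliGraph API.
    """
    # Allow a mapping to be passed in so the logic can be tried without
    # touching the real process environment.
    env = os.environ if env is None else env
    data_home = env.get("AMPLIGRAPH_DATA_HOME")
    if data_home:
        # The environment variable takes precedence when it is set.
        return Path(data_home).expanduser()
    # Otherwise fall back to the documented default location.
    return Path("~/ampligraph_datasets").expanduser()


custom_home = resolve_data_home({"AMPLIGRAPH_DATA_HOME": "/tmp/kg"})
default_home = resolve_data_home({})
```

Setting AMPLIGRAPH_DATA_HOME before the first call is therefore the way to control where the downloaded files end up.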

The dataset is divided into three splits:

train
valid
test

Dataset	Train	Valid	Test	Entities	Relations
FB15K-237	272,115	17,535	20,466	14,541	237

Warning

FB15K-237’s validation set contains 8 unseen entities over 9 triples. The test set has 29 unseen entities, distributed over 28 triples.
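To make the notion of "unseen entities" concrete, here is a minimal sketch of this kind of filtering on toy data. It assumes only what the parameter description states (triples are dropped when their subject or object does not occur in the training set); `filter_unseen` is a hypothetical helper, not AmpliGraph's implementation.

```python
import numpy as np


def filter_unseen(train, other):
    """Drop triples from `other` whose subject or object is absent from `train`.

    Hypothetical helper for illustration; both arguments are arrays of
    shape [n, 3] holding (subject, predicate, object) triples.
    """
    # Entities are whatever appears in the subject or object column of train.
    seen = np.unique(train[:, [0, 2]])
    keep = np.isin(other[:, 0], seen) & np.isin(other[:, 2], seen)
    return other[keep]


# Toy data: entities "d" and "e" never occur in the training triples.
train = np.array([["a", "likes", "b"],
                  ["b", "knows", "c"]], dtype=object)
valid = np.array([["a", "knows", "c"],
                  ["a", "likes", "d"],
                  ["e", "knows", "b"]], dtype=object)

filtered = filter_unseen(train, valid)
```

In this toy case only the first validation triple survives, because the other two each mention an entity that the training set never contains.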

Parameters:	check_md5hash (bool) – If `True`, check the md5hash of the files. Defaults to `False`.
	clean_unseen (bool) – If `True`, filters triples in validation and test sets that include entities not present in the training set. Defaults to `True`.
Returns:	splits – The dataset splits: {'train': train, 'valid': valid, 'test': test}. Each split is an ndarray of shape [n, 3].
Return type:	dict
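For readers unfamiliar with what an MD5 check on a file involves, the following is a generic sketch using the standard library. How AmpliGraph performs the check when `check_md5hash=True`, and where it stores the expected hashes, is not described here; `md5_of_file` and `matches_md5` are hypothetical names used only for this example.

```python
import hashlib
import os
import tempfile


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # iter() with a sentinel stops once read() returns empty bytes.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_md5(path, expected_hex):
    """Compare a file's digest against an expected hex string."""
    return md5_of_file(path) == expected_hex.lower()


# Demonstration on a throwaway file rather than a real dataset file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello")
demo_path = tmp.name

demo_ok = matches_md5(demo_path, hashlib.md5(b"hello").hexdigest())
demo_bad = matches_md5(demo_path, "0" * 32)
os.remove(demo_path)
```

A mismatch of this kind usually indicates a truncated or corrupted download, which is the situation a hash check is meant to catch.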

Examples

>>> from ampligraph.datasets import load_fb15k_237
>>> X = load_fb15k_237()
>>> X["train"][2]
array(['/m/07s9rl0', '/media_common/netflix_genre/titles', '/m/0170z3'],
  dtype=object)