load_fb15k_237
ampligraph.datasets.load_fb15k_237(check_md5hash=False, clean_unseen=True)
Load the FB15k-237 dataset.
FB15k-237 is a reduced version of FB15K. It was first proposed by [TCP+15].
The FB15k-237 dataset is loaded from file if it exists at the AMPLIGRAPH_DATA_HOME location. If AMPLIGRAPH_DATA_HOME is not set, the default ~/ampligraph_datasets is checked.
If the dataset is not found at either location, it is downloaded and placed in AMPLIGRAPH_DATA_HOME or ~/ampligraph_datasets.
The dataset is divided into three splits:
- train
- valid
- test
Dataset    Train    Valid   Test    Entities  Relations
FB15K-237  272,115  17,535  20,466  14,541    237

Warning
FB15K-237’s validation set contains 8 unseen entities over 9 triples. The test set has 29 unseen entities, distributed over 28 triples.
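The counts in this warning can be checked on the loaded arrays. The following is a minimal sketch, assuming each split is an ndarray of shape [n, 3] whose first and third columns hold entities, as the example triple in this page suggests. The helper name `find_unseen` and the toy triples are hypothetical and are not part of AmpliGraph.

```python
import numpy as np


def find_unseen(train, split):
    """Hypothetical helper: find entities in `split` that never occur in `train`.

    Assumes rows are (subject, predicate, object) triples.
    Returns the set of unseen entities and a boolean mask over the rows of `split`.
    """
    # Entities seen in training: subjects (column 0) and objects (column 2).
    train_entities = set(train[:, 0]) | set(train[:, 2])
    unseen = (set(split[:, 0]) | set(split[:, 2])) - train_entities
    # A triple is affected if its subject or its object is unseen.
    mask = np.array([s in unseen or o in unseen for s, _, o in split], dtype=bool)
    return unseen, mask


# Toy data standing in for the real splits.
train = np.array([["a", "r1", "b"], ["b", "r2", "c"]], dtype=object)
valid = np.array(
    [["a", "r1", "c"], ["d", "r1", "a"], ["b", "r2", "e"]], dtype=object
)

unseen, mask = find_unseen(train, valid)
affected = valid[mask]   # triples that mention an unseen entity
cleaned = valid[~mask]   # what remains after filtering them out
print(sorted(unseen), len(affected), len(cleaned))
```

On the real dataset, the same check would only find unseen entities if the splits are loaded with clean_unseen=False, since the default filters those triples out.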
Parameters:
- check_md5hash (boolean) – If True, check the md5hash of the files. Defaults to False.
- clean_unseen (bool) – If True, filters triples in validation and test sets that include entities not present in the training set.
Returns: splits – The dataset splits: {‘train’: train, ‘valid’: valid, ‘test’: test}. Each split is an ndarray of shape [n, 3].
Return type: dict
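To illustrate what an MD5 check such as the one enabled by check_md5hash involves, the sketch below computes a file digest with Python's hashlib and compares it to an expected value. This is a generic illustration and not AmpliGraph's implementation. The helper `md5_matches` and the temporary file are hypothetical.

```python
import hashlib
import os
import tempfile


def md5_matches(path, expected_md5, chunk_size=8192):
    """Hypothetical helper: True if the MD5 digest of `path` equals `expected_md5`."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Read in chunks so large dataset files need not fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_md5.lower()


# Demonstrate on a throwaway file rather than a real dataset file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello")
    tmp_path = tmp.name

expected = hashlib.md5(b"hello").hexdigest()
ok = md5_matches(tmp_path, expected)
bad = md5_matches(tmp_path, "0" * 32)
os.remove(tmp_path)
print(ok, bad)
```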
Examples
>>> from ampligraph.datasets import load_fb15k_237
>>> X = load_fb15k_237()
>>> X["train"][2]
array(['/m/07s9rl0', '/media_common/netflix_genre/titles', '/m/0170z3'], dtype=object)
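The lookup order described above (AMPLIGRAPH_DATA_HOME first, then ~/ampligraph_datasets) can be sketched as follows. This is an illustration under stated assumptions, not the library's own code: `resolve_data_home` is a hypothetical helper, and treating an empty variable as unset is an assumption.

```python
import os


def resolve_data_home(env=None):
    """Hypothetical helper mirroring the lookup order described in the text."""
    env = os.environ if env is None else env
    # Prefer AMPLIGRAPH_DATA_HOME when set (assumption: empty counts as unset);
    # otherwise fall back to the default directory in the user's home.
    data_home = env.get("AMPLIGRAPH_DATA_HOME") or os.path.join(
        "~", "ampligraph_datasets"
    )
    return os.path.abspath(os.path.expanduser(data_home))


# Pass explicit mappings so the demo does not depend on the real environment.
custom = resolve_data_home({"AMPLIGRAPH_DATA_HOME": "/tmp/kg_data"})
default = resolve_data_home({})
print(custom)
print(default)
```

Passing the environment as an argument keeps the sketch easy to test. Calling `resolve_data_home()` with no argument reads the real `os.environ`.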