load_fb15k_237
- ampligraph.datasets.datasets.load_fb15k_237(check_md5hash=False, clean_unseen=True, add_reciprocal_rels=False, return_mapper=False)
Load the FB15k-237 dataset (with the option to load a human-labeled test subset).
FB15k-237 is a reduced version of FB15K. It was first proposed by [TCP+15].
Warning
FB15K-237’s validation set contains 8 unseen entities over 9 triples. The test set has 29 unseen entities, distributed over 28 triples.
The FB15k-237 dataset is loaded from file if it exists at the AMPLIGRAPH_DATA_HOME location. If AMPLIGRAPH_DATA_HOME is not set, the default ~/ampligraph_datasets is checked. If the dataset is not found at either location, it is downloaded and placed in AMPLIGRAPH_DATA_HOME or ~/ampligraph_datasets.
The dataset is divided into three splits:
train: 272,115 triples
valid: 17,535 triples
test: 20,466 triples
It also contains a subset of the test set with human-readable labels, available here:
test-human
test-human-ids
Dataset    Train    Valid   Test    Test-Human  Entities  Relations
FB15K-237  272,115  17,535  20,466  273         14,541    237
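The directory lookup described above can be sketched in a few lines. This is a minimal illustration, not AmpliGraph's own implementation: the helper name resolve_data_home and its env parameter are hypothetical, and treating an empty variable as unset is an assumption. Only the environment variable name and the default path come from the text.

```python
import os

# Default directory named in the documentation above.
DEFAULT_DATA_HOME = "~/ampligraph_datasets"


def resolve_data_home(env=None):
    """Hypothetical helper: return the directory that would be searched.

    `env` is any mapping of environment variables; it defaults to the
    real process environment. An empty value is treated as unset
    (an assumption made for this sketch).
    """
    env = os.environ if env is None else env
    data_home = env.get("AMPLIGRAPH_DATA_HOME")
    if data_home:
        return os.path.expanduser(data_home)
    return os.path.expanduser(DEFAULT_DATA_HOME)


# Variable set: that location is used.
custom = resolve_data_home({"AMPLIGRAPH_DATA_HOME": "/data/kg"})
# Variable not set: fall back to the default under the home directory.
fallback = resolve_data_home({})
print(custom)
print(fallback)
```

Passing a plain dictionary instead of reading os.environ directly keeps the sketch easy to try without changing the real environment.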
- Parameters:
check_md5hash (bool) – If True, check the MD5 hash of the files (default: False).
clean_unseen (bool) – If True, removes triples from the validation and test sets that include entities not present in the training set (default: True).
add_reciprocal_rels (bool) – Flag which specifies whether to add reciprocal relations. For every <s, p, o> in the dataset this creates a corresponding triple with reciprocal relation <o, p_reciprocal, s> (default: False).
return_mapper (bool) – Whether to return human-readable labels as a dictionary in the X["mapper"] field (default: False).
- Returns:
splits – The dataset splits: {'train': train, 'valid': valid, 'test': test, 'test-human': test_human, 'test-human-ids': test_human_ids}. Each split is an ndarray of shape (n, 3).
- Return type:
dict
Example
>>> from ampligraph.datasets import load_fb15k_237
>>> X = load_fb15k_237()
>>> X["train"][2]
array(['/m/07s9rl0', '/media_common/netflix_genre/titles', '/m/0170z3'], dtype=object)
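To make the clean_unseen and add_reciprocal_rels options more concrete, the sketch below applies the two ideas to a handful of toy triples in plain Python. It is an illustration of the behaviour described in the parameter list, not the library's code: the function names filter_unseen and with_reciprocals are hypothetical, the toy triples are invented, and the "_reciprocal" suffix is an assumption modelled on the p_reciprocal notation above.

```python
def filter_unseen(train, split):
    """Keep only triples whose subject and object both occur in `train`."""
    seen = {entity for s, _, o in train for entity in (s, o)}
    return [(s, p, o) for s, p, o in split if s in seen and o in seen]


def with_reciprocals(triples, suffix="_reciprocal"):
    """Append <o, p + suffix, s> for every <s, p, o> (suffix is assumed)."""
    return list(triples) + [(o, p + suffix, s) for s, p, o in triples]


# Toy data, invented for this sketch.
train = [("a", "likes", "b"), ("b", "likes", "c")]
test = [("a", "likes", "c"), ("a", "likes", "z")]  # "z" never appears in train

clean_test = filter_unseen(train, test)
augmented_train = with_reciprocals(train)

print(clean_test)
print(augmented_train)
```

The triple containing "z" is dropped from the toy test split because "z" is absent from the training triples, and the training list doubles in size once each triple gains its reversed counterpart.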