load_wn18rr
- ampligraph.datasets.datasets.load_wn18rr(check_md5hash=False, clean_unseen=True, add_reciprocal_rels=False)
Load the WN18RR dataset.
The dataset is described in [DMSR18].
Warning
WN18RR’s validation set contains 198 unseen entities over 210 triples. The test set has 209 unseen entities, distributed over 210 triples.
The WN18RR dataset is loaded from file if it exists at the AMPLIGRAPH_DATA_HOME location. If AMPLIGRAPH_DATA_HOME is not set, the default ~/ampligraph_datasets is checked. If the dataset is not found at either location, it is downloaded and placed in AMPLIGRAPH_DATA_HOME or ~/ampligraph_datasets.
This dataset is divided into three splits:
train: 86,835 triples
valid: 3,034 triples
test: 3,134 triples
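=======  ======  =====  =====  ========  =========
Dataset  Train   Valid  Test   Entities  Relations
=======  ======  =====  =====  ========  =========
WN18RR   86,835  3,034  3,134  40,943    11
=======  ======  =====  =====  ========  =========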
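The lookup order described above can be sketched as follows. This is a minimal illustration using only the standard library; `resolve_data_home` is a hypothetical helper written for this example, not a function exposed by AmpliGraph, and the library's own implementation may differ.

```python
import os
from pathlib import Path


# Hypothetical helper for illustration only; not part of the AmpliGraph API.
def resolve_data_home(env=None):
    """Return the directory that would be searched for dataset files.

    Uses AMPLIGRAPH_DATA_HOME when it is set and non-empty,
    otherwise falls back to ~/ampligraph_datasets.
    """
    env = os.environ if env is None else env
    data_home = env.get("AMPLIGRAPH_DATA_HOME")
    if not data_home:
        data_home = os.path.join("~", "ampligraph_datasets")
    # Expand a leading "~" to the current user's home directory.
    return Path(data_home).expanduser()


# Show which directory applies in the current environment.
print(resolve_data_home())
```

Passing a plain dict as `env` makes it possible to check both branches without touching the real environment.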
- Parameters:
clean_unseen (bool) – If True, filters triples in validation and test sets that include entities not present in the training set (default: True).
check_md5hash (bool) – If True, check the MD5 hash of the dataset files (default: False).
add_reciprocal_rels (bool) – Flag which specifies whether to add reciprocal relations. For every <s, p, o> in the dataset this creates a corresponding triple with reciprocal relation <o, p_reciprocal, s> (default: False).
- Returns:
splits – The dataset splits: {'train': train, 'valid': valid, 'test': test}. Each split is an ndarray of shape (n, 3).
- Return type:
dict
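To make the effect of clean_unseen and add_reciprocal_rels concrete, here is a small pure-Python sketch on toy triples. Both helpers (`filter_unseen`, `add_reciprocals`) and the `_reciprocal` suffix are assumptions made for this illustration; they are not AmpliGraph functions, and the library may name reciprocal relations differently.

```python
# Hypothetical helpers for illustration only; not AmpliGraph functions.
def filter_unseen(train, split):
    """Keep only triples whose subject and object both occur in train."""
    seen = {s for s, _, _ in train} | {o for _, _, o in train}
    return [(s, p, o) for s, p, o in split if s in seen and o in seen]


def add_reciprocals(triples, suffix="_reciprocal"):
    """Append <o, p_reciprocal, s> for every <s, p, o>.

    The suffix is an assumed naming scheme for the reciprocal relation.
    """
    return list(triples) + [(o, p + suffix, s) for s, p, o in triples]


# Toy data: "zzz" never appears in the training set.
train = [("a", "_hypernym", "b"), ("b", "_hypernym", "c")]
valid = [("a", "_hypernym", "c"), ("a", "_hypernym", "zzz")]

valid_clean = filter_unseen(train, valid)  # drops the triple containing "zzz"
train_aug = add_reciprocals(train)         # doubles the number of triples

print(valid_clean)
print(train_aug)
```

The filter checks entities only, in line with the parameter description, which mentions unseen entities and not unseen relations.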
Example
>>> from ampligraph.datasets import load_wn18rr
>>> X = load_wn18rr()
>>> X["valid"][0]
array(['02174461', '_hypernym', '02176268'], dtype=object)