load_wn18rr
ampligraph.datasets.load_wn18rr(check_md5hash=False, clean_unseen=True, add_reciprocal_rels=False)
Load the WN18RR dataset.
The dataset is described in [DMSR18].
The WN18RR dataset is loaded from file if it exists at the AMPLIGRAPH_DATA_HOME location. If AMPLIGRAPH_DATA_HOME is not set, the default ~/ampligraph_datasets is checked.
If the dataset is not found at either location, it is downloaded and placed in AMPLIGRAPH_DATA_HOME or ~/ampligraph_datasets.
It is divided into three splits:
train
valid
test
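========= ========= ======= ======= ========== ===========
Dataset   Train     Valid   Test    Entities   Relations
========= ========= ======= ======= ========== ===========
WN18RR    86,835    3,034   3,134   40,943     11
========= ========= ======= ======= ========== ===========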
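The lookup order described above (AMPLIGRAPH_DATA_HOME first, then ~/ampligraph_datasets) can be sketched as follows. This is only an illustration of the documented behaviour, not the library's own code; the helper name resolve_data_home and its env argument are hypothetical.

```python
import os
from pathlib import Path


def resolve_data_home(env=None):
    """Return the directory that would be searched for datasets.

    Hypothetical helper: it mirrors the documented lookup order only.
    """
    env = os.environ if env is None else env
    data_home = env.get("AMPLIGRAPH_DATA_HOME")
    if data_home:
        # The environment variable takes precedence when it is set.
        return Path(data_home).expanduser()
    # Otherwise fall back to the documented default location.
    return Path.home() / "ampligraph_datasets"


print(resolve_data_home())
```

Passing a plain dict as env makes it easy to check both branches without touching the real environment.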
Warning
WN18RR’s validation set contains 198 unseen entities over 210 triples. The test set has 209 unseen entities, distributed over 210 triples.
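To make the notion of "unseen entities" concrete, here is a minimal sketch of filtering them out of a split, assuming triples are stored as an ndarray of shape [n, 3] with subject, predicate and object columns. The function filter_unseen and the toy arrays are hypothetical, and the sketch checks entities only; it is not the loader's actual implementation.

```python
import numpy as np


def filter_unseen(train, split):
    """Keep only the triples of `split` whose subject and object both occur in `train`."""
    # Entities seen in training, in either subject or object position.
    seen = set(train[:, 0]) | set(train[:, 2])
    mask = np.array([s in seen and o in seen for s, _, o in split], dtype=bool)
    return split[mask]


# Toy data (not taken from WN18RR): entity "z" never appears in the training set.
train = np.array([["a", "r1", "b"], ["b", "r2", "c"]], dtype=object)
valid = np.array([["a", "r1", "c"], ["a", "r1", "z"]], dtype=object)

filtered = filter_unseen(train, valid)
print(filtered)
```

Only the first validation triple survives, because the second one refers to an entity that has no training triples.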
- Parameters
clean_unseen (bool) – If True, filters out triples in the validation and test sets that include entities not present in the training set. Defaults to True.
check_md5hash (bool) – If True, check the MD5 hash of the dataset files. Defaults to False.
add_reciprocal_rels (bool) – Flag which specifies whether to add reciprocal relations. For every <s, p, o> in the dataset, this creates a corresponding triple with the reciprocal relation <o, p_reciprocal, s>. Defaults to False.
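The reciprocal-relation construction described for add_reciprocal_rels can be illustrated with a short sketch. The function add_reciprocals is hypothetical, and the "_reciprocal" suffix used to name the new relation is an assumption made for illustration; the names the library actually generates may differ.

```python
import numpy as np


def add_reciprocals(triples, suffix="_reciprocal"):
    """Append <o, p + suffix, s> for every <s, p, o> in `triples` (shape [n, 3])."""
    s, p, o = triples[:, 0], triples[:, 1], triples[:, 2]
    # Assumed naming scheme: the reciprocal relation is the original name plus a suffix.
    p_rec = np.array([rel + suffix for rel in p], dtype=object)
    reciprocal = np.stack([o, p_rec, s], axis=1)
    return np.concatenate([triples, reciprocal], axis=0)


X = np.array([["02174461", "_hypernym", "02176268"]], dtype=object)
augmented = add_reciprocals(X)
print(augmented)
```

The result has twice as many rows as the input: the original triples followed by their reversed counterparts.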
- Returns
splits – The dataset splits: {'train': train, 'valid': valid, 'test': test}. Each split is an ndarray of shape [n, 3].
- Return type
dict
Examples
>>> from ampligraph.datasets import load_wn18rr
>>> X = load_wn18rr()
>>> X["valid"][0]
array(['02174461', '_hypernym', '02176268'], dtype=object)