load_wn18rr

ampligraph.datasets.load_wn18rr(check_md5hash=False, clean_unseen=True)

Load the WN18RR dataset.

The dataset is described in [DMSR18].

The WN18RR dataset is loaded from file if it exists at the AMPLIGRAPH_DATA_HOME location. If AMPLIGRAPH_DATA_HOME is not set, the default ~/ampligraph_datasets is checked.

If the dataset is not found at either location it is downloaded and placed in AMPLIGRAPH_DATA_HOME or ~/ampligraph_datasets.
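The lookup order described above can be sketched as follows. `resolve_data_home` is a hypothetical helper written for illustration, not part of AmpliGraph's API. It only mirrors the documented rule: use AMPLIGRAPH_DATA_HOME if it is set, otherwise fall back to ~/ampligraph_datasets.

```python
import os
from pathlib import Path


def resolve_data_home() -> Path:
    """Hypothetical helper mirroring the documented lookup order.

    Returns the directory named by the AMPLIGRAPH_DATA_HOME environment
    variable if it is set and non-empty, otherwise ~/ampligraph_datasets.
    """
    env_value = os.environ.get("AMPLIGRAPH_DATA_HOME")
    if env_value:
        # Expand a leading "~" so values like "~/kg" resolve correctly.
        return Path(env_value).expanduser()
    # Documented fallback location in the user's home directory.
    return Path.home() / "ampligraph_datasets"
```

Under that rule, exporting AMPLIGRAPH_DATA_HOME in the shell before starting Python should make load_wn18rr look for, and download into, that directory instead of the default.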

It is divided into three splits:

  • train
  • valid
  • test

Dataset   Train    Valid   Test    Entities   Relations
WN18RR    86,835   3,034   3,134   40,943     11
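To compare a loaded copy of the dataset with the table, the same statistics can be recomputed from the returned splits. `split_statistics` is a hypothetical helper, a sketch assuming each split is an [n, 3] array with one (subject, predicate, object) triple per row. Counts computed from the loaded splits may differ from the table, for instance when the default clean_unseen=True removes validation and test triples.

```python
import numpy as np


def split_statistics(splits: dict) -> dict:
    """Summarise a dict of [n, 3] triple arrays (subject, predicate, object).

    Returns the triple count of each split plus the number of distinct
    entities and relations across all splits (the columns of the table).
    """
    # Number of triples (rows) in each split, keyed by split name.
    stats = {name: len(triples) for name, triples in splits.items()}
    # Stack all splits so entities and relations are counted once overall.
    all_triples = np.concatenate(list(splits.values()))
    # Entities appear in the subject (column 0) or object (column 2) position.
    entities = np.union1d(all_triples[:, 0], all_triples[:, 2])
    stats["entities"] = len(entities)
    # Relations are the predicates in column 1.
    stats["relations"] = len(np.unique(all_triples[:, 1]))
    return stats
```

For example, `split_statistics(load_wn18rr(clean_unseen=False))` would summarise the unfiltered splits.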

Warning

WN18RR’s validation set contains 198 unseen entities (entities not present in the training set), distributed over 210 triples. The test set contains 209 unseen entities, also distributed over 210 triples.
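A minimal sketch of how figures like these can be checked: `unseen_entity_report` (a hypothetical helper, not an AmpliGraph function) counts the entities of a split that never appear as subject or object in the training set, and the number of triples that contain at least one of them. Reproducing the unfiltered counts would require loading the splits with clean_unseen=False.

```python
import numpy as np


def unseen_entity_report(train: np.ndarray, split: np.ndarray) -> tuple:
    """Count entities in `split` that never occur in `train`.

    Returns (number of distinct unseen entities,
             number of triples in `split` containing at least one of them).
    """
    # Every entity seen in training, as subject or object.
    seen = np.union1d(train[:, 0], train[:, 2])
    # Boolean masks: is the subject / object of each split triple unseen?
    subj_unseen = ~np.isin(split[:, 0], seen)
    obj_unseen = ~np.isin(split[:, 2], seen)
    # Distinct unseen entities, gathered from both positions.
    unseen_entities = np.union1d(split[subj_unseen, 0], split[obj_unseen, 2])
    # A triple is affected if either end is unseen.
    affected = subj_unseen | obj_unseen
    return len(unseen_entities), int(affected.sum())
```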

Parameters:
  • clean_unseen (bool) – If True, filters out triples in the validation and test sets that include entities not present in the training set. Defaults to True.
  • check_md5hash (bool) – If True, check the MD5 hash of the dataset files. Defaults to False.
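The filtering that clean_unseen describes can be sketched as below. `drop_unseen` is a hypothetical helper that assumes an entity counts as seen if it appears as the subject or object of any training triple; it illustrates the idea and is not AmpliGraph's internal implementation.

```python
import numpy as np


def drop_unseen(train: np.ndarray, split: np.ndarray) -> np.ndarray:
    """Keep only the triples of `split` whose subject and object both occur in `train`."""
    # Entities known from training, in either the subject or object position.
    seen = np.union1d(train[:, 0], train[:, 2])
    # A triple survives only if both of its entities were seen in training.
    keep = np.isin(split[:, 0], seen) & np.isin(split[:, 2], seen)
    return split[keep]
```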
Returns:

splits – The dataset splits: {'train': train, 'valid': valid, 'test': test}. Each split is an ndarray of shape [n, 3], with one (subject, predicate, object) triple per row.

Return type:

dict

Examples

>>> from ampligraph.datasets import load_wn18rr
>>> X = load_wn18rr()
>>> X["valid"][0]
array(['02174461', '_hypernym', '02176268'], dtype=object)