load_yago3_10

ampligraph.datasets.load_yago3_10(check_md5hash=False, clean_unseen=True)

Load the YAGO3-10 dataset.

The dataset is a split of YAGO3 [MBS13] and was first presented in [DMSR18].

The YAGO3-10 dataset is loaded from file if it exists at the AMPLIGRAPH_DATA_HOME location. If AMPLIGRAPH_DATA_HOME is not set, the default ~/ampligraph_datasets is checked.

If the dataset is not found at either location, it is downloaded and placed in AMPLIGRAPH_DATA_HOME or ~/ampligraph_datasets.
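The lookup order described above can be sketched as follows. This is only an illustration of the documented behaviour, not AmpliGraph's actual implementation: `resolve_data_home` is a hypothetical helper name, and the `env` argument exists only so the sketch can be exercised without touching the real environment.

```python
import os
from pathlib import Path


def resolve_data_home(env=None):
    """Hypothetical helper mirroring the documented lookup order.

    Returns the AMPLIGRAPH_DATA_HOME directory if the variable is set,
    otherwise the default ~/ampligraph_datasets directory.
    """
    env = os.environ if env is None else env
    data_home = env.get("AMPLIGRAPH_DATA_HOME")
    if data_home:
        # The variable is set: use it, expanding a leading "~" if present.
        return Path(data_home).expanduser()
    # The variable is not set: fall back to the documented default.
    return Path.home() / "ampligraph_datasets"


# Directory that would be checked under the current environment.
data_home = resolve_data_home()
```

To control where the dataset is stored, set AMPLIGRAPH_DATA_HOME in the environment before calling load_yago3_10.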

It is divided into three splits:

  • train

  • valid

  • test

Dataset     Train       Valid   Test    Entities   Relations
YAGO3-10    1,079,040   5,000   5,000   123,182    37

Parameters
  • check_md5hash (bool) – If True, check the MD5 hash of the files. Defaults to False.

  • clean_unseen (bool) – If True, filters triples in validation and test sets that include entities not present in the training set. Defaults to True.
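The effect of clean_unseen can be illustrated on toy data. The sketch below is an assumption about what such a filter looks like, not the library's own code: `filter_unseen` is a hypothetical function, and the three-column layout (subject, predicate, object) is taken from the example output at the end of this page.

```python
import numpy as np


def filter_unseen(train, split):
    """Hypothetical filter: keep only triples whose subject and object
    both appear as an entity somewhere in the training set."""
    # Entities are the values in column 0 (subject) and column 2 (object).
    seen = np.unique(train[:, [0, 2]])
    keep = np.isin(split[:, 0], seen) & np.isin(split[:, 2], seen)
    return split[keep]


# Toy data: entity "z" never occurs in the training triples.
train = np.array([["a", "r", "b"], ["b", "r", "c"]], dtype=object)
valid = np.array([["a", "r", "c"], ["a", "r", "z"]], dtype=object)

# Only the first validation triple survives the filter.
filtered_valid = filter_unseen(train, valid)
```

Triples that mention unseen entities are dropped rather than modified, so the filtered validation and test sets can be smaller than the raw files.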

Returns

splits – The dataset splits: {'train': train, 'valid': valid, 'test': test}. Each split is an ndarray of shape [n, 3].

Return type

dict
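Because each split is an [n, 3] array, basic statistics of the kind shown in the table can be computed directly from the returned dict. The sketch below uses a small stand-in dict with the same structure so that it runs without downloading anything. `summarize_splits` is a hypothetical helper, and this page does not state how the published counts were derived.

```python
import numpy as np


def summarize_splits(splits):
    """Hypothetical helper: split sizes plus the number of distinct
    entities and relations across train, valid and test."""
    all_triples = np.concatenate(
        [splits["train"], splits["valid"], splits["test"]]
    )
    entities = np.unique(all_triples[:, [0, 2]])  # subjects and objects
    relations = np.unique(all_triples[:, 1])      # predicates
    sizes = {name: len(triples) for name, triples in splits.items()}
    return sizes, len(entities), len(relations)


# Stand-in for the dict returned by load_yago3_10(), with toy triples.
toy_splits = {
    "train": np.array([["a", "r1", "b"], ["b", "r2", "c"]], dtype=object),
    "valid": np.array([["a", "r1", "c"]], dtype=object),
    "test": np.array([["c", "r2", "a"]], dtype=object),
}

sizes, n_entities, n_relations = summarize_splits(toy_splits)
```

With the real dataset, the same function could be called on the value returned by load_yago3_10().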

Examples

>>> from ampligraph.datasets import load_yago3_10
>>> X = load_yago3_10()
>>> X["valid"][0]
array(['Mikheil_Khutsishvili', 'playsFor', 'FC_Merani_Tbilisi'], dtype=object)