load_yago3_10
- ampligraph.datasets.load_yago3_10(check_md5hash=False, clean_unseen=True, add_reciprocal_rels=False)
Load the YAGO3-10 dataset.
The dataset is a split of YAGO3 [MBS13] and was first presented in [DMSR18].
The YAGO3-10 dataset is loaded from file if it exists at the AMPLIGRAPH_DATA_HOME location. If AMPLIGRAPH_DATA_HOME is not set, the default ~/ampligraph_datasets is checked. If the dataset is not found at either location, it is downloaded and placed in AMPLIGRAPH_DATA_HOME or ~/ampligraph_datasets.
This dataset is divided into three splits:
train: 1,079,040 triples
valid: 5,000 triples
test: 5,000 triples
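======== ========= ===== ===== ======== =========
Dataset  Train     Valid Test  Entities Relations
======== ========= ===== ===== ======== =========
YAGO3-10 1,079,040 5,000 5,000 123,182  37
======== ========= ===== ===== ======== =========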
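The lookup order described above can be sketched in a few lines. This is a minimal illustration, not AmpliGraph's own code: the helper name resolve_data_home and the constant DEFAULT_DATA_HOME are hypothetical, and treating an empty AMPLIGRAPH_DATA_HOME as unset is an assumption.

```python
import os
from pathlib import Path

# Default location named in the documentation above.
DEFAULT_DATA_HOME = "~/ampligraph_datasets"


def resolve_data_home(env=None):
    """Return the directory that would be checked for datasets.

    Hypothetical helper: `env` defaults to os.environ and exists only so
    the lookup can be exercised without touching the real environment.
    """
    env = os.environ if env is None else env
    # Prefer AMPLIGRAPH_DATA_HOME; fall back to the default when it is
    # missing (or empty, which this sketch assumes means "not set").
    location = env.get("AMPLIGRAPH_DATA_HOME") or DEFAULT_DATA_HOME
    return Path(location).expanduser()


custom_home = resolve_data_home({"AMPLIGRAPH_DATA_HOME": "/tmp/kg_data"})
default_home = resolve_data_home({})
print(custom_home)
print(default_home)
```

Setting AMPLIGRAPH_DATA_HOME before the first call is the way to keep the download out of the home directory, for example on a machine with a small home quota.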
- Parameters:
check_md5hash (bool) – If True, check the MD5 hash of the files (default: False).
clean_unseen (bool) – If True, filters triples in validation and test sets that include entities not present in the training set (default: True).
add_reciprocal_rels (bool) – Flag which specifies whether to add reciprocal relations. For every <s, p, o> in the dataset, this creates a corresponding triple with reciprocal relation <o, p_reciprocal, s> (default: False).
- Returns:
splits – The dataset splits: {'train': train, 'valid': valid, 'test': test}. Each split is an ndarray of shape (n, 3).
- Return type:
dict
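To make the effect of clean_unseen concrete, the sketch below filters a toy validation split against a toy training split. It is an illustration under stated assumptions, not the library's implementation: filter_unseen is a hypothetical helper, the triples are invented, and plain Python lists of (s, p, o) tuples stand in for the (n, 3) arrays that the loader returns.

```python
def filter_unseen(train, split):
    """Keep only triples whose subject and object both occur in `train`.

    Hypothetical helper illustrating the idea behind clean_unseen.
    """
    # Every entity that appears as a subject or an object in training.
    seen = {s for s, _, _ in train} | {o for _, _, o in train}
    return [t for t in split if t[0] in seen and t[2] in seen]


# Toy data (invented for this example).
train = [("a", "likes", "b"), ("b", "knows", "c")]
valid = [("a", "knows", "c"), ("a", "likes", "d")]  # "d" is unseen

filtered_valid = filter_unseen(train, valid)
print(filtered_valid)  # [('a', 'knows', 'c')]
```

A model trained only on the training split has no embedding for an entity such as "d", which is why triples that mention it are dropped before evaluation.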
Example
>>> from ampligraph.datasets import load_yago3_10
>>> X = load_yago3_10()
>>> X["valid"][0]
array(['Mikheil_Khutsishvili', 'playsFor', 'FC_Merani_Tbilisi'], dtype=object)
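The add_reciprocal_rels transformation can be sketched on the triple shown in the example. This is a minimal illustration, assuming that reciprocal triples are appended after the originals and that the reciprocal relation name is the original name plus a "_reciprocal" suffix; both details, and the helper name add_reciprocals, are assumptions rather than documented behaviour.

```python
def add_reciprocals(triples, suffix="_reciprocal"):
    """Return the triples plus one reversed triple per original.

    Hypothetical helper: for every (s, p, o) it adds (o, p + suffix, s).
    The suffix and the append order are assumptions of this sketch.
    """
    reciprocal = [(o, p + suffix, s) for s, p, o in triples]
    return list(triples) + reciprocal


triples = [("Mikheil_Khutsishvili", "playsFor", "FC_Merani_Tbilisi")]
augmented = add_reciprocals(triples)
for triple in augmented:
    print(triple)
```

Under this scheme the number of triples and the number of distinct relations both double.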