Datasets

Helper functions to load knowledge graphs.

Note

It is recommended to set the AMPLIGRAPH_DATA_HOME environment variable:

export AMPLIGRAPH_DATA_HOME=/YOUR/PATH/TO/datasets

When loading a dataset, the module first checks whether AMPLIGRAPH_DATA_HOME is set. If it is, the module searches that location for the required dataset. If the dataset is not found there, it is downloaded and placed in that directory.

If AMPLIGRAPH_DATA_HOME is not set, the datasets are saved in the following directory:

~/ampligraph_datasets
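The lookup order described above can be sketched as follows. This is a minimal illustration, not AmpliGraph's own code: resolve_data_home is a hypothetical helper, and treating an empty variable as unset is an assumption.

```python
import os
from pathlib import Path

ENV_VAR = "AMPLIGRAPH_DATA_HOME"
DEFAULT_DIR = "~/ampligraph_datasets"


def resolve_data_home(environ=None):
    """Return the dataset directory implied by the rules described above."""
    environ = os.environ if environ is None else environ
    configured = environ.get(ENV_VAR)
    if configured:
        # The environment variable takes precedence when it is set.
        return Path(configured).expanduser()
    # Otherwise fall back to the default directory in the user's home.
    return Path(DEFAULT_DIR).expanduser()


print(resolve_data_home())
```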

Benchmark Datasets Loaders

Use these helper functions to load datasets used in the graph representation learning literature. The functions automatically download a dataset if it is not present in ~/ampligraph_datasets or at the location set in AMPLIGRAPH_DATA_HOME.

load_fb15k_237([check_md5hash, clean_unseen])

Load the FB15k-237 dataset

load_wn18rr([check_md5hash, clean_unseen])

Load the WN18RR dataset

load_yago3_10([check_md5hash, clean_unseen])

Load the YAGO3-10 dataset

load_fb15k([check_md5hash])

Load the FB15k dataset

load_wn18([check_md5hash])

Load the WN18 dataset

load_wn11([check_md5hash, clean_unseen])

Load the WordNet11 (WN11) dataset

load_fb13([check_md5hash, clean_unseen])

Load the Freebase13 (FB13) dataset
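A minimal sketch of how a loader's result might be inspected. It assumes, beyond what is stated here, that each loader returns a dictionary mapping split names such as "train", "valid", and "test" to collections of (subject, predicate, object) triples. split_sizes and toy_loader are hypothetical names used for illustration; toy_loader stands in for a real loader so the snippet runs without a download.

```python
def split_sizes(load_dataset):
    """Call a loader and map each split name to its number of triples."""
    dataset = load_dataset()
    return {split: len(triples) for split, triples in dataset.items()}


def toy_loader():
    # Stand-in for a real loader: same assumed shape, tiny made-up content.
    return {
        "train": [("a", "likes", "b"), ("b", "likes", "c")],
        "valid": [("a", "likes", "c")],
        "test": [("c", "likes", "a")],
    }


print(split_sizes(toy_loader))
```

With AmpliGraph installed, the same helper could be given a real loader, for example split_sizes(load_fb15k_237) after from ampligraph.datasets import load_fb15k_237, provided the assumption about the return value holds.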

Datasets Summary

Dataset      Train       Valid    Test     Entities   Relations
FB15K-237    272,115     17,535   20,466   14,541     237
WN18RR       86,835      3,034    3,134    40,943     11
FB15K        483,142     50,000   59,071   14,951     1,345
WN18         141,442     5,000    5,000    40,943     18
YAGO3-10     1,079,040   5,000    5,000    123,182    37
WN11         110,361     5,215    21,035   38,194     11
FB13         316,232     11,816   47,464   75,043     13

Warning

WN18 and FB15k include a large number of inverse relations, and their use in experiments has been deprecated. Use WN18RR and FB15K-237 instead.

Warning

FB15K-237’s validation set contains 8 unseen entities over 9 triples. The test set has 29 unseen entities, distributed over 28 triples. WN18RR’s validation set contains 198 unseen entities over 210 triples. The test set has 209 unseen entities, distributed over 210 triples.
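To make "unseen" concrete: an entity is unseen when it occurs in a validation or test triple but in no training triple. The sketch below counts such entities and the triples they affect on made-up data. find_unseen is a hypothetical helper, not part of AmpliGraph, and it checks entities only.

```python
def find_unseen(train, held_out):
    """Return (unseen entities, held-out triples that mention them)."""
    # Every entity that appears as subject or object of a training triple.
    seen = {entity for s, _, o in train for entity in (s, o)}
    unseen, affected = set(), []
    for s, p, o in held_out:
        missing = {entity for entity in (s, o) if entity not in seen}
        if missing:
            unseen |= missing
            affected.append((s, p, o))
    return unseen, affected


train = [("a", "likes", "b"), ("b", "likes", "c")]
valid = [("a", "likes", "c"), ("x", "likes", "a"), ("y", "likes", "x")]

unseen, affected = find_unseen(train, valid)
print(len(unseen), "unseen entities over", len(affected), "triples")
```

A single triple can mention two unseen entities, which is how the number of unseen entities can exceed the number of affected triples. Judging by its name, the clean_unseen argument of the loaders presumably filters out such triples; that reading is an assumption.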

Note

WN11 and FB13 also provide true and negative labels for the triples in the validation and test sets. In both cases the positive base rate is close to 50%.
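The positive base rate is the fraction of labeled triples that are true. A minimal sketch on made-up labels follows; positive_base_rate is a hypothetical helper, and how the loaders expose the labels is not specified here.

```python
def positive_base_rate(labels):
    """Fraction of labels that are positive (truthy)."""
    labels = list(labels)
    if not labels:
        raise ValueError("labels must not be empty")
    return sum(bool(label) for label in labels) / len(labels)


# Made-up labels: three positives out of six triples.
toy_labels = [True, False, True, False, False, True]
print(positive_base_rate(toy_labels))
```

With a base rate near 50%, a classifier that always predicts the same class reaches roughly 50% accuracy, so that is the baseline a triple classifier has to beat.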

Loaders for Custom Knowledge Graphs

Functions to load custom knowledge graphs from disk.

load_from_csv(directory_path, file_name[, …])

Load a knowledge graph from a CSV file

load_from_ntriples(folder_name, file_name[, …])

Load RDF N-Triples

load_from_rdf(folder_name, file_name[, …])

Load an RDF file
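For orientation, a custom knowledge graph on disk is typically stored as one triple per row. The sketch below reads such a file with the standard library only. It is not AmpliGraph's load_from_csv; the tab separator, the absence of a header row, and the read_triples name are assumptions made for illustration.

```python
import csv
import tempfile
from pathlib import Path


def read_triples(directory_path, file_name, sep="\t"):
    """Read (subject, predicate, object) rows from a delimited text file."""
    triples = []
    path = Path(directory_path) / file_name
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle, delimiter=sep)
        for row_number, row in enumerate(reader, start=1):
            if not row:
                continue  # skip blank lines
            if len(row) != 3:
                raise ValueError(
                    f"row {row_number}: expected 3 columns, got {len(row)}"
                )
            triples.append(tuple(row))
    return triples


# Write a tiny made-up graph to a temporary directory and read it back.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "toy.csv").write_text(
        "a\tlikes\tb\nb\tlikes\tc\n", encoding="utf-8"
    )
    toy_triples = read_triples(tmp, "toy.csv")

print(toy_triples)
```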

Hint

AmpliGraph includes a helper function to split a generic knowledge graph into training, validation, and test sets. See ampligraph.evaluation.train_test_split_no_unseen().
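The constraint behind such a split is that no entity or relation in the held-out set may be missing from the training set. The sketch below illustrates one greedy way to satisfy it on made-up data. It is not the library's implementation; split_no_unseen and its parameters are hypothetical.

```python
import random
from collections import Counter


def split_no_unseen(triples, test_size, seed=0):
    """Split triples so every entity and relation in test also occurs in train."""
    rng = random.Random(seed)
    candidates = list(triples)
    rng.shuffle(candidates)

    # Mentions each entity / relation would still have outside the test set.
    entity_count = Counter(e for s, _, o in candidates for e in (s, o))
    relation_count = Counter(p for _, p, _ in candidates)

    train, test = [], []
    for s, p, o in candidates:
        mentions = Counter((s, o))  # a self-loop mentions its entity twice
        movable = (
            len(test) < test_size
            and relation_count[p] > 1
            and all(entity_count[e] > n for e, n in mentions.items())
        )
        if movable:
            # Moving the triple leaves at least one mention of each item behind.
            test.append((s, p, o))
            relation_count[p] -= 1
            entity_count.subtract(mentions)
        else:
            train.append((s, p, o))

    if len(test) < test_size:
        raise ValueError("too few triples can be moved without creating unseen items")
    return train, test


toy = [
    ("a", "likes", "b"), ("b", "likes", "c"), ("c", "likes", "a"),
    ("a", "knows", "c"), ("b", "knows", "a"), ("d", "likes", "a"),
]
train, test = split_no_unseen(toy, test_size=2)
print(len(train), "training triples,", len(test), "test triples")
```

In this toy graph the triple mentioning "d" always stays in the training set, because it is the only triple in which that entity appears. In practice the library function named in the hint is the one to use.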