Datasets

Helper functions to load knowledge graphs from disk.

Note

It is recommended to set the AMPLIGRAPH_DATA_HOME environment variable:

export AMPLIGRAPH_DATA_HOME=/YOUR/PATH/TO/datasets

When attempting to load a dataset, the module will first check if AMPLIGRAPH_DATA_HOME is set. If it is, it will search this location for the required dataset. If the dataset is not found it will be downloaded and placed in this directory.

If AMPLIGRAPH_DATA_HOME has not been set, the datasets will be saved in the following directory:

~/ampligraph_datasets

Additionally, a specific directory can be passed to the dataset loader via the data_home parameter.
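The lookup order described above can be sketched as follows. This is a minimal illustration, not AmpliGraph's actual implementation: the helper name resolve_data_home is hypothetical, and the precedence of an explicit data_home argument over the environment variable is an assumption.

```python
import os
from typing import Optional

ENV_VAR = "AMPLIGRAPH_DATA_HOME"
DEFAULT_DATA_HOME = os.path.join("~", "ampligraph_datasets")


def resolve_data_home(data_home: Optional[str] = None) -> str:
    """Hypothetical helper: choose the directory that holds the datasets.

    Assumed precedence: explicit argument, then the AMPLIGRAPH_DATA_HOME
    environment variable, then the default ~/ampligraph_datasets.
    """
    if data_home is None:
        # An unset or empty variable falls back to the default directory.
        data_home = os.environ.get(ENV_VAR) or DEFAULT_DATA_HOME
    # Expand "~" and normalise to an absolute path.
    return os.path.abspath(os.path.expanduser(data_home))
```

Calling resolve_data_home() with no argument and no environment variable set yields the expanded form of ~/ampligraph_datasets.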

Dataset-Specific Loaders

Use these helper functions to load datasets used in the graph representation learning literature. The functions automatically download a dataset if it is not present in ~/ampligraph_datasets or at the location set in AMPLIGRAPH_DATA_HOME.

load_wn18([data_home]) Load the WN18 dataset
load_fb15k([data_home]) Load the FB15k dataset
load_fb15k_237([data_home]) Load the FB15k-237 dataset
load_yago3_10([data_home]) Load the YAGO3-10 dataset
load_wn18rr([data_home]) Load the WN18RR dataset
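A possible usage sketch for the loaders listed above. It assumes that AmpliGraph is installed, that the loaders are importable from ampligraph.datasets, and that each loader returns a dictionary-like object with one entry per split; this page does not state the return type, so treat those points as assumptions. The helper split_sizes is hypothetical and only counts triples.

```python
from typing import Dict, Sequence


def split_sizes(dataset: Dict[str, Sequence]) -> Dict[str, int]:
    """Hypothetical helper: number of triples in each split."""
    return {name: len(triples) for name, triples in dataset.items()}


def demo() -> None:
    # Assumption: loaders are exposed by ampligraph.datasets and return a
    # mapping from split name to an array of (subject, predicate, object).
    from ampligraph.datasets import load_wn18

    dataset = load_wn18()  # downloads the data on first use
    print(split_sizes(dataset))
```

If the assumptions hold, calling demo() should report split sizes that can be compared with the Dataset Summary table.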

Dataset Summary

Dataset      Train       Valid    Test     Entities   Relations
FB15K-237    272,115     17,535   20,466   14,541     237
WN18RR       86,835      3,034    3,134    40,943     11
FB15K        483,142     50,000   59,071   14,951     1,345
WN18         141,442     5,000    5,000    40,943     18
YAGO3-10     1,079,040   5,000    5,000    123,182    37

These datasets originate from the following sources: FB15K-237, WN18RR, FB15K, WN18, YAGO3-10

Warning

FB15K-237 contains 8 unseen entities (entities that do not appear in the training set) inside 9 triples in the validation set, and 29 unseen entities inside 28 triples in the test set. WN18RR contains 198 unseen entities inside 210 triples in the validation set, and 209 unseen entities inside 210 triples in the test set.
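One way to detect such triples is sketched below in plain Python. The function names are hypothetical and the triples are toy data, not taken from any of the datasets above; "unseen" is taken to mean an entity absent from the training split.

```python
from typing import Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def entities_in(triples: Iterable[Triple]) -> Set[str]:
    """Collect every subject and object that occurs in the triples."""
    return {entity for s, _, o in triples for entity in (s, o)}


def split_by_seen(
    train: Iterable[Triple], other: Iterable[Triple]
) -> Tuple[List[Triple], List[Triple]]:
    """Separate triples whose entities all occur in train from the rest."""
    seen = entities_in(train)
    kept: List[Triple] = []
    dropped: List[Triple] = []
    for s, p, o in other:
        target = kept if (s in seen and o in seen) else dropped
        target.append((s, p, o))
    return kept, dropped


# Toy data for illustration only.
train = [("a", "likes", "b"), ("b", "likes", "c")]
valid = [("a", "likes", "c"), ("a", "likes", "d"), ("e", "likes", "f")]

kept, dropped = split_by_seen(train, valid)
unseen = entities_in(dropped) - entities_in(train)
```

In the toy data, two validation triples contain three unseen entities (d, e and f), because a single triple can contribute two unseen entities. This is how a split can hold more unseen entities than affected triples, as with the 29 entities inside 28 triples reported for the FB15K-237 test set.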

Generic Loaders

Functions to load custom knowledge graphs from disk.

Note

The environment variable AMPLIGRAPH_DATA_HOME must be set and input graphs must be stored at the path indicated.

load_from_csv(directory_path, file_name[, …]) Load a CSV file
load_from_ntriples(folder_name, file_name[, …]) Load RDF N-Triples as CSV statements
load_from_rdf(folder_name, file_name[, …]) Load an RDF file