Datasets
Helper functions to load knowledge graphs.
Note
It is recommended to set the AMPLIGRAPH_DATA_HOME environment variable:
export AMPLIGRAPH_DATA_HOME=/YOUR/PATH/TO/datasets
When attempting to load a dataset, the module first checks whether AMPLIGRAPH_DATA_HOME is set.
If it is, the module searches that location for the required dataset.
If the dataset is not found, it is downloaded and placed in that directory.
If AMPLIGRAPH_DATA_HOME has not been set, the datasets are saved in the following directory:
~/ampligraph_datasets
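The lookup order described in this note can be sketched in a few lines of Python. This is a minimal illustration, not AmpliGraph's own code: `resolve_data_home` and `DEFAULT_DATA_HOME` are hypothetical names introduced here, and the sketch covers only the choice of directory, not the download step.

```python
import os
from pathlib import Path

# Default location named in the note above.
DEFAULT_DATA_HOME = "~/ampligraph_datasets"


def resolve_data_home(env=None):
    """Return the directory where datasets are searched for and stored.

    Hypothetical helper for illustration; not part of the AmpliGraph API.
    """
    env = os.environ if env is None else env
    configured = env.get("AMPLIGRAPH_DATA_HOME")
    if configured:
        # The environment variable takes precedence when it is set.
        return Path(configured).expanduser()
    # Otherwise fall back to the default directory in the user's home.
    return Path(DEFAULT_DATA_HOME).expanduser()


print(resolve_data_home({"AMPLIGRAPH_DATA_HOME": "/YOUR/PATH/TO/datasets"}))
print(resolve_data_home({}))
```

Passing a plain dictionary instead of `os.environ` keeps the example independent of the current shell configuration.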
Benchmark Datasets Loaders
Use these helper functions to load datasets used in graph representation learning literature.
The functions automatically download the datasets if they are not present in ~/ampligraph_datasets or at the location set in AMPLIGRAPH_DATA_HOME.
Function | Description |
---|---|
load_fb15k_237([check_md5hash, …]) | Load the FB15k-237 dataset |
load_wn18rr([check_md5hash, clean_unseen, …]) | Load the WN18RR dataset |
load_yago3_10([check_md5hash, clean_unseen, …]) | Load the YAGO3-10 dataset |
load_fb15k([check_md5hash, add_reciprocal_rels]) | Load the FB15k dataset |
load_wn18([check_md5hash, add_reciprocal_rels]) | Load the WN18 dataset |
load_wn11([check_md5hash, clean_unseen, …]) | Load the WordNet11 (WN11) dataset |
load_fb13([check_md5hash, clean_unseen, …]) | Load the Freebase13 (FB13) dataset |
Datasets Summary
Dataset | Train | Valid | Test | Entities | Relations |
---|---|---|---|---|---|
FB15K-237 | 272,115 | 17,535 | 20,466 | 14,541 | 237 |
WN18RR | 86,835 | 3,034 | 3,134 | 40,943 | 11 |
FB15K | 483,142 | 50,000 | 59,071 | 14,951 | 1,345 |
WN18 | 141,442 | 5,000 | 5,000 | 40,943 | 18 |
YAGO3-10 | 1,079,040 | 5,000 | 5,000 | 123,182 | 37 |
WN11 | 110,361 | 5,215 | 21,035 | 38,194 | 11 |
FB13 | 316,232 | 11,816 | 47,464 | 75,043 | 13 |
Warning
WN18 and FB15k include a large number of inverse relations, and their use in experiments has been deprecated. Use WN18RR and FB15K-237 instead.
Warning
FB15K-237’s validation set contains 8 unseen entities over 9 triples. The test set has 29 unseen entities, distributed over 28 triples. WN18RR’s validation set contains 198 unseen entities over 210 triples. The test set has 209 unseen entities, distributed over 210 triples.
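To make "unseen entities" concrete, the sketch below partitions a held-out set into triples whose subject and object both occur in the training set and triples that mention at least one entity absent from it. The function names and the toy triples are hypothetical and chosen only for illustration; this is not how AmpliGraph computes the figures above.

```python
def entities(triples):
    """Collect every subject and object appearing in a list of (s, p, o) triples."""
    found = set()
    for s, _, o in triples:
        found.add(s)
        found.add(o)
    return found


def split_by_seen(train, held_out):
    """Partition held_out into (seen, unseen) triples relative to the train entities.

    A triple counts as unseen if its subject or its object is absent from train.
    """
    known = entities(train)
    seen, unseen = [], []
    for triple in held_out:
        s, _, o = triple
        if s in known and o in known:
            seen.append(triple)
        else:
            unseen.append(triple)
    return seen, unseen


# Toy data (hypothetical): "d" and "e" never occur in the training triples.
train = [("a", "likes", "b"), ("b", "likes", "c")]
valid = [("a", "likes", "c"), ("a", "likes", "d"), ("e", "likes", "d")]

seen, unseen = split_by_seen(train, valid)
unseen_entities = entities(unseen) - entities(train)
print(len(unseen), "triples contain", len(unseen_entities), "unseen entities")
```

A single triple can contain two unseen entities, and one unseen entity can occur in several triples, so the entity and triple counts need not match.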
Note
WN11 and FB13 also provide positive and negative labels for the triples in the validation and test sets. In both cases the positive base rate is close to 50%.
Benchmark Datasets Loaders (Knowledge Graphs With Numeric Values on Edges)
These helper functions load benchmark datasets with numeric values on edges, as described in [PC21] (the figure below shows a toy example).
Hint
To process a knowledge graph with numeric values associated with its edges, enable the FocusE layer when training a knowledge graph embedding model [PC21].
The functions automatically download the datasets if they are not present in ~/ampligraph_datasets or at the location set in AMPLIGRAPH_DATA_HOME.
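As a rough picture of what "numeric values on edges" means, each triple can be paired with one number. The sketch below is a hand-made toy example: the entity names, the values, and the `WeightedTriple` type are all hypothetical, and the layout is an assumption for illustration rather than the format returned by the loaders listed here.

```python
from collections import namedtuple

# Hypothetical record type: a (subject, predicate, object) triple plus one number.
WeightedTriple = namedtuple("WeightedTriple", "subject predicate obj value")

# Toy graph (hypothetical data): the value could be a confidence or a weight.
toy_graph = [
    WeightedTriple("alice", "knows", "bob", 0.9),
    WeightedTriple("bob", "works_at", "acme", 0.6),
    WeightedTriple("alice", "works_at", "acme", 0.2),
]

# Separate the triples from the per-edge values, keeping the two lists aligned.
triples = [(t.subject, t.predicate, t.obj) for t in toy_graph]
values = [t.value for t in toy_graph]

for triple, value in zip(triples, values):
    print(triple, value)
```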
Function | Description |
---|---|
load_onet20k([check_md5hash, clean_unseen, …]) | Load the O*NET20K dataset |
load_ppi5k([check_md5hash, clean_unseen, …]) | Load the PPI5K dataset |
load_nl27k([check_md5hash, clean_unseen, …]) | Load the NL27K dataset |
load_cn15k([check_md5hash, clean_unseen, …]) | Load the CN15K dataset |
Datasets Summary (KGs with numeric values on edges)
Dataset | Train | Valid | Test | Entities | Relations |
---|---|---|---|---|---|
O*NET20K | 461,932 | 138 | 2,000 | 20,643 | 19 |
PPI5K | 230,929 | 19,017 | 21,720 | 4,999 | 7 |
NL27K | 149,100 | 12,274 | 14,026 | 27,221 | 405 |
CN15K | 199,417 | 16,829 | 19,224 | 15,000 | 36 |
Loaders for Custom Knowledge Graphs
Functions to load custom knowledge graphs from disk.
Function | Description |
---|---|
load_from_csv(directory_path, file_name[, …]) | Load a knowledge graph from a CSV file |
load_from_ntriples(folder_name, file_name[, …]) | Load RDF N-Triples |
load_from_rdf(folder_name, file_name[, …]) | Load an RDF file |
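For orientation, the sketch below shows what loading a knowledge graph from a delimited file amounts to, using only the Python standard library. `read_triples` is a hypothetical stand-in that mirrors the `directory_path` and `file_name` arguments listed above; the `sep` parameter and the three-column file layout are assumptions, and this is not AmpliGraph's `load_from_csv`.

```python
import csv
import os
import tempfile


def read_triples(directory_path, file_name, sep="\t"):
    """Read rows of subject, predicate, object into a list of tuples.

    Hypothetical stand-in for illustration; not AmpliGraph's load_from_csv.
    """
    path = os.path.join(directory_path, file_name)
    triples = []
    with open(path, newline="", encoding="utf-8") as handle:
        for row in csv.reader(handle, delimiter=sep):
            if not row:
                continue  # skip blank lines
            if len(row) != 3:
                raise ValueError(f"expected 3 columns, got {len(row)}: {row}")
            triples.append(tuple(row))
    return triples


# Write a small comma-separated file (hypothetical data) and read it back.
with tempfile.TemporaryDirectory() as folder:
    file_path = os.path.join(folder, "toy.csv")
    with open(file_path, "w", newline="", encoding="utf-8") as handle:
        csv.writer(handle).writerows(
            [("alice", "knows", "bob"), ("bob", "works_at", "acme")]
        )
    triples = read_triples(folder, "toy.csv", sep=",")

print(triples)
```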
Hint
AmpliGraph includes a helper function to split a generic knowledge graph into training, validation, and test sets. See ampligraph.evaluation.train_test_split_no_unseen().
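The idea behind a split with no unseen entities can be sketched with a simple greedy procedure: a triple is moved to the test set only if each of its entities still occurs in at least one remaining training triple. The code below is a simplified illustration under that assumption. `split_no_unseen` is a hypothetical name, the sketch checks entities only (not relations), and it should not be read as the algorithm used by ampligraph.evaluation.train_test_split_no_unseen().

```python
import random
from collections import Counter


def split_no_unseen(triples, test_size, seed=0):
    """Greedy split that keeps every test entity present in the training set.

    Hypothetical helper for illustration; checks entities only.
    """
    # Number of occurrences of each entity among the current training triples.
    counts = Counter()
    for s, _, o in triples:
        counts.update([s, o])

    order = list(range(len(triples)))
    random.Random(seed).shuffle(order)

    test_idx = set()
    for i in order:
        if len(test_idx) == test_size:
            break
        s, _, o = triples[i]
        needed = Counter([s, o])  # counts an entity twice when s == o
        # Move the triple only if each entity keeps at least one training occurrence.
        if all(counts[e] - n >= 1 for e, n in needed.items()):
            counts.subtract(needed)
            test_idx.add(i)

    if len(test_idx) < test_size:
        raise ValueError("cannot move enough triples without creating unseen entities")

    train = [t for i, t in enumerate(triples) if i not in test_idx]
    test = [t for i, t in enumerate(triples) if i in test_idx]
    return train, test


# Toy graph (hypothetical data).
triples = [
    ("a", "r", "b"),
    ("b", "r", "c"),
    ("c", "r", "a"),
    ("a", "r", "c"),
    ("b", "r", "a"),
]
train, test = split_no_unseen(triples, test_size=2)
print(len(train), "training triples,", len(test), "test triples")
```

Applying the same procedure a second time to the training portion would yield a validation set under the same constraint.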