Datasets
Support for loading and managing datasets.
Loaders for Custom Knowledge Graphs
These functions load custom knowledge graphs from disk. They read the data from the specified files and store it
as a numpy array. These loaders are recommended when the dataset fits in memory (roughly up to 1M entities and
millions of triples). If the dataset is too big to fit in memory,
use the GraphDataLoader class instead (see the
Advanced Topics section for more).
| | Load a knowledge graph from a .csv file. |
| | Load a dataset of RDF ntriples. |
| | Load an RDF file. |
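As a rough illustration of what loading a small knowledge graph from a delimited file into memory involves, the sketch below reads one triple per row using only the Python standard library. It is not the library's own loader: the function name read_triples, the tab separator, and the toy file contents are assumptions made for this example.

```python
import csv
import os
import tempfile


def read_triples(path, sep="\t"):
    """Read one (subject, predicate, object) triple per row into a list.

    Hypothetical helper for illustration; not part of the library.
    """
    triples = []
    with open(path, newline="", encoding="utf-8") as handle:
        for row in csv.reader(handle, delimiter=sep):
            if not row:
                # Skip blank lines.
                continue
            if len(row) != 3:
                raise ValueError(f"expected 3 columns, got {len(row)}: {row}")
            triples.append(tuple(row))
    return triples


# Write a tiny hypothetical file so the example is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    sample_path = os.path.join(tmp, "toy_graph.csv")
    with open(sample_path, "w", encoding="utf-8") as handle:
        handle.write("a\tlikes\tb\nb\tknows\tc\n")
    triples = read_triples(sample_path)

print(triples)
```

Because every row is kept in a Python list, this approach only suits graphs that fit in memory, which is the same constraint stated for the loaders above.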
Benchmark Datasets Loaders
| | Load the FB15k-237 dataset (with option to load human labeled test subset). |
| | Load the WN18RR dataset. |
| | Load the YAGO3-10 dataset. |
| | Load the WordNet11 (WN11) dataset. |
| | Load the Freebase13 (FB13) dataset. |
| | Load the CoDEx-M dataset. |
| | Load the FB15k dataset. |
| | Load the WN18 dataset. |
Datasets Summary
| Dataset | Train | Valid | Test | Entities | Relations |
|---|---|---|---|---|---|
| FB15K-237 | 272,115 | 17,535 | 20,466 | 14,541 | 237 |
| WN18RR | 86,835 | 3,034 | 3,134 | 40,943 | 11 |
| YAGO3-10 | 1,079,040 | 5,000 | 5,000 | 123,182 | 37 |
| WN11 | 110,361 | 5,215 | 21,035 | 38,194 | 11 |
| FB13 | 316,232 | 11,816 | 47,464 | 75,043 | 13 |
| CODEX-M | 185,584 | 10,310 | 10,311 | 17,050 | 51 |
| FB15K | 483,142 | 50,000 | 59,071 | 14,951 | 1,345 |
| WN18 | 141,442 | 5,000 | 5,000 | 40,943 | 18 |
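The split sizes in the summary can be turned into proportions with a few lines of arithmetic, which is handy as a sanity check after loading a dataset. The sketch below uses two rows copied from the table; the names SPLITS and split_shares are hypothetical and exist only for this example.

```python
# Split sizes copied from the summary table (two rows shown for brevity).
SPLITS = {
    "FB15K-237": {"train": 272115, "valid": 17535, "test": 20466},
    "WN18RR": {"train": 86835, "valid": 3034, "test": 3134},
}


def split_shares(counts):
    """Return each split's share of the total number of triples."""
    total = sum(counts.values())
    return {name: size / total for name, size in counts.items()}


for dataset, counts in SPLITS.items():
    shares = split_shares(counts)
    formatted = ", ".join(f"{name}={share:.1%}" for name, share in shares.items())
    print(f"{dataset}: {sum(counts.values())} triples ({formatted})")
```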
Hint
It is recommended to set the AMPLIGRAPH_DATA_HOME environment variable:
export AMPLIGRAPH_DATA_HOME=/YOUR/PATH/TO/datasets
When a dataset is loaded, the loader first checks whether AMPLIGRAPH_DATA_HOME is set. If it is, the loader searches that location for the required dataset and, if the dataset is not found there, downloads it into that directory. If AMPLIGRAPH_DATA_HOME is not set, the datasets are saved in the ~/ampligraph_datasets directory.
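The lookup order described in this hint can be sketched as follows. This is a minimal approximation of the documented behavior, not the library's actual code: the helper names resolve_data_home and dataset_location are hypothetical, and the download step is deliberately left out.

```python
import os
from pathlib import Path

# Default folder name taken from the hint; used when the variable is unset.
DEFAULT_DIR_NAME = "ampligraph_datasets"


def resolve_data_home(environ=None):
    """Return the directory in which datasets are searched for and stored."""
    environ = os.environ if environ is None else environ
    configured = environ.get("AMPLIGRAPH_DATA_HOME")
    if configured:
        return Path(configured).expanduser()
    return Path.home() / DEFAULT_DIR_NAME


def dataset_location(dataset_name, environ=None):
    """Return the expected dataset path and whether it already exists.

    When the second value is False, a real loader would download the
    dataset into this directory (not implemented in this sketch).
    """
    path = resolve_data_home(environ) / dataset_name
    return path, path.exists()


print(resolve_data_home({"AMPLIGRAPH_DATA_HOME": "/YOUR/PATH/TO/datasets"}))
print(resolve_data_home({}))
```

Passing the environment as a plain dictionary keeps the function easy to test without modifying the real process environment.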
Benchmark Datasets With Numeric Values on Edges Loader
These helper functions load benchmark datasets with numeric values on edges (the figure below shows a toy example). Numeric values associated with the edges of a knowledge graph have been used to represent uncertainty, edge importance, and even out-of-band knowledge in a growing number of scenarios, ranging from genetic data to social networks.
Hint
To process a knowledge graph with numeric values associated with its edges, enable the FocusE layer [PC21] when training a knowledge graph embedding model.
The functions automatically download the datasets if they are not present in ~/ampligraph_datasets or
in the location set by the AMPLIGRAPH_DATA_HOME environment variable.
| | Load the O*NET20K dataset. |
| | Load the PPI5K dataset. |
| | Load the NL27K dataset. |
| | Load the CN15K dataset. |
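To make the idea of numeric values on edges concrete, the sketch below represents each edge as a triple plus one number and shows two simple operations on that representation. The toy edges, the value range, and the helper names are assumptions for illustration only; this is not the format returned by the loaders above.

```python
# Hypothetical toy edges: (subject, predicate, object, numeric value).
weighted_edges = [
    ("protein_a", "binds", "protein_b", 0.92),
    ("protein_a", "inhibits", "protein_c", 0.35),
    ("protein_b", "binds", "protein_c", 0.78),
]


def split_triples_and_values(edges):
    """Separate the plain triples from their numeric edge values."""
    triples = [edge[:3] for edge in edges]
    values = [edge[3] for edge in edges]
    return triples, values


def keep_confident(edges, threshold):
    """Keep only the edges whose numeric value is at least `threshold`."""
    return [edge for edge in edges if edge[3] >= threshold]


triples, values = split_triples_and_values(weighted_edges)
confident = keep_confident(weighted_edges, threshold=0.5)
print(len(triples), "triples,", len(confident), "with value >= 0.5")
```

Keeping the values aligned index by index with the triples means a training routine can consume the triples as usual and use the numbers as per-edge weights.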
Datasets Summary
| Dataset | Train | Valid | Test | Entities | Relations |
|---|---|---|---|---|---|
| O*NET20K | 461,932 | 138 | 2,000 | 20,643 | 19 |
| PPI5K | 230,929 | 19,017 | 21,720 | 4,999 | 7 |
| NL27K | 149,100 | 12,274 | 14,026 | 27,221 | 405 |
| CN15K | 199,417 | 16,829 | 19,224 | 15,000 | 36 |