load_codex
- ampligraph.datasets.datasets.load_codex(check_md5hash=False, clean_unseen=True, add_reciprocal_rels=False, return_mapper=False)
Load the CoDEx-M dataset.
The dataset is described in [SK20].
Note
CoDEx-M also contains ground-truth negative triples for the test and validation sets. For more information, see the original paper referenced above.
The CoDEx dataset is loaded from file if it exists at the AMPLIGRAPH_DATA_HOME location. If AMPLIGRAPH_DATA_HOME is not set, the default ~/ampligraph_datasets is checked. If the dataset is not found at either location, it is downloaded and placed in AMPLIGRAPH_DATA_HOME or ~/ampligraph_datasets.
This dataset is divided into three splits:
train: 185,584 triples
valid: 10,310 triples
test: 10,311 triples
Both the validation and test splits are associated with labels (binary ndarrays), with True for positive statements and False for negatives:
valid_labels
test_labels
Dataset   Train     Valid    Valid-negatives   Test     Test-negatives   Entities   Relations
CoDEx-M   185,584   10,310   10,310            10,311   10,311           17,050     51
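The lookup order for the dataset directory described above can be sketched with the standard library alone. This is a minimal illustration, not AmpliGraph's own code: `resolve_data_home` is a hypothetical helper name, and the sketch only assumes what the text states (the AMPLIGRAPH_DATA_HOME environment variable first, then ~/ampligraph_datasets).

```python
import os
from pathlib import Path


def resolve_data_home(env=None):
    """Return the directory that would be searched for cached datasets.

    Hypothetical helper (not part of AmpliGraph): it only mirrors the
    lookup order described in the text above.
    """
    env = os.environ if env is None else env
    configured = env.get("AMPLIGRAPH_DATA_HOME")
    if configured:
        # The environment variable takes precedence when it is set.
        return Path(configured).expanduser()
    # Otherwise fall back to the default folder in the home directory.
    return Path.home() / "ampligraph_datasets"


# Passing a plain dict instead of os.environ keeps the example side-effect free.
custom_home = resolve_data_home({"AMPLIGRAPH_DATA_HOME": "/tmp/kg_data"})
default_home = resolve_data_home({})
```

Calling `resolve_data_home()` with no argument reads the real environment, which may help when checking where a download is likely to end up.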
- Parameters:
clean_unseen (bool) – If True, filters out triples in the validation and test sets that include entities not present in the training set (default: True).
check_md5hash (bool) – If True, check the md5hash of the dataset files (default: False).
add_reciprocal_rels (bool) – Flag which specifies whether to add reciprocal relations. For every <s, p, o> in the dataset this creates a corresponding triple with reciprocal relation <o, p_reciprocal, s> (default: False).
return_mapper (bool) – Whether to return human-readable labels as a dictionary in the X["mapper"] field (default: False).
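To make the effect of clean_unseen and add_reciprocal_rels concrete, the following sketch reproduces both transformations on a toy list of triples. It is an illustration under stated assumptions, not the library's implementation: `filter_unseen` and `with_reciprocals` are hypothetical helper names, the triples are made up, and the `_reciprocal` suffix simply follows the `p_reciprocal` notation used in the parameter description.

```python
def filter_unseen(train, split):
    """Keep only triples whose subject and object both occur in train.

    Hypothetical helper illustrating the behaviour described for
    clean_unseen=True.
    """
    seen = {s for s, _, _ in train} | {o for _, _, o in train}
    return [t for t in split if t[0] in seen and t[2] in seen]


def with_reciprocals(triples, suffix="_reciprocal"):
    """Append <o, p + suffix, s> for every <s, p, o>.

    Hypothetical helper illustrating add_reciprocal_rels=True; the
    suffix is an assumption based on the notation in the text.
    """
    return list(triples) + [(o, p + suffix, s) for s, p, o in triples]


# Toy data, not taken from CoDEx-M.
train = [("Q1", "P106", "Q2"), ("Q2", "P27", "Q3")]
valid = [("Q1", "P27", "Q3"), ("Q1", "P106", "Q99")]  # Q99 never appears in train

clean_valid = filter_unseen(train, valid)   # drops the triple containing Q99
augmented_train = with_reciprocals(train)   # doubles the number of triples
```

Filtering only looks at entities, as the description of clean_unseen states; relations that are absent from the training set are not considered in this sketch.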
- Returns:
splits – The dataset splits: {'train': train, 'valid': valid, 'valid_negatives': valid_negatives, 'test': test, 'test_negatives': test_negatives}. Each split is an ndarray of shape (n, 3).
- Return type:
dict
Example
>>> from ampligraph.datasets import load_codex
>>> X = load_codex()
>>> X["valid"][0]
array(['Q60684', 'P106', 'Q4964182'], dtype=object)
>>> X = load_codex(return_mapper=True)
>>> [X['mapper'][elem]['label'] for elem in X['valid'][0]]
['Novalis', 'occupation', 'philosopher']
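The positive and negative splits can be combined into one labelled evaluation set, with True for positive statements and False for negatives. The sketch below is a hedged illustration on a toy stand-in dictionary: it assumes the keys listed under Returns, `labelled` is a hypothetical helper, and the negative triple is invented for the example rather than taken from CoDEx-M.

```python
def labelled(positives, negatives):
    """Concatenate positives and negatives and build matching boolean labels.

    Hypothetical helper, not part of AmpliGraph.
    """
    triples = list(positives) + list(negatives)
    labels = [True] * len(positives) + [False] * len(negatives)
    return triples, labels


# Toy stand-in for the dictionary returned by load_codex().
splits = {
    "valid": [("Q60684", "P106", "Q4964182")],
    "valid_negatives": [("Q60684", "P106", "Q1")],  # made-up negative triple
}

valid_triples, valid_labels = labelled(splits["valid"], splits["valid_negatives"])
```

With the real dataset the same pattern would apply to the ndarray splits, for instance by converting each split to a list first or by concatenating the arrays directly.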