load_fb13
- ampligraph.datasets.load_fb13(check_md5hash=False, clean_unseen=True, add_reciprocal_rels=False)
Load the Freebase13 (FB13) dataset.
FB13 is a subset of Freebase [BEP+08] and was initially presented in Reasoning With Neural Tensor Networks for Knowledge Base Completion [SCMN13].
Note
FB13 also provides positive and negative labels for the triples in the validation and test sets. The positive base rate is close to 50%.
The FB13 dataset is loaded from file if it exists at the AMPLIGRAPH_DATA_HOME location. If AMPLIGRAPH_DATA_HOME is not set, the default ~/ampligraph_datasets is checked. If the dataset is not found at either location, it is downloaded and placed in AMPLIGRAPH_DATA_HOME or ~/ampligraph_datasets.
This dataset is divided into three splits:
train: 316232 triples
valid: 11816 triples
test: 47464 triples
Both the validation and test splits are associated with labels (binary ndarrays), with True for positive statements and False for negatives:
valid_labels
test_labels
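The lookup order for the dataset location described above can be sketched as follows. This is a minimal illustration using only the standard library, not AmpliGraph's actual implementation; the helper name resolve_data_home and its env parameter are hypothetical.

```python
import os
from pathlib import Path


def resolve_data_home(env=None):
    """Return the directory that would be checked for datasets.

    Hypothetical helper mirroring the documented lookup order:
    AMPLIGRAPH_DATA_HOME if set, otherwise ~/ampligraph_datasets.
    """
    env = os.environ if env is None else env
    configured = env.get("AMPLIGRAPH_DATA_HOME")
    if configured:
        return Path(configured).expanduser()
    # Fall back to the documented default location.
    return Path("~/ampligraph_datasets").expanduser()


# Passing a plain dict instead of os.environ keeps the example reproducible.
custom_home = resolve_data_home({"AMPLIGRAPH_DATA_HOME": "/tmp/kg_data"})
default_home = resolve_data_home({})
print(custom_home)
print(default_home)
```

To point the loader at a different directory, set AMPLIGRAPH_DATA_HOME in the environment before calling load_fb13().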
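======= ====== ========= ========= ======== ======== ======== =========
Dataset Train  Valid Pos Valid Neg Test Pos Test Neg Entities Relations
======= ====== ========= ========= ======== ======== ======== =========
FB13    316232 5908      5908      23733    23731    75043    13
======= ====== ========= ========= ======== ======== ======== =========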
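The labels can be used as a boolean mask to separate positive from negative triples. The sketch below uses a small synthetic dictionary shaped like the return value of load_fb13() (the entity and relation names are made up, not real FB13 data). It assumes the labels may arrive with dtype=object, as in the example output further down, and casts them to bool before indexing, since NumPy only accepts boolean or integer arrays as indices.

```python
import numpy as np

# Synthetic stand-in for the dict returned by load_fb13(); not real FB13 data.
splits = {
    "valid": np.array(
        [
            ["entity_a", "gender", "female"],
            ["entity_a", "gender", "male"],
            ["entity_b", "profession", "job_x"],
            ["entity_b", "profession", "job_y"],
        ],
        dtype=object,
    ),
    "valid_labels": np.array([True, False, True, False], dtype=object),
}

# Cast to bool so the array can be used as a mask.
mask = splits["valid_labels"].astype(bool)

valid_pos = splits["valid"][mask]   # triples labelled True
valid_neg = splits["valid"][~mask]  # triples labelled False
base_rate = mask.mean()             # fraction of positive triples

print(valid_pos.shape, valid_neg.shape, base_rate)
```

The same pattern applies to the "test" and "test_labels" entries.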
- Parameters:
check_md5hash (bool) – If True check the md5hash of the files (default: False).
clean_unseen (bool) – If True, filters out triples in the validation and test sets that include entities not present in the training set (default: True).
add_reciprocal_rels (bool) – Flag which specifies whether to add reciprocal relations. For every <s, p, o> in the dataset this creates a corresponding triple with reciprocal relation <o, p_reciprocal, s> (default: False).
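The effect of clean_unseen and add_reciprocal_rels can be illustrated on toy data. This is a sketch of the behaviour as described in the parameter list, not AmpliGraph's internal code: the helper names filter_unseen and with_reciprocals are hypothetical, triples are plain tuples rather than ndarrays, and both the "_reciprocal" suffix and the choice to append reciprocal triples to the originals are assumptions.

```python
def filter_unseen(train, heldout):
    """Keep only held-out triples whose subject and object both occur in train."""
    seen = {entity for s, _, o in train for entity in (s, o)}
    return [(s, p, o) for s, p, o in heldout if s in seen and o in seen]


def with_reciprocals(triples, suffix="_reciprocal"):
    """Append <o, p_reciprocal, s> for every <s, p, o> (suffix is an assumption)."""
    return list(triples) + [(o, p + suffix, s) for s, p, o in triples]


train = [("a", "likes", "b"), ("b", "knows", "c")]
valid = [("a", "knows", "c"), ("a", "likes", "z")]  # "z" never appears in train

clean_valid = filter_unseen(train, valid)
augmented_train = with_reciprocals(train)

print(clean_valid)
print(augmented_train)
```

Here the triple containing "z" is dropped from the validation set, and the training set doubles in size once reciprocal triples are added.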
- Returns:
splits – The dataset splits: {'train': train, 'valid': valid, 'valid_labels': valid_labels, 'test': test, 'test_labels': test_labels}. Each split containing triples is an ndarray of shape (n, 3). Each labels entry is an ndarray of shape (n,).
- Return type:
dict
Example
>>> from ampligraph.datasets import load_fb13
>>> X = load_fb13()
>>> X["valid"][0]
array(['cornelie_van_zanten', 'gender', 'female'], dtype=object)
>>> X["valid_labels"][0:3]
array([True, False, True], dtype=object)