load_fb13

ampligraph.datasets.load_fb13(check_md5hash=False, clean_unseen=True, add_reciprocal_rels=False)

Load the Freebase13 (FB13) dataset

FB13 is a subset of Freebase [BEP+08] and was initially presented in Reasoning With Neural Tensor Networks for Knowledge Base Completion [SCMN13].

The FB13 dataset is loaded from file if it exists at the AMPLIGRAPH_DATA_HOME location. If AMPLIGRAPH_DATA_HOME is not set, the default ~/ampligraph_datasets is checked.

If the dataset is not found at either location, it is downloaded and placed in AMPLIGRAPH_DATA_HOME or ~/ampligraph_datasets.
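
The lookup order described above can be sketched in a few lines. This is only an illustration of the documented behaviour, not AmpliGraph's own code: `resolve_data_home` is a hypothetical helper, and `/tmp/kg_data` is an arbitrary example path.

```python
import os
from pathlib import Path


def resolve_data_home(env=None):
    """Hypothetical helper: return the directory that would be checked for datasets."""
    env = os.environ if env is None else env
    configured = env.get("AMPLIGRAPH_DATA_HOME")
    if configured:
        # The environment variable takes precedence when it is set.
        return Path(configured).expanduser()
    # Otherwise fall back to the documented default location.
    return Path.home() / "ampligraph_datasets"


print(resolve_data_home({"AMPLIGRAPH_DATA_HOME": "/tmp/kg_data"}))
print(resolve_data_home({}))
```

Under this reading, setting `os.environ["AMPLIGRAPH_DATA_HOME"]` before calling `load_fb13()` should make the loader look in (and download to) that directory.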

It is divided into three splits:

  • train

  • valid

  • test

Both the validation and test splits are associated with labels (binary ndarrays), with True for positive statements and False for negatives:

  • valid_labels

  • test_labels
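
Because the labels are boolean arrays aligned with their split, they can be used as masks to separate positive from negative statements. The sketch below uses made-up toy triples as stand-ins for `X["valid"]` and `X["valid_labels"]`, so it runs without downloading the dataset; the cast to `bool` is a precaution in case the labels are stored with `dtype=object`.

```python
import numpy as np

# Toy stand-ins for X["valid"] and X["valid_labels"]; these triples are made up.
valid = np.array(
    [
        ["alice", "gender", "female"],
        ["alice", "gender", "male"],
        ["bob", "profession", "singer"],
    ],
    dtype=object,
)
valid_labels = np.array([True, False, True], dtype=object)

# Cast first so the mask has a proper boolean dtype.
mask = valid_labels.astype(bool)
positives = valid[mask]    # rows labelled True
negatives = valid[~mask]   # rows labelled False

print(len(positives), "positive,", len(negatives), "negative")
```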

Dataset  Train   Valid Pos  Valid Neg  Test Pos  Test Neg  Entities  Relations
FB13     316232  5908       5908       23733     23731     75043     13

Parameters
  • check_md5hash (bool) – If True, check the MD5 hash of the files. Defaults to False.

  • clean_unseen (bool) – If True, filters out triples in the validation and test sets that include entities not present in the training set. Defaults to True.

  • add_reciprocal_rels (bool) – Flag which specifies whether to add reciprocal relations. For every <s, p, o> in the dataset this creates a corresponding triple with reciprocal relation <o, p_reciprocal, s>. (default: False).
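
What `clean_unseen` and `add_reciprocal_rels` describe can be illustrated on toy data. The following is a minimal sketch, not the library's implementation: `seen_mask` and `add_reciprocals` are hypothetical helpers, the `_reciprocal` suffix is an assumed naming for the reciprocal relation, and the triples are made up.

```python
import numpy as np


def seen_mask(train, split):
    """Boolean mask: True where both subject and object of a row occur in `train`."""
    seen = set(train[:, 0]) | set(train[:, 2])
    return np.array([s in seen and o in seen for s, _, o in split], dtype=bool)


def add_reciprocals(triples, suffix="_reciprocal"):
    """Append <o, p + suffix, s> for every <s, p, o>; the suffix is an assumption."""
    flipped = triples[:, [2, 1, 0]].copy()          # swap subject and object
    flipped[:, 1] = [p + suffix for p in triples[:, 1]]
    return np.concatenate([triples, flipped])


train = np.array([["a", "likes", "b"], ["b", "knows", "c"]], dtype=object)
valid = np.array([["a", "knows", "c"], ["a", "likes", "z"]], dtype=object)
valid_labels = np.array([True, False])

# "z" never appears in train, so the second validation triple is dropped.
keep = seen_mask(train, valid)
valid_clean, labels_clean = valid[keep], valid_labels[keep]

train_with_reciprocals = add_reciprocals(train)
print(valid_clean)
print(train_with_reciprocals)
```

In this sketch the same mask is applied to the triples and to their labels so the two stay aligned.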

Returns

splits – The dataset splits: {'train': train, 'valid': valid, 'valid_labels': valid_labels, 'test': test, 'test_labels': test_labels}. Each entry holding triples is an ndarray of shape [n, 3]. Each labels entry is an ndarray of shape [n].

Return type

dict
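
The documented layout of the returned dict can be checked with a small helper. This is a sketch under stated assumptions: `check_splits` is a hypothetical function, it assumes each labels array has one entry per row of its split, and the toy dict below only imitates the structure of the real return value.

```python
import numpy as np


def check_splits(splits):
    """Hypothetical check that `splits` follows the documented shapes."""
    for name in ("train", "valid", "test"):
        arr = splits[name]
        if arr.ndim != 2 or arr.shape[1] != 3:
            raise ValueError(f"{name}: expected shape [n, 3], got {arr.shape}")
    for name in ("valid", "test"):
        labels = splits[name + "_labels"]
        # Assumption: one label per triple in the corresponding split.
        if labels.shape != (len(splits[name]),):
            raise ValueError(f"{name}_labels: expected shape [{len(splits[name])}]")
    return True


# Toy dict with the same keys as the documented return value.
toy = {
    "train": np.array([["a", "likes", "b"]], dtype=object),
    "valid": np.array([["a", "likes", "c"], ["c", "likes", "a"]], dtype=object),
    "valid_labels": np.array([True, False]),
    "test": np.array([["b", "likes", "a"]], dtype=object),
    "test_labels": np.array([True]),
}
print(check_splits(toy))
```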

Examples

>>> from ampligraph.datasets import load_fb13
>>> X = load_fb13()
>>> X["valid"][0]
array(['cornelie_van_zanten', 'gender', 'female'], dtype=object)
>>> X["valid_labels"][0:3]
array([True, False, True], dtype=object)