load_wn11

ampligraph.datasets.datasets.load_wn11(check_md5hash=False, clean_unseen=True, add_reciprocal_rels=False)

Load the WordNet11 (WN11) dataset.

WordNet was originally proposed in "WordNet: a lexical database for English" [Mil95].

Note

WN11 also provides positive and negative labels for the triples in the validation and test sets. The positive base rate is close to 50%.

The WN11 dataset is loaded from file if it exists at the AMPLIGRAPH_DATA_HOME location. If AMPLIGRAPH_DATA_HOME is not set, the default ~/ampligraph_datasets is checked. If the dataset is not found at either location, it is downloaded and placed in AMPLIGRAPH_DATA_HOME or ~/ampligraph_datasets.
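The lookup order described above can be sketched as follows. This only illustrates the documented behavior and is not AmpliGraph's own code: the helper name `resolve_data_home` is hypothetical, and treating an empty variable as unset is an assumption.

```python
import os


def resolve_data_home(environ=None):
    """Return the directory that would be searched for the dataset.

    Hypothetical helper for illustration; not part of the AmpliGraph API.
    """
    environ = os.environ if environ is None else environ
    # Prefer AMPLIGRAPH_DATA_HOME when it is set.
    # Assumption: an empty value is treated the same as "not set".
    data_home = environ.get("AMPLIGRAPH_DATA_HOME")
    if data_home:
        return os.path.expanduser(data_home)
    # Otherwise fall back to the documented default location.
    return os.path.expanduser(os.path.join("~", "ampligraph_datasets"))


print(resolve_data_home())
```

To keep the dataset somewhere else, set the AMPLIGRAPH_DATA_HOME environment variable before calling load_wn11().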

This dataset is divided into three splits:

  • train: 110361 triples

  • valid: 5215 triples

  • test: 21035 triples

Both the validation and test splits are associated with labels (binary ndarrays), with True for positive statements and False for negatives:

  • valid_labels

  • test_labels
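Because each labels array is aligned row by row with its split, it can be used directly as a boolean mask. A minimal sketch on toy data (the triples below are made up for illustration and are not taken from WN11; the variable names are placeholders for X["valid"] and X["valid_labels"]):

```python
import numpy as np

# Toy stand-ins for X["valid"] and X["valid_labels"]; values are illustrative only.
valid = np.array([["a", "_type_of", "b"],
                  ["a", "_type_of", "c"],
                  ["d", "_type_of", "e"]], dtype=object)
valid_labels = np.array([True, False, True])

# Boolean indexing keeps the rows whose label is True (positive statements);
# inverting the mask keeps the negatives.
valid_pos = valid[valid_labels]
valid_neg = valid[~valid_labels]

print(valid_pos.shape, valid_neg.shape)  # (2, 3) (1, 3)
```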

Dataset  Train   Valid Pos  Valid Neg  Test Pos  Test Neg  Entities  Relations
-------  ------  ---------  ---------  --------  --------  --------  ---------
WN11     110361  2606       2609       10493     10542     38588     11
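The "close to 50%" positive base rate mentioned in the note can be checked from the positive and negative counts listed above. A short arithmetic sketch (the dictionary layout is just for illustration):

```python
# Positive/negative counts copied from the statistics above.
counts = {
    "valid": {"pos": 2606, "neg": 2609},
    "test": {"pos": 10493, "neg": 10542},
}

for split, c in counts.items():
    total = c["pos"] + c["neg"]   # matches the split sizes 5215 and 21035
    rate = c["pos"] / total       # fraction of triples labelled True
    print(f"{split}: {total} triples, positive base rate {rate:.4f}")

# valid: 5215 triples, positive base rate 0.4997
# test: 21035 triples, positive base rate 0.4988
```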

Parameters:
  • check_md5hash (bool) – If True, check the MD5 hash of the files (default: False).

  • clean_unseen (bool) – If True, filters out triples in the validation and test sets that include entities not present in the training set (default: True).

  • add_reciprocal_rels (bool) – Flag which specifies whether to add reciprocal relations. For every <s, p, o> in the dataset this creates a corresponding triple with reciprocal relation <o, p_reciprocal, s> (default: False).
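What clean_unseen and add_reciprocal_rels describe can be sketched on plain Python tuples. This is an illustration of the documented behavior under stated assumptions, not the library's implementation: both helper functions are hypothetical, the toy triples are made up, and the "_reciprocal" suffix simply mirrors the p_reciprocal notation above (the actual relation naming may differ).

```python
def clean_unseen(train, split):
    """Drop triples from `split` that mention an entity absent from `train`.

    Hypothetical helper; illustrates the clean_unseen description only.
    """
    seen = {entity for s, _, o in train for entity in (s, o)}
    return [(s, p, o) for s, p, o in split if s in seen and o in seen]


def add_reciprocals(triples):
    """Append <o, p_reciprocal, s> for every <s, p, o>.

    Hypothetical helper; the "_reciprocal" suffix is an assumed naming scheme.
    """
    return list(triples) + [(o, p + "_reciprocal", s) for s, p, o in triples]


# Toy data, not taken from WN11.
train = [("cat", "_type_of", "animal"), ("dog", "_type_of", "animal")]
valid = [("cat", "_type_of", "dog"), ("cat", "_type_of", "plant")]

# "plant" never appears in train, so the second validation triple is removed.
print(clean_unseen(train, valid))
# Each training triple gains a reversed counterpart.
print(add_reciprocals(train))
```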

Returns:

splits – The dataset splits: {'train': train, 'valid': valid, 'valid_labels': valid_labels, 'test': test, 'test_labels': test_labels}. Each of train, valid, and test is an ndarray of shape (n, 3). Each labels array is a boolean ndarray of shape (n,).

Return type:

dict

Example

>>> from ampligraph.datasets import load_wn11
>>> X = load_wn11()
>>> X["valid"][0]
array(['__genus_xylomelum_1', '_type_of', '__dicot_genus_1'], dtype=object)
>>> X["valid_labels"][0:3]
array([ True, False,  True])