load_wn11
ampligraph.datasets.load_wn11(check_md5hash=False, clean_unseen=True)
Load the WordNet11 (WN11) dataset.
WordNet was originally proposed in WordNet: a lexical database for English [Mil95].
The WN11 dataset is loaded from file if it exists at the AMPLIGRAPH_DATA_HOME location. If AMPLIGRAPH_DATA_HOME is not set, the default ~/ampligraph_datasets is checked.
If the dataset is not found at either location, it is downloaded and placed in AMPLIGRAPH_DATA_HOME or ~/ampligraph_datasets.
It is divided into three splits:
train
valid
test
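The directory lookup order described above can be sketched as follows. This is a minimal illustration, not AmpliGraph's own code: the helper name `resolve_data_home` and its `env` parameter are hypothetical and exist only to make the fallback logic explicit.

```python
import os
from pathlib import Path


def resolve_data_home(env=None):
    """Return the directory that would be searched for datasets (illustrative).

    `env` is a hypothetical parameter that lets the caller pass a plain dict
    instead of the real process environment.
    """
    env = os.environ if env is None else env
    data_home = env.get("AMPLIGRAPH_DATA_HOME")
    if data_home:
        # The environment variable takes precedence when it is set.
        return Path(data_home).expanduser()
    # Otherwise fall back to the documented default location.
    return Path.home() / "ampligraph_datasets"


# With the variable set, that location is used.
print(resolve_data_home({"AMPLIGRAPH_DATA_HOME": "/tmp/kg_data"}))
# Without it, the default under the home directory is used.
print(resolve_data_home({}))
```

Setting AMPLIGRAPH_DATA_HOME before calling the loader is therefore the way to control where the files are read from and, if missing, downloaded to.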
Both the validation and test splits are associated with labels (binary ndarrays), with True for positive statements and False for negatives:
valid_labels
test_labels
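Because the labels are boolean arrays aligned row by row with the triples, they can be used directly as masks. The sketch below assumes a tiny stand-in dictionary with made-up triples in place of the real output of load_wn11(), so it runs without downloading anything.

```python
import numpy as np

# Stand-in for the dict returned by load_wn11(); the triples are made up.
splits = {
    "valid": np.array(
        [
            ["a", "_type_of", "b"],
            ["b", "_type_of", "c"],
            ["c", "_part_of", "a"],
        ],
        dtype=object,
    ),
    "valid_labels": np.array([True, False, True]),
}

valid, labels = splits["valid"], splits["valid_labels"]

# One label per triple: shapes are [n, 3] and [n].
assert valid.shape == (len(labels), 3)

# Boolean indexing separates positive from negative statements.
valid_pos = valid[labels]
valid_neg = valid[~labels]

print(len(valid_pos), "positives,", len(valid_neg), "negatives")
```

The same pattern applies to the test split with test_labels.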
Dataset  Train   Valid Pos  Valid Neg  Test Pos  Test Neg  Entities  Relations
WN11     110361  2606       2609       10493     10542     38588     11
- Parameters
check_md5hash (boolean) – If True, check the md5hash of the files. Defaults to False.
clean_unseen (bool) – If True, filters triples in validation and test sets that include entities not present in the training set.
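The effect of clean_unseen can be illustrated with a small NumPy sketch. The helper `filter_unseen` is hypothetical and only mirrors the behaviour described above (dropping triples whose subject or object does not occur in the training set); AmpliGraph's actual implementation may differ in its details.

```python
import numpy as np


def filter_unseen(train, other, other_labels=None):
    """Drop triples from `other` whose entities never occur in `train`.

    Hypothetical helper for illustration only. Triples are rows of
    (subject, relation, object).
    """
    # Entities appear in the first and third columns of the training triples.
    seen = set(train[:, 0]) | set(train[:, 2])
    keep = np.array([s in seen and o in seen for s, _, o in other], dtype=bool)
    if other_labels is None:
        return other[keep]
    # Keep the labels aligned with the surviving triples.
    return other[keep], other_labels[keep]


train = np.array([["a", "r", "b"], ["b", "r", "c"]], dtype=object)
valid = np.array([["a", "r", "c"], ["a", "r", "z"]], dtype=object)
valid_labels = np.array([True, False])

# "z" never appears in the training set, so the second triple is removed.
valid_f, labels_f = filter_unseen(train, valid, valid_labels)
print(valid_f, labels_f)
```

Filtering matters because an embedding model has no learned representation for an entity it never saw during training, so such triples cannot be scored meaningfully.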
- Returns
splits – The dataset splits: {'train': train, 'valid': valid, 'valid_labels': valid_labels, 'test': test, 'test_labels': test_labels}. Each split containing triples is an ndarray of shape [n, 3]. Each label array is an ndarray of shape [n].
- Return type
dict
Examples
>>> from ampligraph.datasets import load_wn11
>>> X = load_wn11()
>>> X["valid"][0]
array(['__genus_xylomelum_1', '_type_of', '__dicot_genus_1'], dtype=object)
>>> X["valid_labels"][0:3]
array([ True, False, True])