load_cn15k

ampligraph.datasets.load_cn15k(check_md5hash=False, clean_unseen=True, split_test_into_top_bottom=True, split_threshold=0.1)

Load the CN15K dataset.
CN15K was originally proposed in [CCS+19]. It is a subset of ConceptNet [SCH16], a common-sense knowledge graph built to represent general human knowledge. Numeric values on triples represent uncertainty.
The CN15K dataset is loaded from file if it exists at the AMPLIGRAPH_DATA_HOME location. If AMPLIGRAPH_DATA_HOME is not set, the default ~/ampligraph_datasets is checked.
If the dataset is not found at either location, it is downloaded and placed in AMPLIGRAPH_DATA_HOME or ~/ampligraph_datasets.
It is divided into three splits:
train
valid
test
Each triple in these splits is associated with a numeric value that represents the importance/relevance of the link.
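The lookup order described above (environment variable first, then the default directory) can be sketched as follows. This is a minimal illustration, not AmpliGraph's own code: the helper name `resolve_data_home` and its `env` parameter are hypothetical, and the actual loader may normalise paths differently.

```python
import os


def resolve_data_home(env=None):
    """Return the directory that would be searched for the dataset.

    Hypothetical helper: `env` is any mapping of environment variables
    and defaults to the real process environment.
    """
    env = os.environ if env is None else env
    data_home = env.get("AMPLIGRAPH_DATA_HOME")
    if not data_home:
        # Variable unset or empty: fall back to the documented default.
        data_home = os.path.join("~", "ampligraph_datasets")
    return os.path.abspath(os.path.expanduser(data_home))
```

Passing a plain dictionary as `env` makes the fallback easy to check without touching the real environment.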
Dataset  Train   Valid  Test   Entities  Relations
CN15K    199417  16829  19224  15000     36

Parameters:
- check_md5hash (boolean) – If True, check the md5hash of the files. Defaults to False.
- clean_unseen (bool) – If True, filters triples in validation and test sets that include entities not present in the training set.
- split_test_into_top_bottom (bool) – If True, sorts the test set by numeric value and additionally returns the top and bottom k% of triples as the test_topk and test_bottomk splits, where k is specified by the split_threshold argument.
- split_threshold (float) – Specifies the top and bottom percentage of triples to return.
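To make the effect of clean_unseen and split_test_into_top_bottom concrete, here is a rough sketch of both operations on plain Python lists standing in for the ndarrays. The function names are hypothetical, and two assumptions are made that the text above does not confirm: that split_threshold is a fraction (0.1 meaning 10%), and that the number of triples kept on each side is `int(split_threshold * n)`.

```python
def filter_unseen(train, other):
    """Keep triples of `other` whose subject and object both occur in `train`.

    Hypothetical helper illustrating clean_unseen; triples are
    (subject, predicate, object) tuples.
    """
    seen = {t[0] for t in train} | {t[2] for t in train}
    return [t for t in other if t[0] in seen and t[2] in seen]


def split_top_bottom(triples, values, split_threshold=0.1):
    """Return the triples with the highest and lowest numeric values.

    Assumes split_threshold is a fraction and k = int(split_threshold * n).
    """
    # Indices of the triples, ordered by ascending numeric value.
    order = sorted(range(len(values)), key=lambda i: values[i])
    k = int(split_threshold * len(values))
    bottom_idx = order[:k]
    top_idx = order[len(order) - k:]  # safe when k == 0
    return {
        "test_topk": [triples[i] for i in top_idx],
        "test_bottomk": [triples[i] for i in bottom_idx],
        "test_topk_numeric_values": [values[i] for i in top_idx],
        "test_bottomk_numeric_values": [values[i] for i in bottom_idx],
    }
```

The dictionary keys mirror those listed under Returns, so the sketch shows which triples would end up in each extra split under the stated assumptions.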
Returns: splits – The dataset splits: {'train': train, 'valid': valid, 'test': test, 'test_topk': test_topk, 'test_bottomk': test_bottomk, 'train_numeric_values': train_numeric_values, 'valid_numeric_values': valid_numeric_values, 'test_numeric_values': test_numeric_values, 'test_topk_numeric_values': test_topk_numeric_values, 'test_bottomk_numeric_values': test_bottomk_numeric_values}.
Each *_numeric_values split contains the numeric values associated with the corresponding dataset split and is an ndarray of shape [n]. Each dataset split is an ndarray of shape [n, 3].
The *_topk and *_bottomk splits are only returned when split_test_into_top_bottom=True.
Return type: dict
Examples
>>> from ampligraph.datasets import load_cn15k
>>> X = load_cn15k()
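Once the dictionary is loaded, it can be useful to confirm that each split lines up with its *_numeric_values companion. The helper below is a hypothetical sketch (it is not part of AmpliGraph) that relies only on `len()`, so it works on ndarrays as well as on the toy lists used to try it out.

```python
def describe_splits(splits):
    """Map each triple split to its size, checking its numeric values match.

    Hypothetical helper: `splits` is a dict shaped like the one returned
    by load_cn15k, with '<name>' and '<name>_numeric_values' keys.
    """
    summary = {}
    for name, data in splits.items():
        if name.endswith("_numeric_values"):
            continue  # handled together with its triple split
        values = splits.get(name + "_numeric_values")
        if values is not None and len(values) != len(data):
            raise ValueError(
                "split %r has %d triples but %d numeric values"
                % (name, len(data), len(values))
            )
        summary[name] = len(data)
    return summary
```

Calling `describe_splits(X)` on the dictionary from the example above would return one entry per triple split, which can then be compared against the counts in the dataset table.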