train_test_split_no_unseen

ampligraph.evaluation.train_test_split_no_unseen(X, test_size=100, seed=0, allow_duplication=False, filtered_test_predicates=None, backward_compatible=False)

Split a dataset into train and test sets.

This function carves out a test set that contains only entities and relations which also occur in the training set.
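The idea of carving out a test set without unseen entities or relations can be sketched in plain Python. This is a naive illustration only, not AmpliGraph's implementation: the name `naive_split_no_unseen` and the greedy strategy are assumptions made for the example. A triple is moved to the test set only if its subject, object and predicate all still occur in the remaining training triples.

```python
import random
from collections import Counter


def naive_split_no_unseen(triples, test_size, seed=0):
    """Hypothetical greedy split; NOT the AmpliGraph algorithm."""
    triples = [tuple(t) for t in triples]

    # Count how often each entity and relation occurs in the training pool.
    entity_counts = Counter()
    relation_counts = Counter()
    for s, p, o in triples:
        entity_counts[s] += 1
        entity_counts[o] += 1
        relation_counts[p] += 1

    # Visit the triples in a seeded random order.
    rng = random.Random(seed)
    order = list(range(len(triples)))
    rng.shuffle(order)

    test_idx = set()
    for i in order:
        if len(test_idx) == test_size:
            break
        s, p, o = triples[i]
        # Occurrences this triple contributes (2 for a self-loop s == o).
        needed = Counter([s, o])
        entities_survive = all(entity_counts[e] > n for e, n in needed.items())
        relation_survives = relation_counts[p] > 1
        if entities_survive and relation_survives:
            test_idx.add(i)
            entity_counts[s] -= 1
            entity_counts[o] -= 1
            relation_counts[p] -= 1

    if len(test_idx) < test_size:
        raise ValueError("Not enough triples can be moved without creating unseen items.")

    train = [t for i, t in enumerate(triples) if i not in test_idx]
    test = [triples[i] for i in sorted(test_idx)]
    return train, test
```

A single greedy pass like this can fail on datasets where a different choice of triples would have succeeded, so it should be read as an explanation of the constraint rather than as a replacement for the library function.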

Parameters:

X (ndarray, size[n, 3]) – The dataset to split.
test_size (int, float) – If int, the number of triples in the test set. If float, the proportion of the total triples to place in the test set, given as a fraction (e.g. 0.2 for 20%).
seed (int) – A random seed used to split the dataset.
allow_duplication (boolean) – Flag to indicate if the test set can contain duplicated triples.
filtered_test_predicates (None, list) – If None, all predicate types will be considered for the test set. If list, only the predicate types in the list will be considered for the test set.
backward_compatible (boolean) – If True, uses the old (slower) version of the API so that splits from older pipelines (if any) can be reproduced. Set this flag only if you need the behaviour of train_test_split_no_unseen from AmpliGraph versions 1.3.2 and below. The older version is slow and inefficient, so avoid setting this to True unless necessary.
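The way test_size and filtered_test_predicates constrain the test set can be sketched as follows. Both helper names (`resolve_test_size`, `candidate_triples`) are hypothetical, and the truncation of a float test_size is an assumption about the behaviour rather than a statement of the library's code.

```python
def resolve_test_size(n_triples, test_size):
    """Turn an int count or a float fraction into a number of test triples."""
    if isinstance(test_size, float):
        # Assumption: a float is a fraction of the dataset, truncated to an int.
        return int(n_triples * test_size)
    return test_size


def candidate_triples(triples, filtered_test_predicates=None):
    """Return the triples that are eligible to be moved to the test set."""
    if filtered_test_predicates is None:
        # None means every predicate type is considered.
        return list(triples)
    allowed = set(filtered_test_predicates)
    # The predicate is the middle element of each (subject, predicate, object) triple.
    return [t for t in triples if t[1] in allowed]
```

Under these assumptions, a dataset of 9 triples with test_size=0.25 would yield 2 test triples, the same as passing test_size=2.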

Returns:

X_train (ndarray, size[n, 3]) – The training set.
X_test (ndarray, size[n, 3]) – The test set.

Examples

>>> import numpy as np
>>> from ampligraph.evaluation import train_test_split_no_unseen
>>> # load your dataset to X
>>> X = np.array([['a', 'y', 'b'],
...               ['f', 'y', 'e'],
...               ['b', 'y', 'a'],
...               ['a', 'y', 'c'],
...               ['c', 'y', 'a'],
...               ['a', 'y', 'd'],
...               ['c', 'y', 'd'],
...               ['b', 'y', 'c'],
...               ['f', 'y', 'e']])
>>> # if you want to split into train/test datasets
>>> X_train, X_test = train_test_split_no_unseen(X, test_size=2)
>>> X_train
array([['a', 'y', 'd'],
       ['b', 'y', 'a'],
       ['a', 'y', 'c'],
       ['f', 'y', 'e'],
       ['a', 'y', 'b'],
       ['c', 'y', 'a'],
       ['b', 'y', 'c']], dtype='<U1')
>>> X_test
array([['f', 'y', 'e'],
       ['c', 'y', 'd']], dtype='<U1')
>>> # if you want to split into train/valid/test datasets, call the function twice
>>> X_train_valid, X_test = train_test_split_no_unseen(X, test_size=2)
>>> X_train, X_valid = train_test_split_no_unseen(X_train_valid, test_size=2)
>>> X_train
array([['a', 'y', 'b'],
       ['a', 'y', 'd'],
       ['a', 'y', 'c'],
       ['c', 'y', 'a'],
       ['f', 'y', 'e']], dtype='<U1')
>>> X_valid
array([['c', 'y', 'd'],
       ['f', 'y', 'e']], dtype='<U1')
>>> X_test
array([['b', 'y', 'c'],
       ['b', 'y', 'a']], dtype='<U1')
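To confirm that a split really contains no unseen entities or relations, a small check like the one below may help. The helper `unseen_in_test` is hypothetical and not part of AmpliGraph; the data are the X_train and X_test arrays from the first split above, written as plain tuples so the snippet runs without any dependencies.

```python
def unseen_in_test(train, test):
    """Return (entities, relations) that occur in test but not in train."""
    train_entities = {e for s, _, o in train for e in (s, o)}
    train_relations = {p for _, p, _ in train}
    test_entities = {e for s, _, o in test for e in (s, o)}
    test_relations = {p for _, p, _ in test}
    return test_entities - train_entities, test_relations - train_relations


# Output of the first train/test split shown in the examples.
X_train = [('a', 'y', 'd'), ('b', 'y', 'a'), ('a', 'y', 'c'), ('f', 'y', 'e'),
           ('a', 'y', 'b'), ('c', 'y', 'a'), ('b', 'y', 'c')]
X_test = [('f', 'y', 'e'), ('c', 'y', 'd')]

unseen_entities, unseen_relations = unseen_in_test(X_train, X_test)
# Both sets are empty: every test entity and relation also occurs in training.
assert not unseen_entities and not unseen_relations
```

Because the helper only iterates over rows and unpacks three values per row, it should also accept the ndarray outputs of train_test_split_no_unseen directly.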