train_test_split_no_unseen

ampligraph.evaluation.train_test_split_no_unseen(X, test_size=100, seed=0, allow_duplication=False)

Split into train and test sets.

This function carves out a test set that contains only entities and relations which also occur in the training set.
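The guarantee can be made concrete with a minimal greedy sketch of one way such a split could be built. This is not AmpliGraph's implementation. The function name `split_no_unseen_sketch` is hypothetical, and the sketch ignores `allow_duplication`. It shuffles the triples, then moves a triple to the test set only if every entity and relation in it would still appear in at least one remaining training triple.

```python
import numpy as np
from collections import Counter


def split_no_unseen_sketch(X, test_size, seed=0):
    """Greedy sketch of a 'no unseen' split (hypothetical, not AmpliGraph's code)."""
    X = np.asarray(X)
    rng = np.random.RandomState(seed)
    # Occurrence counts over the whole dataset; subjects and objects both count.
    ent_counts = Counter(X[:, 0]) + Counter(X[:, 2])
    rel_counts = Counter(X[:, 1])

    test_idx = []
    for i in rng.permutation(len(X)):
        if len(test_idx) == test_size:
            break
        s, p, o = X[i]
        ents = Counter([s, o])  # a self-loop (s == o) uses its entity twice
        # Removing this triple must leave every entity/relation seen at least once.
        if rel_counts[p] > 1 and all(ent_counts[e] > n for e, n in ents.items()):
            rel_counts[p] -= 1
            for e, n in ents.items():
                ent_counts[e] -= n
            test_idx.append(i)

    if len(test_idx) < test_size:
        raise ValueError("Cannot build a test set of this size without unseen items.")

    mask = np.zeros(len(X), dtype=bool)
    mask[np.array(test_idx, dtype=int)] = True
    return X[~mask], X[mask]
```

Because the selection is greedy, this sketch may raise `ValueError` on small or sparse graphs even when a different choice of test triples would have satisfied the constraint.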

Parameters
  • X (ndarray, size[n, 3]) – The dataset to split.

  • test_size (int, float) – If int, the number of triples in the test set. If float, the fraction of all triples to place in the test set (for example, 0.1 for 10%).

  • seed (int) – A random seed used to split the dataset.

  • allow_duplication (boolean) – Flag to indicate if the test set can contain duplicated triples.
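The int/float handling of `test_size` can be sketched as follows. The helper `resolve_test_size` is hypothetical, and truncating `len(X) * test_size` towards zero is an assumption of this sketch rather than documented behaviour.

```python
import numbers


def resolve_test_size(n_triples, test_size):
    """Turn an int or float test_size into a count of test triples (sketch)."""
    # An integer is taken as an absolute number of test triples.
    if isinstance(test_size, numbers.Integral):
        return int(test_size)
    # A float is taken as a fraction of all triples; truncation is assumed.
    if isinstance(test_size, numbers.Real):
        return int(n_triples * test_size)
    raise TypeError("test_size must be an int or a float")


print(resolve_test_size(9, 2))     # 2
print(resolve_test_size(9, 0.25))  # 2  (9 * 0.25 = 2.25, truncated)
```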

Returns

  • X_train (ndarray, size[n, 3]) – The training set.

  • X_test (ndarray, size[n, 3]) – The test set.

Examples

>>> import numpy as np
>>> from ampligraph.evaluation import train_test_split_no_unseen
>>> # load your dataset into X
>>> X = np.array([['a', 'y', 'b'],
...               ['f', 'y', 'e'],
...               ['b', 'y', 'a'],
...               ['a', 'y', 'c'],
...               ['c', 'y', 'a'],
...               ['a', 'y', 'd'],
...               ['c', 'y', 'd'],
...               ['b', 'y', 'c'],
...               ['f', 'y', 'e']])
>>> # if you want to split into train/test datasets
>>> X_train, X_test = train_test_split_no_unseen(X, test_size=2)
>>> X_train
array([['a', 'y', 'b'],
       ['f', 'y', 'e'],
       ['b', 'y', 'a'],
       ['c', 'y', 'a'],
       ['c', 'y', 'd'],
       ['b', 'y', 'c'],
       ['f', 'y', 'e']], dtype='<U1')
>>> X_test
array([['a', 'y', 'c'],
       ['a', 'y', 'd']], dtype='<U1')
>>> # if you want to split into train/valid/test datasets, call it twice
>>> X_train_valid, X_test = train_test_split_no_unseen(X, test_size=2)
>>> X_train, X_valid = train_test_split_no_unseen(X_train_valid, test_size=2)
>>> X_train
array([['a', 'y', 'b'],
       ['b', 'y', 'a'],
       ['c', 'y', 'd'],
       ['b', 'y', 'c'],
       ['f', 'y', 'e']], dtype='<U1')
>>> X_valid
array([['f', 'y', 'e'],
       ['c', 'y', 'a']], dtype='<U1')
>>> X_test
array([['a', 'y', 'c'],
       ['a', 'y', 'd']], dtype='<U1')
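When the function is called twice, the second call only constrains X_valid against the new X_train; X_test was checked against X_train_valid. It can therefore be worth confirming that both held-out sets are covered by the final X_train. The helper `unseen_items` below is a hypothetical NumPy sketch, applied to the arrays printed in the example above.

```python
import numpy as np


def unseen_items(X_train, X_held_out):
    """Return (entities, relations) in X_held_out that never occur in X_train."""
    train_ents = set(X_train[:, 0]) | set(X_train[:, 2])
    train_rels = set(X_train[:, 1])
    held_ents = set(X_held_out[:, 0]) | set(X_held_out[:, 2])
    held_rels = set(X_held_out[:, 1])
    return held_ents - train_ents, held_rels - train_rels


# Arrays copied from the train/valid/test example output above.
X_train = np.array([['a', 'y', 'b'], ['b', 'y', 'a'], ['c', 'y', 'd'],
                    ['b', 'y', 'c'], ['f', 'y', 'e']])
X_valid = np.array([['f', 'y', 'e'], ['c', 'y', 'a']])
X_test = np.array([['a', 'y', 'c'], ['a', 'y', 'd']])

for name, X_held_out in [("valid", X_valid), ("test", X_test)]:
    ents, rels = unseen_items(X_train, X_held_out)
    print(name, sorted(ents), sorted(rels))
# valid [] []
# test [] []
```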