train_test_split_no_unseen
ampligraph.evaluation.train_test_split_no_unseen(X, test_size=100, seed=0, allow_duplication=False)
Split into train and test sets.
This function carves out a test set that contains only entities and relations which also occur in the training set.
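One way to picture this constraint is a greedy pass over a shuffled dataset that moves a triple to the test set only if its subject, object, and relation would all still occur in what remains of the training set. The sketch below illustrates that idea on plain Python lists. It is not AmpliGraph's implementation, and the name `split_no_unseen_sketch` is hypothetical.

```python
import random
from collections import Counter


def split_no_unseen_sketch(triples, test_size, seed=0):
    """Illustrative greedy split (hypothetical helper, not the library's code)."""
    triples = [tuple(t) for t in triples]

    # Count how often each entity and relation occurs in the training pool.
    ent_counts, rel_counts = Counter(), Counter()
    for s, p, o in triples:
        ent_counts[s] += 1
        ent_counts[o] += 1
        rel_counts[p] += 1

    # Visit the triples in a reproducible random order.
    order = list(range(len(triples)))
    random.Random(seed).shuffle(order)

    test_idx = set()
    for i in order:
        if len(test_idx) == test_size:
            break
        s, p, o = triples[i]
        # Tentatively remove the triple from the training pool.
        ent_counts[s] -= 1
        ent_counts[o] -= 1
        rel_counts[p] -= 1
        if ent_counts[s] > 0 and ent_counts[o] > 0 and rel_counts[p] > 0:
            # Everything in this triple is still seen in training: accept it.
            test_idx.add(i)
        else:
            # Removing it would leave something unseen: put it back.
            ent_counts[s] += 1
            ent_counts[o] += 1
            rel_counts[p] += 1

    if len(test_idx) < test_size:
        raise ValueError("not enough triples can be moved without creating unseen items")

    train = [t for i, t in enumerate(triples) if i not in test_idx]
    test = [triples[i] for i in sorted(test_idx)]
    return train, test


X_list = [
    ("a", "y", "b"), ("f", "y", "e"), ("b", "y", "a"),
    ("a", "y", "c"), ("c", "y", "a"), ("a", "y", "d"),
    ("c", "y", "d"), ("b", "y", "c"), ("f", "y", "e"),
]
train, test = split_no_unseen_sketch(X_list, test_size=2)
```

Because the shuffle differs from whatever the library does internally, this sketch is not expected to reproduce the library's exact split; only the no-unseen property is the point.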
Parameters
X (ndarray, size[n, 3]) – The dataset to split.
test_size (int, float) – If int, the number of triples in the test set. If float, the fraction of the total number of triples to place in the test set (e.g. 0.2 for 20%).
seed (int) – A random seed used to split the dataset.
allow_duplication (boolean) – Flag to indicate if the test set can contain duplicated triples.
Returns
X_train (ndarray, size[n, 3]) – The training set.
X_test (ndarray, size[n, 3]) – The test set.
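The two accepted forms of test_size can be reconciled into a single triple count. The snippet below is a sketch that assumes a float is read as a fraction of the dataset size (0.5 meaning half of the triples) and truncated to an integer. `resolve_test_size` is a hypothetical helper, not part of the library.

```python
def resolve_test_size(n_triples, test_size):
    """Hypothetical helper: turn an int or float test_size into a triple count."""
    if isinstance(test_size, float):
        # Assumption: a float is a fraction of the dataset, truncated to an int.
        return int(n_triples * test_size)
    # An int is taken as the number of test triples directly.
    return int(test_size)


n_from_int = resolve_test_size(9, 2)
n_from_float = resolve_test_size(100, 0.5)
```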
Examples
>>> import numpy as np
>>> from ampligraph.evaluation import train_test_split_no_unseen
>>> # load your dataset to X
>>> X = np.array([['a', 'y', 'b'],
...               ['f', 'y', 'e'],
...               ['b', 'y', 'a'],
...               ['a', 'y', 'c'],
...               ['c', 'y', 'a'],
...               ['a', 'y', 'd'],
...               ['c', 'y', 'd'],
...               ['b', 'y', 'c'],
...               ['f', 'y', 'e']])
>>> # if you want to split into train/test datasets
>>> X_train, X_test = train_test_split_no_unseen(X, test_size=2)
>>> X_train
array([['a', 'y', 'b'],
       ['f', 'y', 'e'],
       ['b', 'y', 'a'],
       ['c', 'y', 'a'],
       ['c', 'y', 'd'],
       ['b', 'y', 'c'],
       ['f', 'y', 'e']], dtype='<U1')
>>> X_test
array([['a', 'y', 'c'],
       ['a', 'y', 'd']], dtype='<U1')

>>> # if you want to split into train/valid/test datasets, call it 2 times
>>> X_train_valid, X_test = train_test_split_no_unseen(X, test_size=2)
>>> X_train, X_valid = train_test_split_no_unseen(X_train_valid, test_size=2)
>>> X_train
array([['a', 'y', 'b'],
       ['b', 'y', 'a'],
       ['c', 'y', 'd'],
       ['b', 'y', 'c'],
       ['f', 'y', 'e']], dtype='<U1')
>>> X_valid
array([['f', 'y', 'e'],
       ['c', 'y', 'a']], dtype='<U1')
>>> X_test
array([['a', 'y', 'c'],
       ['a', 'y', 'd']], dtype='<U1')
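To confirm that a split honours the no-unseen guarantee, the test set can be checked against the training set directly. The helper below is a minimal sketch. `unseen_in_test` is a hypothetical name, and it accepts any iterable of (subject, relation, object) rows, so the rows of an ndarray should work as well as nested lists. The data is the train/test pair from the first example above.

```python
def unseen_in_test(train, test):
    """Return the entities and relations that occur in test but not in train."""
    train_entities, train_relations = set(), set()
    for s, p, o in train:
        train_entities.update((s, o))
        train_relations.add(p)

    test_entities = {e for s, _, o in test for e in (s, o)}
    test_relations = {p for _, p, _ in test}
    return test_entities - train_entities, test_relations - train_relations


# Train/test split copied from the first example.
X_train = [['a', 'y', 'b'], ['f', 'y', 'e'], ['b', 'y', 'a'], ['c', 'y', 'a'],
           ['c', 'y', 'd'], ['b', 'y', 'c'], ['f', 'y', 'e']]
X_test = [['a', 'y', 'c'], ['a', 'y', 'd']]

unseen_entities, unseen_relations = unseen_in_test(X_train, X_test)
# Both sets are empty when every test entity and relation is seen in training.
assert not unseen_entities and not unseen_relations
```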