load_from_csv

ampligraph.datasets.load_from_csv(directory_path, file_name, sep='\t', header=None)

Load a knowledge graph from a csv file

Loads a knowledge graph serialized in a CSV file, with one subject, predicate, object statement per line:

subj1    relationX   obj1
subj1    relationY   obj2
subj3    relationZ   obj2
subj4    relationY   obj2
...

Note

The function filters duplicated statements.
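As an illustration of what filtering duplicated statements means for the file layout above, here is a minimal standard-library sketch. It is not AmpliGraph's implementation: the helper name `read_unique_triples` and the sample data are hypothetical, and keeping the first occurrence of each statement is an assumption of this sketch.

```python
import csv
import io

# Hypothetical file content in the tab-separated layout shown above,
# with one statement repeated.
raw = (
    "subj1\trelationX\tobj1\n"
    "subj1\trelationY\tobj2\n"
    "subj1\trelationX\tobj1\n"
    "subj3\trelationZ\tobj2\n"
)


def read_unique_triples(text, sep="\t"):
    """Parse (subject, predicate, object) rows and drop exact duplicates.

    Illustrative helper only; keeps the first occurrence of each triple.
    """
    seen = set()
    triples = []
    for row in csv.reader(io.StringIO(text), delimiter=sep):
        triple = tuple(row)
        if len(triple) != 3:
            raise ValueError(f"expected 3 columns, got {len(triple)}: {row}")
        if triple not in seen:
            seen.add(triple)
            triples.append(triple)
    return triples


triples = read_unique_triples(raw)
print(len(triples))  # 3: the repeated statement is kept only once
```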

Note

It is recommended to use ampligraph.evaluation.train_test_split_no_unseen() to split custom knowledge graphs into train, validation, and test sets. That function produces validation and test sets that contain only triples whose entities also occur in the training set.
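The property described in this note can be checked on any split with a few lines of plain Python. The sketch below is only a sanity check, not the splitting routine itself; `unseen_entities` and the sample triples are hypothetical names introduced for illustration.

```python
def unseen_entities(train, held_out):
    """Return entities that appear in held_out triples but never in train."""
    train_entities = {e for s, _, o in train for e in (s, o)}
    held_out_entities = {e for s, _, o in held_out for e in (s, o)}
    return held_out_entities - train_entities


train = [("a", "y", "b"), ("b", "y", "c"), ("c", "y", "a")]
good_test = [("a", "y", "c")]  # every entity also occurs in train
bad_test = [("a", "y", "d")]   # "d" never occurs in train

print(unseen_entities(train, good_test))  # set()
print(unseen_entities(train, bad_test))   # {'d'}
```

An empty result for both the validation and the test set means the split satisfies the condition stated above.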

Parameters:
  • directory_path (str) – Folder where the input file is stored.
  • file_name (str) – Name of the input file.
  • sep (str) – The subject-predicate-object separator (default: '\t').
  • header (int, None) – Row number of the header line in the CSV file, or None if the file has no header (default: None). Same as the header parameter of pandas.read_csv.
Returns:

triples – the triples read from the file, with duplicated statements removed.

Return type:

ndarray, shape [n, 3]

Examples

>>> from ampligraph.datasets import load_from_csv
>>> X = load_from_csv('folder', 'dataset.csv', sep=',')
>>> X[:3]
array([['a', 'y', 'b'],
       ['b', 'y', 'a'],
       ['a', 'y', 'c']],
      dtype='<U1')
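The header parameter matters when a file starts with a line of column names: with the default header=None, that line is read as if it were a triple. The sketch below mimics this with the standard library only, for the simple case of a file without blank lines or comments. `read_rows` is a hypothetical helper written for this illustration; it is neither pandas nor AmpliGraph code.

```python
import csv
import io


def read_rows(text, sep="\t", header=None):
    """Return data rows; if header is an int, data starts after that row."""
    rows = list(csv.reader(io.StringIO(text), delimiter=sep))
    if header is None:
        return rows              # no header: every row is data
    return rows[header + 1:]     # skip everything up to and including the header


# Hypothetical comma-separated file whose first line names the columns.
text = "subject,predicate,object\na,y,b\nb,y,a\n"

with_header = read_rows(text, sep=",", header=0)
no_header = read_rows(text, sep=",", header=None)

print(with_header)   # [['a', 'y', 'b'], ['b', 'y', 'a']]
print(no_header[0])  # ['subject', 'predicate', 'object'], treated as data
```

For a file like this one, passing header=0 together with sep=',' would be the matching arguments.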