ampligraph.datasets.load_from_csv(directory_path, file_name, sep='\t', header=None, add_reciprocal_rels=False)

Load a knowledge graph from a CSV file.

Loads a knowledge graph serialized in a csv file as:

subj1    relationX   obj1
subj1    relationY   obj2
subj3    relationZ   obj2
subj4    relationY   obj2


The function filters duplicated statements.
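
As a rough illustration of the file layout and of duplicate filtering, the sketch below parses tab-separated triples with the standard library and keeps only the first occurrence of each statement. It is not AmpliGraph's implementation: the helper name read_unique_triples and the in-memory sample data are hypothetical, and the ordering of the result is an assumption of this sketch.

```python
import csv
import io

# Hypothetical in-memory file in the three-column layout shown above,
# with one repeated statement (rows 1 and 3 are identical).
raw = (
    "subj1\trelationX\tobj1\n"
    "subj1\trelationY\tobj2\n"
    "subj1\trelationX\tobj1\n"
    "subj3\trelationZ\tobj2\n"
)


def read_unique_triples(handle, sep="\t"):
    """Return (subject, predicate, object) tuples, keeping first occurrences only."""
    seen = set()
    triples = []
    for row in csv.reader(handle, delimiter=sep):
        if not row:
            continue  # skip blank lines
        if len(row) != 3:
            raise ValueError(f"expected 3 columns, got {len(row)}: {row!r}")
        triple = tuple(row)
        if triple not in seen:
            seen.add(triple)
            triples.append(triple)
    return triples


triples = read_unique_triples(io.StringIO(raw))
for triple in triples:
    print(triple)
```

The repeated statement appears only once in the printed result.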


It is recommended to use ampligraph.evaluation.train_test_split_no_unseen() to split custom knowledge graphs into train, validation, and test sets. The resulting validation and test sets contain only triples whose entities also occur in the training set.
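
The property described above can be checked by hand. The following sketch, which uses only plain Python and does not call AmpliGraph, reports entities that appear in a held-out set but not in the training set. The helper names entities and unseen_entities and the sample splits are hypothetical.

```python
def entities(triples):
    """Collect every subject and object appearing in (s, p, o) triples."""
    found = set()
    for s, _p, o in triples:
        found.add(s)
        found.add(o)
    return found


def unseen_entities(train, held_out):
    """Return entities of held_out that never occur in train."""
    return entities(held_out) - entities(train)


train = [
    ("subj1", "relationX", "obj1"),
    ("subj1", "relationY", "obj2"),
    ("subj3", "relationZ", "obj2"),
]

# subj4 never occurs in train, so this split leaks an unseen entity.
leaky_test = [("subj4", "relationY", "obj2")]

# Both subj3 and obj1 occur in train, so this split is fine.
clean_test = [("subj3", "relationY", "obj1")]

print(unseen_entities(train, leaky_test))
print(unseen_entities(train, clean_test))
```

An empty result means the held-out set satisfies the "no unseen entities" condition.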

Parameters:

  • directory_path (str) – Folder where the input file is stored.
  • file_name (str) – File name.
  • sep (str) – The subject-predicate-object separator (default: '\t', a tab).
  • header (int, None) – Row number of the header line in the CSV file, or None if the file has no header (default: None). Same as the header parameter of pandas.read_csv.
  • add_reciprocal_rels (bool) – Flag which specifies whether to add reciprocal relations. For every <s, p, o> in the dataset this creates a corresponding triple with reciprocal relation <o, p_reciprocal, s>. (default: False)
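
To make the effect of add_reciprocal_rels concrete, here is a minimal sketch of the transformation in plain Python. It is an illustration under assumptions, not the library's code: the helper name add_reciprocal_triples is hypothetical, the "_reciprocal" suffix is inferred from the p_reciprocal notation above, and appending the new triples after the originals is a choice made for this sketch.

```python
def add_reciprocal_triples(triples, suffix="_reciprocal"):
    """Return the input triples followed by one reciprocal triple for each.

    For every (s, p, o), a triple (o, p + suffix, s) is appended.
    The suffix is an assumption made for this illustration.
    """
    augmented = list(triples)
    for s, p, o in triples:
        augmented.append((o, p + suffix, s))
    return augmented


triples = [
    ("subj1", "relationX", "obj1"),
    ("subj1", "relationY", "obj2"),
]

augmented = add_reciprocal_triples(triples)
for triple in augmented:
    print(triple)
```

The augmented list is twice as long as the input, with subject and object swapped in each added triple.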

Returns:

triples – The actual triples of the file.

Return type:

ndarray, shape [n, 3]


>>> from ampligraph.datasets import load_from_csv
>>> X = load_from_csv('folder', 'dataset.csv', sep=',')
>>> X[:3]
array([['a', 'y', 'b'],
       ['b', 'y', 'a'],
       ['a', 'y', 'c']],