load_from_csv

ampligraph.datasets.datasets.load_from_csv(directory_path, file_name, sep='\t', header=None, add_reciprocal_rels=False)

Load a knowledge graph from a .csv file.

Loads a knowledge graph serialized in a .csv file, filtering out duplicated statements. In the .csv file, each line must represent one triple, with the subject, relation, and object separated by sep. For instance, if sep="\t", the .csv file looks like:

subj1    relationX   obj1
subj1    relationY   obj2
subj3    relationZ   obj2
subj4    relationY   obj2
           ...
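
The file layout and duplicate filtering described above can be sketched with pandas. This is a minimal illustration, not AmpliGraph's actual implementation: read_triples is a hypothetical helper, and the in-memory sample data stands in for a real file on disk.

```python
import io

import pandas as pd


def read_triples(source, sep="\t", header=None):
    """Hypothetical helper (not part of AmpliGraph): read triples, drop duplicates."""
    # Read every column as a string so entity and relation names are kept verbatim.
    df = pd.read_csv(source, sep=sep, header=header, dtype=str)
    # Keep only the first occurrence of each repeated row.
    df = df.drop_duplicates()
    return df.to_numpy()


# Tab-separated sample with one duplicated statement (first and third rows).
sample = (
    "subj1\trelationX\tobj1\n"
    "subj1\trelationY\tobj2\n"
    "subj1\trelationX\tobj1\n"
    "subj3\trelationZ\tobj2\n"
)
triples = read_triples(io.StringIO(sample))
print(triples.shape)
```

The sample contains four lines, one of which repeats an earlier statement, so the resulting array has three rows of three columns each.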

Hint

To split a generic knowledge graph into training, validation, and test sets, do not use the above function; use train_test_split_no_unseen() instead. It returns validation and test sets that do not include triples with entities not present in the training set.
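
To make the "no unseen entities" property concrete, the sketch below checks a split for entities that appear in a held-out set but not in the training set. It is only an illustration of the property: unseen_entities is a hypothetical helper, not an AmpliGraph function, and the arrays are made-up data.

```python
import numpy as np


def unseen_entities(train, held_out):
    """Hypothetical helper: entities in held_out that never occur in train."""
    # Entities are the subjects (column 0) and objects (column 2) of each triple.
    train_entities = set(train[:, 0].tolist()) | set(train[:, 2].tolist())
    held_entities = set(held_out[:, 0].tolist()) | set(held_out[:, 2].tolist())
    return held_entities - train_entities


train = np.array([["a", "y", "b"],
                  ["b", "y", "c"]])
test = np.array([["a", "y", "c"],
                 ["c", "y", "d"]])

# 'd' occurs only in the test set, so this split violates the property.
missing = unseen_entities(train, test)
print(missing)
```

A split produced with the property described in the hint would yield an empty set here.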

Parameters:
  • directory_path (str) – Folder where the input file is stored.

  • file_name (str) – File name.

  • sep (str) – The subject-predicate-object separator (default: "\t").

  • header (int or None) – Row number of the header of the .csv file. Same as the header parameter of pandas.read_csv (default: None).

  • add_reciprocal_rels (bool) – Flag which specifies whether to add reciprocal relations. For every <s, p, o> in the dataset this creates a corresponding triple with reciprocal relation <o, p_reciprocal, s> (default: False).
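
The effect of add_reciprocal_rels can be sketched as follows. This is an illustration under stated assumptions rather than the library's code: with_reciprocals is a hypothetical helper, and the "_reciprocal" suffix used to name the new relation is an assumption.

```python
import numpy as np


def with_reciprocals(triples, suffix="_reciprocal"):
    """Hypothetical helper: append <o, p + suffix, s> for every <s, p, o>."""
    triples = np.asarray(triples, dtype=object)
    # Swap subject and object, and rename the relation (suffix is an assumption).
    flipped = np.array(
        [[o, p + suffix, s] for s, p, o in triples], dtype=object
    )
    # Original triples come first, followed by their reciprocal counterparts.
    return np.concatenate([triples, flipped], axis=0)


X = np.array([["a", "y", "b"],
              ["a", "y", "c"]])
X_rec = with_reciprocals(X)
print(X_rec)
```

The result has twice as many rows as the input: each original triple is followed, in the second half of the array, by its reciprocal counterpart.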

Returns:

triples – The triples loaded from the file, with duplicated statements removed.

Return type:

ndarray, shape (n, 3)

Example

>>> PATH_TO_FOLDER = 'your/path/to/folder/'
>>> from ampligraph.datasets import load_from_csv
>>> X = load_from_csv(PATH_TO_FOLDER, 'dataset.csv', sep=',')
>>> X[:3]
array([['a', 'y', 'b'],
       ['b', 'y', 'a'],
       ['a', 'y', 'c']],
      dtype='<U1')