load_from_csv
- ampligraph.datasets.datasets.load_from_csv(directory_path, file_name, sep='\t', header=None, add_reciprocal_rels=False)
Load a knowledge graph from a .csv file.
Loads a knowledge graph serialized in a .csv file, filtering out duplicated statements. In the .csv file, each line has to represent a triple, with entities and relations separated by sep. For instance, if sep="\t", the .csv file should look like:

subj1    relationX   obj1
subj1    relationY   obj2
subj3    relationZ   obj2
subj4    relationY   obj2
...

Hint
To split a generic knowledge graph into training, validation, and test sets, do not use the above function. Use train_test_split_no_unseen() instead: it returns validation and test sets that do not include triples with entities absent from the training set.
- Parameters:
directory_path (str) – Folder where the input file is stored.
file_name (str) – File name.
sep (str) – The subject-predicate-object separator (default: "\t").
header (int or None) – The row of the header of the csv file. Same as pandas.read_csv header param.
add_reciprocal_rels (bool) – Flag which specifies whether to add reciprocal relations. For every <s, p, o> in the dataset this creates a corresponding triple with reciprocal relation <o, p_reciprocal, s> (default: False).
- Returns:
triples – The actual triples of the file.
- Return type:
ndarray, shape (n, 3)
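
The effect of add_reciprocal_rels described above can be illustrated with a small standalone sketch. This is not the library's implementation: add_reciprocal_sketch is a hypothetical helper, the "_reciprocal" suffix is an assumed naming scheme for the new relation, and plain Python tuples stand in for the rows of the returned ndarray.

```python
def add_reciprocal_sketch(triples, suffix="_reciprocal"):
    """Hypothetical helper: for every <s, p, o>, append <o, p + suffix, s>.

    The suffix is an assumption made for illustration; the actual name
    given to reciprocal relations by the library may differ.
    """
    reciprocal = [(o, p + suffix, s) for s, p, o in triples]
    # Original triples come first, followed by their reciprocal counterparts.
    return list(triples) + reciprocal


X = [("a", "y", "b"), ("a", "y", "c")]
X_rec = add_reciprocal_sketch(X)
for triple in X_rec:
    print(triple)
```

Under these assumptions the number of triples doubles, and each added triple swaps subject and object while pointing to a distinct relation.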
Example
>>> PATH_TO_FOLDER = 'your/path/to/folder/'
>>> from ampligraph.datasets import load_from_csv
>>> X = load_from_csv(PATH_TO_FOLDER, 'dataset.csv', sep=',')
>>> X[:3]
array([['a', 'y', 'b'],
       ['b', 'y', 'a'],
       ['a', 'y', 'c']],
      dtype='<U1')
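
To make the expected file layout concrete, the following sketch writes a small tab-separated file and reads it back with the standard library only, dropping duplicated statements along the way. It is a rough approximation of the behaviour described above, not AmpliGraph's code: read_triples_sketch and the sample rows are hypothetical, and the result is a list of tuples rather than an ndarray.

```python
import csv
import os
import tempfile


def read_triples_sketch(directory_path, file_name, sep="\t"):
    """Hypothetical reader: one triple per line, duplicates filtered out."""
    path = os.path.join(directory_path, file_name)
    seen = set()
    triples = []
    with open(path, newline="") as f:
        for row in csv.reader(f, delimiter=sep):
            if not row:  # skip blank lines
                continue
            triple = tuple(row)
            if triple not in seen:  # keep only the first occurrence
                seen.add(triple)
                triples.append(triple)
    return triples


rows = [
    "subj1\trelationX\tobj1",
    "subj1\trelationY\tobj2",
    "subj1\trelationY\tobj2",  # duplicated statement
    "subj3\trelationZ\tobj2",
]

with tempfile.TemporaryDirectory() as folder:
    with open(os.path.join(folder, "dataset.csv"), "w") as f:
        f.write("\n".join(rows) + "\n")
    triples = read_triples_sketch(folder, "dataset.csv", sep="\t")

print(len(triples))
```

The file has no header row, which corresponds to the header=None default in the signature above.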