generate_corruptions_for_fit

ampligraph.evaluation.generate_corruptions_for_fit(X, entities_list=None, eta=1, corrupt_side='s+o', entities_size=0, rnd=None)

Generate corruptions for training.

Creates corrupted triples for each statement in an array of statements, as described by [TWR+16].

Note

Collisions are not checked, as this would be computationally expensive [TWR+16]. As a result, some corruptions may turn out to be positive statements (i.e. the unfiltered setting).
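To illustrate what an unchecked collision looks like, here is a minimal NumPy sketch. It does not call AmpliGraph; the toy triples, the entity count and the variable names are assumptions made for the example.

```python
import numpy as np

rng = np.random.RandomState(0)

# Toy graph (assumed data): 4 entities, 1 relation,
# triples stored as (subject, predicate, object) integer IDs.
X = np.array([[0, 0, 1],
              [1, 0, 2],
              [2, 0, 3]])
n_entities = 4

# Unfiltered object corruption: overwrite each object with a random entity ID.
corrupted = X.copy()
corrupted[:, 2] = rng.randint(0, n_entities, size=len(X))

# A collision is a corruption that is itself a known positive triple.
positives = {tuple(t) for t in X}
n_collisions = sum(tuple(t) in positives for t in corrupted)
print("collisions:", n_collisions, "out of", len(corrupted))
```

Detecting collisions this way requires a membership test for every generated triple, which is the cost the function avoids by not checking.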

Note

When processing large knowledge graphs, it may be useful to generate corruptions using only entities from a single batch. This also tends to create more meaningful negatives, as the entities used to corrupt are sourced locally. To enable this behaviour, set entities_size=-1. In that case, if entities_list=None, all entities from the current batch are used to generate corruptions.
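The idea of sourcing replacement entities from the current batch can be sketched in plain NumPy. This is an illustration under stated assumptions, not AmpliGraph's implementation: sketch_batch_corruptions is a hypothetical helper, and the choice to corrupt either the subject or the object of each row at random is one possible reading of corrupt_side='s+o'.

```python
import numpy as np

def sketch_batch_corruptions(X, eta=1, rnd=None):
    """Hypothetical helper: corrupt triples using only entities seen in X."""
    rnd = rnd if rnd is not None else np.random.RandomState()
    n = X.shape[0]
    # Entity pool restricted to the current batch (subjects and objects).
    pool = np.unique(X[:, [0, 2]])
    # Repeat the batch eta times: output row j corrupts X[j % n].
    out = np.tile(X, (eta, 1))
    replacements = rnd.choice(pool, size=n * eta)
    # For each row, decide at random whether the subject or the object is replaced.
    corrupt_subject = rnd.randint(0, 2, size=n * eta).astype(bool)
    out[corrupt_subject, 0] = replacements[corrupt_subject]
    out[~corrupt_subject, 2] = replacements[~corrupt_subject]
    return out

# Assumed toy batch of (subject, predicate, object) IDs.
batch = np.array([[10, 0, 42],
                  [42, 1, 7],
                  [7, 0, 10]])
corruptions = sketch_batch_corruptions(batch, eta=2, rnd=np.random.RandomState(0))
print(corruptions)
```

Because the pool contains only the IDs 7, 10 and 42, every replacement is an entity that already appears in the batch. As in the function itself, collisions with positive triples are not filtered out.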

Parameters:
  • X (Tensor, shape [n, 3]) – An array of positive triples that will be used to create corruptions.
  • entities_list (list) – List of entities to be used for generating corruptions (default: None). If entities_list=None, all entities are used to generate corruptions (default behaviour).
  • eta (int) – The number of corruptions per triple that must be generated.
  • corrupt_side (string) –

    Specifies which side of the triple to corrupt:

    • 's': corrupt only the subject.
    • 'o': corrupt only the object.
    • 's+o': corrupt both subject and object.
  • entities_size (int) – Number of entities to use when generating corruptions. Entity IDs are assumed to start at 0 and be contiguous (default: 0). When processing large knowledge graphs, it may be useful to generate corruptions using only entities from a single batch. This also tends to create more meaningful negatives, as the entities used to corrupt are sourced locally. To enable this behaviour, set entities_size=-1. In that case, if entities_list=None, all entities from the current batch are used to generate corruptions.
  • rnd (numpy.random.RandomState) – A random number generator.
Returns:

out – An array of corruptions for the positive triples in X. For the row of X at position index, the corresponding corruptions are found in rows [index+i*n for i in range(eta)] of out.

Return type:

Tensor, shape [n * eta, 3]
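The row layout of the returned array can be made concrete with a small NumPy sketch. The array out below is only a stand-in with the documented shape [n * eta, 3], built by repeating X so that the mapping is easy to see; corruption_rows is a hypothetical helper, not part of the library.

```python
import numpy as np

# Assumed toy input: n = 3 positive triples, eta = 2 corruptions per triple.
X = np.array([[0, 0, 1],
              [1, 0, 2],
              [2, 0, 3]])
n, eta = X.shape[0], 2

# Stand-in for the returned array: X repeated eta times, shape [n * eta, 3].
# In a real result, the subject or object of each row would be replaced.
out = np.tile(X, (eta, 1))

def corruption_rows(index, n, eta):
    """Rows of `out` that hold the corruptions of X[index]."""
    return [index + i * n for i in range(eta)]

rows = corruption_rows(1, n, eta)
print(rows)       # [1, 4]
print(out[rows])  # both rows derive from X[1]
```

Reshaping with out.reshape(eta, n, 3) gives the same grouping: slice [:, index, :] collects all corruptions of X[index].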