generate_corruptions_for_fit

ampligraph.evaluation.generate_corruptions_for_fit(X, entities_list=None, eta=1, corrupt_side='s+o', entities_size=0, rnd=None)

Generate corruptions for training.

Creates corrupted triples for each statement in an array of statements, as described by [TWR+16].

Note

Collisions are not checked, as this would be computationally expensive [TWR+16]. As a result, some corruptions may coincide with positive statements (i.e. the unfiltered setting).
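To illustrate what "unfiltered" means here, the following NumPy sketch generates corruptions without any collision check and then counts how many of them happen to be positive triples. It is not AmpliGraph's implementation: corrupt_unfiltered, n_entities and seed are hypothetical names, and the sketch assumes that each corruption replaces either the subject or the object with a uniformly drawn entity ID.

```python
import numpy as np

def corrupt_unfiltered(X, n_entities, eta=1, seed=0):
    """Hypothetical helper: corrupt subject or object, no collision check."""
    rnd = np.random.RandomState(seed)
    out = np.tile(X, (eta, 1))  # shape [n * eta, 3], a fresh copy of X
    replacements = rnd.randint(0, n_entities, size=len(out))
    # Assumption: each corruption replaces exactly one side, picked at random.
    corrupt_subject = rnd.randint(0, 2, size=len(out)).astype(bool)
    out[corrupt_subject, 0] = replacements[corrupt_subject]
    out[~corrupt_subject, 2] = replacements[~corrupt_subject]
    return out

X = np.array([[0, 0, 1], [1, 0, 2], [2, 1, 0]])
corruptions = corrupt_unfiltered(X, n_entities=3, eta=4)

# Count corruptions that are actually positive statements (collisions).
positives = {tuple(t) for t in X.tolist()}
n_collisions = sum(tuple(t) in positives for t in corruptions.tolist())
print(corruptions.shape, n_collisions)
```

With few entities, as in this toy graph, collisions are likely; on large graphs they become rare, which is why skipping the check is usually acceptable.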

Note

When processing large knowledge graphs, it may be useful to generate corruptions using only entities from a single batch. This also tends to produce more meaningful negatives, as the entities used to corrupt are sourced locally. You can enable this behaviour by setting entities_size=0. In that case, if entities_list=None, all entities from the current batch are used to generate corruptions.
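The idea of sourcing replacement entities from the current batch can be sketched in plain NumPy as follows. This is an illustration only, not the library's code: batch_entity_pool is a hypothetical helper, and the example corrupts only the object column for brevity.

```python
import numpy as np

def batch_entity_pool(X_batch):
    """Hypothetical helper: unique entity IDs seen as subject or object."""
    return np.unique(np.concatenate([X_batch[:, 0], X_batch[:, 2]]))

X_batch = np.array([[10, 0, 42], [42, 1, 7], [7, 0, 10]])
pool = batch_entity_pool(X_batch)

# Draw replacement objects only from entities present in this batch.
rnd = np.random.RandomState(0)
replacements = pool[rnd.randint(0, len(pool), size=len(X_batch))]
corrupted = X_batch.copy()
corrupted[:, 2] = replacements
```

Because the pool contains only IDs from the batch, no lookup over the full entity set is needed, which is the benefit described above for large graphs.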

Parameters
  • X (Tensor, shape [n, 3]) – An array of positive triples that will be used to create corruptions.

  • entities_list (list) –

    List of entities to be used for generating corruptions (default: None).

    If entities_list=None and entities_size is the number of all entities, all entities will be used to generate corruptions (default behaviour).

    If entities_list=None and entities_size=0, the batch entities will be used to generate corruptions.

  • eta (int) – The number of corruptions per triple that must be generated.

  • corrupt_side (string) –

    Specifies which side of the triple to corrupt:

    • 's': corrupt only the subject.

    • 'o': corrupt only the object.

    • 's+o': corrupt both subject and object.

  • entities_size (int) – Number of entities to draw from when generating corruptions. Entity IDs are assumed to start at 0 and to be contiguous (default: 0). Setting entities_size=0 restricts corruptions to entities from the current batch, which is useful for large knowledge graphs and yields more meaningful negatives because the entities are sourced locally. In that case, if entities_list=None, all entities from the current batch are used to generate corruptions.

  • rnd (numpy.random.RandomState) – A random number generator.
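As a rough illustration of how corrupt_side and rnd might interact, the sketch below builds a boolean mask saying which side of each corruption is replaced. subject_mask is a hypothetical helper, not part of AmpliGraph, and the handling of 's+o' (one side per corruption, picked at random) is an assumption made for this example.

```python
import numpy as np

def subject_mask(corrupt_side, size, rnd):
    """Hypothetical helper: True = replace subject, False = replace object."""
    if corrupt_side == 's':
        return np.ones(size, dtype=bool)
    if corrupt_side == 'o':
        return np.zeros(size, dtype=bool)
    if corrupt_side == 's+o':
        # Assumption: each corruption replaces one side, chosen at random.
        return rnd.randint(0, 2, size=size).astype(bool)
    raise ValueError("corrupt_side must be 's', 'o' or 's+o'")

rnd = np.random.RandomState(42)
mask_s = subject_mask('s', 6, rnd)
mask_o = subject_mask('o', 6, rnd)
mask_so = subject_mask('s+o', 6, rnd)
```

Passing a seeded numpy.random.RandomState makes the random choices reproducible across runs.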

Returns

out – An array of corruptions for a list of positive triples X. For the row at position index in X, the corresponding corruptions are found at rows [index+i*n for i in range(eta)], where n is the number of rows in X.

Return type

Tensor, shape [n * eta, 3]
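The row layout of the returned array can be checked with a small NumPy stand-in. Here np.tile is used only to mimic the [n * eta, 3] layout described above (X repeated eta times), and corruption_rows is a hypothetical helper that applies the documented index formula.

```python
import numpy as np

n, eta = 3, 2
X = np.array([[0, 0, 1], [1, 0, 2], [2, 1, 0]])

# Stand-in for the output layout: X stacked eta times, shape [n * eta, 3].
out = np.tile(X, (eta, 1))

def corruption_rows(index, n, eta):
    """Hypothetical helper: rows of `out` that belong to X[index]."""
    return [index + i * n for i in range(eta)]

rows = corruption_rows(1, n, eta)
print(rows)  # [1, 4]
```

In a real output, the rows selected this way would hold the eta corruptions of X[index] rather than copies of it.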