BucketGraphPartitioner

class ampligraph.datasets.BucketGraphPartitioner(data, k=2, **kwargs)

Bucket-based partition strategy.

This strategy first splits entities into \(k\) buckets and creates:

  • \(k\) partitions, where the \(i\)-th contains the triples whose subject and object both belong to the \(i\)-th bucket.

  • \(\frac{(k^2-k)}{2}\) partitions indexed by \((i,j)\) with \(i,j=1,...,k\) and \(i < j\), where the \((i,j)\)-th partition contains the triples whose subject belongs to the \(i\)-th bucket and whose object belongs to the \(j\)-th bucket, or vice versa.
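
The bucketing idea described above can be illustrated with a small pure-Python sketch. This is not AmpliGraph's implementation: the function name `bucket_partition` is hypothetical, and assigning entities to contiguous, near-equal buckets in order of first appearance is an assumption made only for illustration.

```python
from itertools import combinations_with_replacement


def bucket_partition(triples, k=2):
    """Toy bucket partitioning (hypothetical helper, not the library's code)."""
    # Collect entities (subjects and objects) in order of first appearance.
    entities, seen = [], set()
    for s, _, o in triples:
        for e in (s, o):
            if e not in seen:
                seen.add(e)
                entities.append(e)

    # Assumption: split entities into k contiguous, near-equal buckets.
    size = -(-len(entities) // k)  # ceiling division
    bucket_of = {e: idx // size for idx, e in enumerate(entities)}

    # One partition per unordered bucket pair (i, j) with i <= j:
    # k "diagonal" partitions plus (k^2 - k) / 2 "off-diagonal" ones.
    partitions = {pair: [] for pair in combinations_with_replacement(range(k), 2)}
    for s, p, o in triples:
        i, j = sorted((bucket_of[s], bucket_of[o]))
        partitions[(i, j)].append((s, p, o))
    return partitions


triples = [(1, 1, 2), (1, 1, 3), (1, 1, 4), (5, 1, 3), (5, 1, 2),
           (6, 1, 3), (6, 1, 2), (6, 1, 4), (6, 1, 7)]
parts = bucket_partition(triples, k=2)
for pair, members in parts.items():
    print(pair, len(members))
```

With these nine triples and \(k=2\), the sketch places 3 triples in partition \((0,0)\), 5 in \((0,1)\) and 1 in \((1,1)\). Sorting the bucket pair is what implements the "or vice versa" clause: a triple going from bucket \(j\) to bucket \(i\) lands in the same partition as one going from \(i\) to \(j\).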

Example

>>> from ampligraph.datasets import load_fb15k_237, GraphDataLoader, BucketGraphPartitioner
>>> from ampligraph.datasets.sqlite_adapter import SQLiteAdapter
>>> from ampligraph.latent_features import ScoringBasedEmbeddingModel
>>> dataset = load_fb15k_237()
>>> dataset_loader = GraphDataLoader(dataset['train'],
...                                  backend=SQLiteAdapter, # Type of backend to use
...                                  batch_size=1000,       # Batch size to use while iterating over the dataset
...                                  dataset_type='train',  # Dataset type
...                                  use_filter=False,      # Whether to use filter or not
...                                  use_indexer=True)      # Indicates that the data needs to be mapped to indices
>>> partitioner = BucketGraphPartitioner(dataset_loader, k=2)
>>> # create and compile a model as usual
>>> partitioned_model = ScoringBasedEmbeddingModel(eta=2, k=50, scoring_type='DistMult')
>>> partitioned_model.compile(optimizer='adam', loss='multiclass_nll')
>>> partitioned_model.fit(partitioner,       # The partitioner object generates data for the model during training
...                       epochs=10)         # Number of epochs

Example

>>> import numpy as np
>>> from ampligraph.datasets import GraphDataLoader, BucketGraphPartitioner
>>> d = np.array([[1,1,2], [1,1,3],[1,1,4],[5,1,3],[5,1,2],[6,1,3],[6,1,2],[6,1,4],[6,1,7]])
>>> data = GraphDataLoader(d, batch_size=1, dataset_type="test")
>>> partitioner = BucketGraphPartitioner(data, k=2)
>>> for i, partition in enumerate(partitioner):
...     print("partition ", i)
...     for batch in partition:
...         print(batch)
partition  0
[['0,0,1']]
[['0,0,2']]
[['0,0,3']]
partition  1
[['4,0,1']]
[['4,0,2']]
[['5,0,1']]
[['5,0,2']]
[['5,0,3']]
partition  2
[['5,0,6']]

Attributes

manager

name

Methods

__init__(data[, k])

Initialise the BucketGraphPartitioner.

create_single_partition(ind1, ind2, ...[, ...])

Creates a partition based on the two given bucket indices.

__init__(data, k=2, **kwargs)

Initialise the BucketGraphPartitioner.

Parameters:
  • data (GraphDataLoader) – Input data as a GraphDataLoader.

  • k (int) – Number of buckets to split entities (i.e., vertices) into.
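
As a quick check on how \(k\) drives the number of partitions (assuming every bucket pair yields one partition, as in the class description above), the total is at most

\[
k + \frac{k^2-k}{2} = \frac{k(k+1)}{2},
\]

so the default \(k=2\) gives \(3\) partitions, which agrees with the three partitions printed in the second example above, and \(k=4\) gives \(10\).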

create_single_partition(ind1, ind2, timestamp, partition_nb, batch_size=1)

Creates a partition based on the two given bucket indices.

The created partition is appended to the list of partitions (self.partitions).

Parameters:
  • ind1 (int) – Index of the first bucket used to create the partition.

  • ind2 (int) – Index of the second bucket used to create the partition.

  • timestamp (str) – Date and time string used when creating the files (shelves).

  • partition_nb (int) – Number assigned to the partition.