BucketGraphPartitioner

class ampligraph.datasets.BucketGraphPartitioner(data, k=2, **kwargs)

Bucket-based partition strategy.

This strategy first splits entities into \(k\) buckets and creates:

\(k\) partitions, where the \(i\)-th partition contains the triples whose subject and object both belong to the \(i\)-th bucket.
\(\frac{k^2-k}{2}\) partitions indexed by unordered pairs \((i,j)\) with \(i,j=1,...,k\), \(i \neq j\), where the \((i,j)\)-th partition contains the triples whose subject belongs to the \(i\)-th bucket and whose object belongs to the \(j\)-th bucket, or vice versa.
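
The bucketing rule described above can be sketched in plain Python. This is only an illustration of the strategy, not AmpliGraph's implementation: the helper name `bucket_partitions` is hypothetical, the assignment of entities to contiguous buckets in first-seen order is an assumption, and buckets are indexed from 0 rather than from 1.

```python
from itertools import combinations


def bucket_partitions(triples, k=2):
    """Group (subject, predicate, object) triples by the buckets of their entities.

    Hypothetical helper for illustration; not part of AmpliGraph.
    """
    # Collect entities in first-seen order (assumed ordering).
    entities = []
    seen = set()
    for s, _, o in triples:
        for e in (s, o):
            if e not in seen:
                seen.add(e)
                entities.append(e)

    # Assign entities to k contiguous buckets of (almost) equal size.
    bucket_size = max(1, -(-len(entities) // k))  # ceiling division
    bucket_of = {e: idx // bucket_size for idx, e in enumerate(entities)}

    partitions = {}
    # k partitions: subject and object both fall in bucket i.
    for i in range(k):
        partitions[(i, i)] = [
            t for t in triples if bucket_of[t[0]] == i and bucket_of[t[2]] == i
        ]
    # (k^2 - k) / 2 partitions: subject in bucket i and object in bucket j,
    # or vice versa, for every unordered pair i != j.
    for i, j in combinations(range(k), 2):
        partitions[(i, j)] = [
            t for t in triples if {bucket_of[t[0]], bucket_of[t[2]]} == {i, j}
        ]
    return partitions


triples = [(1, 1, 2), (1, 1, 3), (1, 1, 4), (5, 1, 3), (5, 1, 2),
           (6, 1, 3), (6, 1, 2), (6, 1, 4), (6, 1, 7)]
parts = bucket_partitions(triples, k=2)
sizes = {key: len(val) for key, val in parts.items()}
print(sizes)  # {(0, 0): 3, (1, 1): 1, (0, 1): 5}
```

With \(k=2\) the sketch yields \(k + \frac{k^2-k}{2} = 3\) partitions, and every triple lands in exactly one of them. The order in which the real partitioner yields its partitions may differ from the order used here.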

Example

>>> from ampligraph.datasets import load_fb15k_237, GraphDataLoader, BucketGraphPartitioner
>>> from ampligraph.datasets.sqlite_adapter import SQLiteAdapter
>>> from ampligraph.latent_features import ScoringBasedEmbeddingModel
>>> dataset = load_fb15k_237()
>>> dataset_loader = GraphDataLoader(dataset['train'],
...                                  backend=SQLiteAdapter, # Type of backend to use
...                                  batch_size=1000,       # Batch size to use while iterating over the dataset
...                                  dataset_type='train',  # Dataset type
...                                  use_filter=False,      # Whether to use a filter or not
...                                  use_indexer=True)      # Whether the data needs to be mapped to indices
>>> partitioner = BucketGraphPartitioner(dataset_loader, k=2)
>>> # create and compile a model as usual
>>> partitioned_model = ScoringBasedEmbeddingModel(eta=2, k=50, scoring_type='DistMult')
>>> partitioned_model.compile(optimizer='adam', loss='multiclass_nll')
>>> partitioned_model.fit(partitioner,       # The partitioner object generates data for the model during training
...                       epochs=10)         # Number of epochs

Example

>>> import numpy as np
>>> from ampligraph.datasets import GraphDataLoader, BucketGraphPartitioner
>>> d = np.array([[1,1,2], [1,1,3],[1,1,4],[5,1,3],[5,1,2],[6,1,3],[6,1,2],[6,1,4],[6,1,7]])
>>> data = GraphDataLoader(d, batch_size=1, dataset_type="test")
>>> partitioner = BucketGraphPartitioner(data, k=2)
>>> for i, partition in enumerate(partitioner):
...     print("partition ", i)
...     for batch in partition:
...         print(batch)
partition  0
[['0,0,1']]
[['0,0,2']]
[['0,0,3']]
partition  1
[['4,0,1']]
[['4,0,2']]
[['5,0,1']]
[['5,0,2']]
[['5,0,3']]
partition  2
[['5,0,6']]

Attributes

`manager`
`name`

Methods

`__init__`(data[, k])	Initialise the BucketGraphPartitioner.
`create_single_partition`(ind1, ind2, ...[, ...])	Creates a partition based on the two given bucket indices.

__init__(data, k=2, **kwargs)

Initialise the BucketGraphPartitioner.

Parameters:

data (GraphDataLoader) – Input data as a GraphDataLoader.
k (int) – Number of buckets to split entities (i.e., vertices) into.

create_single_partition(ind1, ind2, timestamp, partition_nb, batch_size=1)

Creates a partition based on the two given bucket indices.

It appends the created partition to the list of partitions (self.partitions).

Parameters:

ind1 (int) – Index of the first bucket needed to create the partition.
ind2 (int) – Index of the second bucket needed to create the partition.
timestamp (str) – Date and time string with which the files (shelves) are created.
partition_nb (int) – Number assigned to the partition.
batch_size (int) – Batch size (default: 1).