BucketGraphPartitioner
- class ampligraph.datasets.BucketGraphPartitioner(data, k=2, **kwargs)
Bucket-based partition strategy.
This strategy first splits entities into \(k\) buckets and creates:
\(k\) partitions, where the \(i\)-th partition contains the triples whose subject and object both belong to the \(i\)-th bucket.
\(\frac{(k^2-k)}{2}\) partitions indexed by \((i,j)\) with \(i,j=1,...,k\), \(i < j\), where the \((i,j)\)-th partition contains the triples whose subject belongs to the \(i\)-th bucket and whose object belongs to the \(j\)-th bucket, or vice versa.
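The grouping rule described above can be sketched in plain Python. This is only an illustration, not AmpliGraph's implementation: the function name `bucket_partition` is hypothetical, and the bucketing rule (sorted entities cut into \(k\) contiguous chunks) is an assumption made for the sketch. The triples are already-indexed `(subject, predicate, object)` tuples. With \(k\) buckets there are at most \(k + \frac{(k^2-k)}{2} = \frac{k(k+1)}{2}\) partitions, so at most 3 for \(k=2\).

```python
from collections import defaultdict
from math import ceil


def bucket_partition(triples, k):
    """Group (s, p, o) triples by the unordered pair of buckets of s and o.

    Hypothetical helper for illustration; not part of AmpliGraph.
    """
    # Assumed bucketing rule: sort the entities and cut them into k
    # contiguous chunks of (almost) equal size.
    entities = sorted({e for s, _, o in triples for e in (s, o)})
    size = ceil(len(entities) / k)
    bucket_of = {e: idx // size for idx, e in enumerate(entities)}

    partitions = defaultdict(list)
    for s, p, o in triples:
        # Sorting the pair maps (i, j) and (j, i) to the same partition.
        key = tuple(sorted((bucket_of[s], bucket_of[o])))
        partitions[key].append((s, p, o))
    return dict(partitions)


triples = [(0, 0, 1), (0, 0, 2), (0, 0, 3), (4, 0, 1), (4, 0, 2),
           (5, 0, 1), (5, 0, 2), (5, 0, 3), (5, 0, 6)]
parts = bucket_partition(triples, k=2)
for key in sorted(parts):
    print(key, parts[key])
```

Keys of the form `(i, i)` hold triples that stay inside one bucket, while keys `(i, j)` with `i < j` hold triples that cross two buckets in either direction.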
Example
>>> from ampligraph.datasets import load_fb15k_237, GraphDataLoader, BucketGraphPartitioner
>>> from ampligraph.datasets.sqlite_adapter import SQLiteAdapter
>>> from ampligraph.latent_features import ScoringBasedEmbeddingModel
>>> dataset = load_fb15k_237()
>>> dataset_loader = GraphDataLoader(dataset['train'],
...                                  backend=SQLiteAdapter,  # Type of backend to use
...                                  batch_size=1000,        # Batch size to use while iterating over the dataset
...                                  dataset_type='train',   # Dataset type
...                                  use_filter=False,       # Whether to use filter or not
...                                  use_indexer=True)       # Indicates that the data needs to be mapped to index
>>> partitioner = BucketGraphPartitioner(dataset_loader, k=2)
>>> # create and compile a model as usual
>>> partitioned_model = ScoringBasedEmbeddingModel(eta=2, k=50, scoring_type='DistMult')
>>> partitioned_model.compile(optimizer='adam', loss='multiclass_nll')
>>> partitioned_model.fit(partitioner,  # The partitioner object generates data for the model during training
...                       epochs=10)    # Number of epochs
Example
>>> import numpy as np
>>> from ampligraph.datasets import GraphDataLoader, BucketGraphPartitioner
>>> d = np.array([[1,1,2], [1,1,3],[1,1,4],[5,1,3],[5,1,2],[6,1,3],[6,1,2],[6,1,4],[6,1,7]])
>>> data = GraphDataLoader(d, batch_size=1, dataset_type="test")
>>> partitioner = BucketGraphPartitioner(data, k=2)
>>> for i, partition in enumerate(partitioner):
...     print("partition ", i)
...     for batch in partition:
...         print(batch)
partition  0
[['0,0,1']]
[['0,0,2']]
[['0,0,3']]
partition  1
[['4,0,1']]
[['4,0,2']]
[['5,0,1']]
[['5,0,2']]
[['5,0,3']]
partition  2
[['5,0,6']]
Attributes
manager
name
Methods
__init__(data[, k]) – Initialise the BucketGraphPartitioner.
create_single_partition(ind1, ind2, ...[, ...]) – Creates a partition based on the two given bucket indices.
- __init__(data, k=2, **kwargs)
Initialise the BucketGraphPartitioner.
- Parameters:
data (GraphDataLoader) – Input data as a GraphDataLoader.
k (int) – Number of buckets to split entities (i.e., vertices) into.
- create_single_partition(ind1, ind2, timestamp, partition_nb, batch_size=1)
Creates a partition based on the two given bucket indices.
It appends the created partition to the list of partitions (self.partitions).
- Parameters:
ind1 (int) – Index of the first bucket needed to create the partition.
ind2 (int) – Index of the second bucket needed to create the partition.
timestamp (str) – Date and time string used in the names of the files that are created (shelves).
partition_nb (int) – Number assigned to the partition.
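As a rough illustration of how the bucket index pairs passed as `ind1` and `ind2` could be enumerated, the sketch below lists every unordered pair of bucket indices together with a running partition number. The helper `bucket_index_pairs` is hypothetical, and the enumeration order is an assumption; the library may walk the pairs differently.

```python
from itertools import combinations_with_replacement


def bucket_index_pairs(k):
    """Yield (partition_nb, ind1, ind2) for every unordered pair of buckets.

    Hypothetical helper for illustration; not part of AmpliGraph.
    """
    # Pairs with ind1 == ind2 correspond to partitions inside one bucket;
    # pairs with ind1 < ind2 correspond to partitions spanning two buckets.
    pairs = combinations_with_replacement(range(k), 2)
    for partition_nb, (ind1, ind2) in enumerate(pairs):
        yield partition_nb, ind1, ind2


pairs = list(bucket_index_pairs(3))
for partition_nb, ind1, ind2 in pairs:
    print(partition_nb, ind1, ind2)
```

For `k=3` this yields six candidate pairs, which matches the upper bound of \(k + \frac{(k^2-k)}{2}\) partitions; a pair whose buckets share no triples would simply produce an empty partition in this sketch.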