l2gv2.patch package#
Submodules#
l2gv2.patch.clustering module#
Graph clustering algorithms
This module contains functions for clustering graphs.
TODO: - Make the algorithms work with Graph objects
- class l2gv2.patch.clustering.Partition(partition_tensor)#
Bases:
SequenceDefines a partition of a graph
- l2gv2.patch.clustering.fennel_clustering(graph: Graph, num_clusters, load_limit=1.1, alpha=None, gamma=1.5, num_iters=1, clusters=None)#
- l2gv2.patch.clustering.hierarchical_aglomerative_clustering(graph: ~raphtory.Graph, method=<function spread_clustering>, levels=None, branch_factors=None)#
Hierarchical agglomerative clustering
Implements a hierarchical agglomerative clustering algorithm.
- Parameters:
graph – input graph
method – clustering method
levels – number of levels
branch_factors – branch factors
- Returns:
list of clusters
- l2gv2.patch.clustering.hierarchical_clustering(data: Data, m: int, k: int, clustering_function: Callable[[Data, int], Tensor]) list[Tensor]#
Perform hierarchical clustering on a PyTorch Geometric graph.
- Parameters:
data (Data) – The input PyTorch Geometric graph.
m (int) – Target number of clusters.
k (int) – Target maximum cluster size.
clustering_function (Callable) – A function that takes a Data object and the number of clusters m and returns a cluster assignment tensor.
- Returns:
A list of cluster assignment tensors for all levels of the hierarchy.
- Return type:
List[torch.Tensor]
- l2gv2.patch.clustering.louvain_clustering(graph: Graph, *args, **kwargs)#
Implements clustering using the Louvain [1] algorithm for modularity optimisation
- Parameters:
graph – input graph
- Returns:
partition tensor
This is a minimal wrapper around
community.best_partition()from the python-louvain package. Any other arguments provided are passed through.References
- l2gv2.patch.clustering.metis_clustering(graph: TGraph, num_clusters)#
Implements clustering using metis
- Parameters:
graph – input graph
num_clusters – number of cluster
- Returns:
partition tensor
This uses the pymetis package
References
[2] “A Fast and Highly Quality Multilevel Scheme for Partitioning Irregular Graphs”. George Karypis and Vipin Kumar. SIAM Journal on Scientific Computing, Vol. 20, No. 1, pp. 359—392, 1999.
- l2gv2.patch.clustering.spread_clustering(graph: Graph, num_clusters, max_degree_init=True)#
l2gv2.patch.lazy module#
Lazy coordinates for patch embeddings
- class l2gv2.patch.lazy.BaseLazyCoordinates#
Bases:
ABCTODO: docstring for BaseLazyCoordinates
- abstract property shape#
Shape of the coordinates
- class l2gv2.patch.lazy.LazyCoordinates(x, shift=None, scale=None, rot=None)#
Bases:
BaseLazyCoordinatesTODO: doctring for LazyCoordinates
- save_transform(filename)#
Save the transformation to a file
- Parameters:
filename – filename to save the transformation to
- property shape#
Shape of the coordinates
- class l2gv2.patch.lazy.LazyFileCoordinates(filename, *args, **kwargs)#
Bases:
LazyCoordinatesTODO: doctring for LazyFileCoordinates
- property shape#
Shape of the coordinates
- class l2gv2.patch.lazy.LazyMeanAggregatorCoordinates(patches)#
Bases:
BaseLazyCoordinatesTODO: doctring for LazyMeanAggregatorCoordinates
- as_array(out=None)#
TODO: docstring for as_array
- Parameters:
out – [description], defaults to None
- get_coordinates(nodes, out=None)#
TODO: docstring for get_coordinates
- Parameters:
nodes – [description]
out – [description], default is None
- property shape#
Shape of the coordinates
l2gv2.patch.patches module#
Dividing input data into overlapping patches
This module contains functions for dividing input data into overlapping patches.
The main function is create_patch_data, which takes a graph and a partition of the graph and returns a list of patches.
- class l2gv2.patch.patches.FilePatch(nodes, filename, shift=None, scale=None, rot=None)#
Bases:
PatchPatch class that loads coordinates from a file with optional transformations (shift, scale, rotation).
- class l2gv2.patch.patches.MeanAggregatorPatch(patches)#
Bases:
PatchPatch class that aggregates multiple patches by taking their mean coordinates.
- get_coordinate(node)#
get coordinate for a single node
- Parameters:
node – Integer node index
- get_coordinates(nodes)#
get coordinates for a list of nodes
- Parameters:
nodes – Iterable of node indices
- property patches#
- class l2gv2.patch.patches.Patch(nodes, coordinates=None)#
Bases:
objectClass for patch embedding
- coordinates = None#
patch embedding coordinates
- get_coordinate(node)#
get coordinate for a single node
- Parameters:
node – Integer node index
- get_coordinates(nodes)#
get coordinates for a list of nodes
- Parameters:
nodes – Iterable of node indices
- index = None#
mapping of node index to patch coordinate index
- property shape#
shape of patch coordinates
(shape[0] is the number of nodes in the patch and shape[1] is the embedding dimension)
- l2gv2.patch.patches.create_overlapping_patches(graph: TGraph, partition_tensor: Tensor, patch_graph: TGraph, min_overlap: int, target_overlap: int)#
Create overlapping patches from a hard partition of an input graph
- Parameters:
graph – Input graph
partition_tensor – partition of input graph
patch_graph – graph where nodes are clusters of partition and an edge indicates that the corresponding patches in the output should have at least
min_overlapnodes in commonmin_overlap – minimum overlap for connected patches
target_overlap – maximum overlap during expansion for an edge (additional overlap may result from expansion of other edges)
- Returns:
list of node-index tensors for patches
- l2gv2.patch.patches.create_patch_data(graph: TGraph, partition_tensor: LongTensor, min_overlap: int, target_overlap: int, min_patch_size: int | None = None, sparsify_method: Literal['resistance', 'rmst', 'none'] = 'resistance', target_patch_degree: int = 4, gamma: int = 0, verbose: bool = False) tuple[list, object]#
Divide data into overlapping patches
- Parameters:
graph (TGraph) – input data
partition_tensor (torch.LongTensor) – starting partition for creating patches
min_overlap – minimum patch overlap for connected patches
target_overlap – maximum patch overlap during expansion of an edge of the patch graph
min_patch_size – minimum size of patches, defauls is None
sparsify_method – method for sparsifying patch graph (one of
'resistance','rmst','none'), default is'resistance'target_patch_degree – target patch degree for
sparsify_method='resistance', default is 4gamma –
gammavalue for use withsparsify_method='rmst', default is 0verbose – if true, print some info about created patches, default is False
- Returns:
list of patch data, patch graph
- l2gv2.patch.patches.create_patch_graph(graph: TGraph, max_num_patches: int, min_overlap: int, target_overlap: int, clustering_function: Callable[[TGraph, int], Tensor]) tuple[list, TGraph]#
Create a patch graph from a graph
- Parameters:
graph (TGraph) – input graph
max_num_patches (int) – maximum number of patches
min_overlap (int) – minimum overlap for connected patches
target_overlap (int) – maximum overlap during expansion
- Returns:
list of patch data, patch graph
- l2gv2.patch.patches.geodesic_expand_overlap(subgraph, seed_mask, min_overlap, target_overlap, reseed_samples=10)#
Expand patch
- Parameters:
subgraph – graph containing patch nodes and all target nodes for potential expansion
seed_mask – [description]
min_overlap – minimum overlap before stopping expansion
target_overlap – maximum overlap (if expansion step results in more overlap, the nodes added are sampled at random)
reseed_samples – [description] default is 10
- Returns:
index tensor of new nodes to add to patch
- l2gv2.patch.patches.merge_small_clusters(graph: TGraph, partition_tensor: LongTensor, min_size: int)#
Merge small clusters with adjacent clusters such that all clusters satisfy a minimum size constraint.
This function iteratively merges the smallest cluster with its neighbouring cluster with the maximum normalized cut.
- Parameters:
graph (TGraph) – Input graph
partition_tensor (torch.LongTensor) – input partition vector mapping nodes to clusters
min_size (int) – desired minimum size of clusters
- Returns:
new partition tensor where small clusters are merged.
- l2gv2.patch.patches.rolling_window_graph(n_patches: int, w)#
Generate patch edges for a rolling window
- Parameters:
n_patches (int) – Number of patches
() (w) – window width (patches connected to the w nearest neighbours on either side)
l2gv2.patch.sparsify module#
Graph sparsification
- l2gv2.patch.sparsify.effective_resistances(graph: TGraph, **args)#
compute effective resistances
- Parameters:
graph – input graph
epsilon – tolerance for effective resistance computation (default:
1e-2)
- Returns:
effective resistance for each edge
- l2gv2.patch.sparsify.hierarchical_sparsify(graph: ~l2gv2.graphs.tgraph.TGraph, clusters, target_level_degree, ensure_connected=True, sparsifier=<function edge_sampling_sparsify>)#
- l2gv2.patch.sparsify.nearest_neighbor_sparsify(graph: TGraph, target_degree, ensure_connected=True)#
- l2gv2.patch.sparsify.relaxed_spanning_tree(graph: TGraph, maximise=False, gamma=1)#
compute relaxed minimum or maximum spanning tree
This implements the relaxed minimum spanning tree algorithm of
M. Beguerisse-Díaz, B. Vangelov, and M. Barahona. “Finding role communities in directed networks using Role-Based Similarity, Markov Stability and the Relaxed Minimum Spanning Tree”. In: 2013 IEEE Global Conference on Signal and Information Processing (GlobalSIP). IEEE, 2013, pp. 937–940. isbn: 978-1-4799-0248-4.
- Parameters:
graph – input graph
maximise – if
Truestart with maximum spanning treegamma – \(\gamma\) value for adding edges
- l2gv2.patch.sparsify.resistance_sparsify(graph: TGraph, target_mean_degree, ensure_connected=True, epsilon=0.01)#
Sparsify a graph to have a target mean degree using effective resistance based sampling
- Parameters:
graph – input graph
target_mean_degree – desired mean degree after sparsification
ensure_connected – if
True, first add edges of a maximum spanning tree based on the resistance weights to ensure that the sparsified graph remains connected if the input graph is connectedepsilon – tolerance for effective resistance computation
- Returns:
sparsified graph
This algorithm is based on the method of
D. A. Spielman and N. Srivastava. “Graph sparsification by effective resistances”. SIAM Journal on Computing 40.6 (2011), pp. 1913–1926.
However, a fixed number of edges are sampled without replacement, and optionally a maximum spanning tree is kept to ensure the connectedness of the sparsified graph.
- l2gv2.patch.sparsify.resistance_weighted_graph(graph: TGraph, **args)#
modify the edge weights of a graph by multiplying by their effective resistance
- Parameters:
graph – input graph
epsilon – tolerance for effective resistance computation (default:
1e-2)
- Returns:
copy of input graph with reweighted edges
l2gv2.patch.utils module#
Implementation of local2global algorithm
- class l2gv2.patch.utils.AlignmentProblem(patches: list[Patch], patch_edges=None, min_overlap=None, copy_data: bool = True, self_loops: bool = False, verbose: bool = False)#
Bases:
objectImplements the standard local2global algorithm using an unweighted patch graph
- align_patches(scale: bool = False)#
TODO: docstring for align_patches
- Parameters:
scale – if True, rescale patches (default: False)
- calc_synchronised_rotations()#
Compute the orthogonal transformations that best align the patches
- calc_synchronised_scales(max_scale: float = 100000000.0)#
Compute the scaling transformations that best align the patches
- Parameters:
max_scale – maximum allowed scale, default is 1e8
[1/max_scale ((all scales are clipped to the range)
max_scale])
- Returns:
list of scales
- calc_synchronised_translations()#
Compute translations that best align the patches
- dim: int#
embedding dimension
- get_aligned_embedding(scale: bool = False, realign: bool = False, out=None)#
Return the aligned embedding
- Parameters:
scale – if
True, rescale patches (default:False)realign – if
True, recompute aligned embedding even if it already exists (default:False)out – numpy array to write results to, default is None
- Returns:
n_nodes x dim numpy array of embedding coordinates
- classmethod load(fname: str)#
Restore
AlignmentProblemfrom patch file- Parameters:
fname – path to patch file
- mean_embedding(out: ndarray | None = None)#
Compute node embeddings as the centroid over patch embeddings
- Parameters:
out – numpy array to write results to, default is None (supply a memmap for large-scale problems that do not fit in ram)
- median_embedding(out=None)#
TODO: docstring for median_embedding
- Parameters:
out – [description], defaults to None
- n_nodes: int#
total number of nodes
- n_patches: int#
number of patches
- rotate_patches(rotations=None)#
Align the rotation/reflection of all patches
- Parameters:
rotations – If provided, apply the given transformations instead of synchronizing patch rotations; default is None
- rotations = None#
tracks orthogonal transformations applied to patches (updated by
rotate_patches())
- save_embedding(fname: str)#
Save aligned embedding to json file
- Parameters:
fname – output filename
- save_patches(fname: str)#
Save patch embeddings to json file
- Parameters:
fname – path to output file
- scale_patches(scale_factors=None)#
Synchronise scales of the embeddings for each patch
- Parameters:
scale_factors – if provided apply the given scales instead of synchronising, default is None
- scales = None#
tracks scale transformations applied to patches (updated by
scale_patches())
- shifts = None#
tracks translation transformations applied to patches (updated by
scale_patches(),rotate_patches(), andtranslate_patches())
- translate_patches(translations=None)#
Align the patches by translation
- Parameters:
translations – If provided, apply the given translations instead of synchronizing; default is None
- verbose = False#
print debug output if True
- weight(i: int, j: int) int#
Compute the weighting factor for a pair of patches
- Parameters:
i (int) – First patch index
j (int) – Second patch index
- Returns:
1
Override this in subclasses for weighted alignment
- class l2gv2.patch.utils.SVDAlignmentProblem(*args, tol=1e-06, **kwargs)#
Bases:
WeightedAlignmentProblemTODO: docstring for SVDAlignmentProblem
- calc_synchronised_rotations()#
Compute the orthogonal transformations that best align the patches
- class l2gv2.patch.utils.WeightedAlignmentProblem(patches: list[Patch], patch_edges=None, min_overlap=None, copy_data: bool = True, self_loops: bool = False, verbose: bool = False)#
Bases:
AlignmentProblemVariant of the local2global algorithm where patch edges are weighted according to the number of nodes in the overlap.
- weight(i: int, j: int)#
Compute weight for pair of patches
- Parameters:
i – first patch index
j – second patch index
- Returns:
number of shared nodes between patches i and j
- l2gv2.patch.utils.ensure_extension(filename, extension)#
check filename for extension and add it if necessary
- Parameters:
filename – input filename
extension – desired extension (including .)
- Returns:
filename with extension added
- Raises:
ValueError – if filename has the wrong extension
- l2gv2.patch.utils.local_error(patch: Patch, reference_coordinates) ndarray#
- Compute the euclidean distance between patch coordinate
and reference coordinate for each node in patch
- Parameters:
patch
reference_coordinates
- Returns:
vector of error values
- l2gv2.patch.utils.nearest_orthogonal(mat) ndarray#
Compute nearest orthogonal matrix to a given input matrix
- Parameters:
mat – input matrix
- l2gv2.patch.utils.orthogonal_mse_error(rots1, rots2) float#
Compute the MSE between two sets of orthogonal transformations up to a global transformation
- Parameters:
rots1 – First list of orthogonal matrices
rots2 – Second list of orthogonal matrices
- l2gv2.patch.utils.procrustes_error(coordinates1: ndarray, coordinates2: ndarray) float#
Compute the procrustes alignment error between two sets of coordinates
- Parameters:
coordinates1 – First set of coordinates (array-like)
coordinates2 – Second set of coordinates (array-like)
Note that the two sets of coordinates need to have the same shape.
- l2gv2.patch.utils.random_gen(new_seed=None) Generator#
Change seed of random number generator.
- Parameters:
new_seed – New seed value, defaults to None
- l2gv2.patch.utils.relative_orthogonal_transform(coordinates1: ndarray, coordinates2: ndarray) ndarray#
Find the best orthogonal transformation aligning two sets of coordinates for the same nodes
Note that the two sets of coordinates need to have the same shape.
- Parameters:
coordinates1 – First set of coordinates (array-like)
coordinates2 – Second set of coordinates (array-like)
Returns:
- l2gv2.patch.utils.relative_scale(coordinates1: ndarray, coordinates2: ndarray, clamp=100000000.0)#
compute relative scale of two sets of coordinates for the same nodes
Note that the two sets of coordinates need to have the same shape.
- Parameters:
coordinates1 – First set of coordinates (array-like)
coordinates2 – Second set of coordinates (array-like)
clamp – maximum allowed scale, default is 1e8
- l2gv2.patch.utils.seed(new_seed)#
Change seed of random number generator.
- Parameters:
new_seed – New seed value
- l2gv2.patch.utils.transform_error(transforms) float#
Compute the recovery error based on tracked transformations.
After recovery, all transformations should be constant across patches as we can recover the embedding only up to a global scaling/rotation/translation.
The error is computed as the mean over transformation elements of the standard deviation over patches.
- Parameters:
transforms – list of transforms
Module contents#
- class l2gv2.patch.BaseLazyCoordinates#
Bases:
ABCTODO: docstring for BaseLazyCoordinates
- abstract property shape#
Shape of the coordinates
- class l2gv2.patch.FilePatch(nodes, filename, shift=None, scale=None, rot=None)#
Bases:
PatchPatch class that loads coordinates from a file with optional transformations (shift, scale, rotation).
- class l2gv2.patch.LazyFileCoordinates(filename, *args, **kwargs)#
Bases:
LazyCoordinatesTODO: doctring for LazyFileCoordinates
- property shape#
Shape of the coordinates
- class l2gv2.patch.LazyMeanAggregatorCoordinates(patches)#
Bases:
BaseLazyCoordinatesTODO: doctring for LazyMeanAggregatorCoordinates
- as_array(out=None)#
TODO: docstring for as_array
- Parameters:
out – [description], defaults to None
- get_coordinates(nodes, out=None)#
TODO: docstring for get_coordinates
- Parameters:
nodes – [description]
out – [description], default is None
- property shape#
Shape of the coordinates
- class l2gv2.patch.MeanAggregatorPatch(patches)#
Bases:
PatchPatch class that aggregates multiple patches by taking their mean coordinates.
- get_coordinate(node)#
get coordinate for a single node
- Parameters:
node – Integer node index
- get_coordinates(nodes)#
get coordinates for a list of nodes
- Parameters:
nodes – Iterable of node indices
- property patches#
- class l2gv2.patch.Patch(nodes, coordinates=None)#
Bases:
objectClass for patch embedding
- coordinates = None#
patch embedding coordinates
- get_coordinate(node)#
get coordinate for a single node
- Parameters:
node – Integer node index
- get_coordinates(nodes)#
get coordinates for a list of nodes
- Parameters:
nodes – Iterable of node indices
- index = None#
mapping of node index to patch coordinate index
- property shape#
shape of patch coordinates
(shape[0] is the number of nodes in the patch and shape[1] is the embedding dimension)
- l2gv2.patch.create_overlapping_patches(graph: TGraph, partition_tensor: Tensor, patch_graph: TGraph, min_overlap: int, target_overlap: int)#
Create overlapping patches from a hard partition of an input graph
- Parameters:
graph – Input graph
partition_tensor – partition of input graph
patch_graph – graph where nodes are clusters of partition and an edge indicates that the corresponding patches in the output should have at least
min_overlapnodes in commonmin_overlap – minimum overlap for connected patches
target_overlap – maximum overlap during expansion for an edge (additional overlap may result from expansion of other edges)
- Returns:
list of node-index tensors for patches
- l2gv2.patch.create_patch_data(graph: TGraph, partition_tensor: LongTensor, min_overlap: int, target_overlap: int, min_patch_size: int | None = None, sparsify_method: Literal['resistance', 'rmst', 'none'] = 'resistance', target_patch_degree: int = 4, gamma: int = 0, verbose: bool = False) tuple[list, object]#
Divide data into overlapping patches
- Parameters:
graph (TGraph) – input data
partition_tensor (torch.LongTensor) – starting partition for creating patches
min_overlap – minimum patch overlap for connected patches
target_overlap – maximum patch overlap during expansion of an edge of the patch graph
min_patch_size – minimum size of patches, defauls is None
sparsify_method – method for sparsifying patch graph (one of
'resistance','rmst','none'), default is'resistance'target_patch_degree – target patch degree for
sparsify_method='resistance', default is 4gamma –
gammavalue for use withsparsify_method='rmst', default is 0verbose – if true, print some info about created patches, default is False
- Returns:
list of patch data, patch graph