l2gv2.patch package

l2gv2.patch package#

Submodules#

l2gv2.patch.clustering module#

Graph clustering algorithms

This module contains functions for clustering graphs.

TODO: - Make the algorithms work with Graph objects

class l2gv2.patch.clustering.Partition(partition_tensor)#

Bases: Sequence

Defines a partition of a graph

l2gv2.patch.clustering.fennel_clustering(graph: Graph, num_clusters, load_limit=1.1, alpha=None, gamma=1.5, num_iters=1, clusters=None)#

l2gv2.patch.clustering.hierarchical_aglomerative_clustering(graph: ~raphtory.Graph, method=<function spread_clustering>, levels=None, branch_factors=None)#

Hierarchical agglomerative clustering

Implements a hierarchical agglomerative clustering algorithm.

Parameters:

graph – input graph
method – clustering method
levels – number of levels
branch_factors – branch factors

Returns:

list of clusters

l2gv2.patch.clustering.hierarchical_clustering(data: Data, m: int, k: int, clustering_function: Callable[[Data, int], Tensor]) → list[Tensor]#

Perform hierarchical clustering on a PyTorch Geometric graph.

Parameters:

data (Data) – The input PyTorch Geometric graph.
m (int) – Target number of clusters.
k (int) – Target maximum cluster size.
clustering_function (Callable) – A function that takes a Data object and the number of clusters m and returns a cluster assignment tensor.

Returns:

A list of cluster assignment tensors for all levels of the hierarchy.

Return type:

List[torch.Tensor]

l2gv2.patch.clustering.louvain_clustering(graph: Graph, *args, **kwargs)#

Implements clustering using the Louvain [1] algorithm for modularity optimisation

Parameters:: graph – input graph
Returns:: partition tensor

This is a minimal wrapper around community.best_partition() from the python-louvain package. Any other arguments provided are passed through.

References

l2gv2.patch.clustering.metis_clustering(graph: TGraph, num_clusters)#

Implements clustering using metis

Parameters:

graph – input graph
num_clusters – number of cluster

Returns:

partition tensor

This uses the pymetis package

References

l2gv2.patch.clustering.spread_clustering(graph: Graph, num_clusters, max_degree_init=True)#

l2gv2.patch.lazy module#

Lazy coordinates for patch embeddings

class l2gv2.patch.lazy.BaseLazyCoordinates#

Bases: ABC

TODO: docstring for BaseLazyCoordinates

abstract property shape#: Shape of the coordinates

class l2gv2.patch.lazy.LazyCoordinates(x, shift=None, scale=None, rot=None)#

Bases: BaseLazyCoordinates

TODO: doctring for LazyCoordinates

save_transform(filename)#

Save the transformation to a file

Parameters:: filename – filename to save the transformation to

property shape#: Shape of the coordinates

class l2gv2.patch.lazy.LazyFileCoordinates(filename, *args, **kwargs)#

Bases: LazyCoordinates

TODO: doctring for LazyFileCoordinates

property shape#: Shape of the coordinates

class l2gv2.patch.lazy.LazyMeanAggregatorCoordinates(patches)#

Bases: BaseLazyCoordinates

TODO: doctring for LazyMeanAggregatorCoordinates

as_array(out=None)#

TODO: docstring for as_array

Parameters:: out – [description], defaults to None

get_coordinates(nodes, out=None)#

TODO: docstring for get_coordinates

Parameters:

nodes – [description]
out – [description], default is None

property shape#: Shape of the coordinates

l2gv2.patch.patches module#

Dividing input data into overlapping patches

This module contains functions for dividing input data into overlapping patches.

The main function is create_patch_data, which takes a graph and a partition of the graph and returns a list of patches.

class l2gv2.patch.patches.FilePatch(nodes, filename, shift=None, scale=None, rot=None)#

Bases: Patch

Patch class that loads coordinates from a file with optional transformations (shift, scale, rotation).

class l2gv2.patch.patches.MeanAggregatorPatch(patches)#

Bases: Patch

Patch class that aggregates multiple patches by taking their mean coordinates.

get_coordinate(node)#

get coordinate for a single node

Parameters:: node – Integer node index

get_coordinates(nodes)#

get coordinates for a list of nodes

Parameters:: nodes – Iterable of node indices

property patches#

class l2gv2.patch.patches.Patch(nodes, coordinates=None)#

Bases: object

Class for patch embedding

coordinates = None#: patch embedding coordinates

get_coordinate(node)#

get coordinate for a single node

Parameters:: node – Integer node index

get_coordinates(nodes)#

get coordinates for a list of nodes

Parameters:: nodes – Iterable of node indices

index = None#: mapping of node index to patch coordinate index

property shape#

shape of patch coordinates

(shape[0] is the number of nodes in the patch and shape[1] is the embedding dimension)

l2gv2.patch.patches.create_overlapping_patches(graph: TGraph, partition_tensor: Tensor, patch_graph: TGraph, min_overlap: int, target_overlap: int)#

Create overlapping patches from a hard partition of an input graph

Parameters:

graph – Input graph
partition_tensor – partition of input graph
patch_graph – graph where nodes are clusters of partition and an edge indicates that the corresponding patches in the output should have at least min_overlap nodes in common
min_overlap – minimum overlap for connected patches
target_overlap – maximum overlap during expansion for an edge (additional overlap may result from expansion of other edges)

Returns:

list of node-index tensors for patches

l2gv2.patch.patches.create_patch_data(graph: TGraph, partition_tensor: LongTensor, min_overlap: int, target_overlap: int, min_patch_size: int | None = None, sparsify_method: Literal['resistance', 'rmst', 'none'] = 'resistance', target_patch_degree: int = 4, gamma: int = 0, verbose: bool = False) → tuple[list, object]#

Divide data into overlapping patches

Parameters:

graph (TGraph) – input data
partition_tensor (torch.LongTensor) – starting partition for creating patches
min_overlap – minimum patch overlap for connected patches
target_overlap – maximum patch overlap during expansion of an edge of the patch graph
min_patch_size – minimum size of patches, defauls is None
sparsify_method – method for sparsifying patch graph (one of 'resistance', 'rmst', 'none'), default is 'resistance'
target_patch_degree – target patch degree for sparsify_method='resistance', default is 4
gamma – gamma value for use with sparsify_method='rmst', default is 0
verbose – if true, print some info about created patches, default is False

Returns:

list of patch data, patch graph

l2gv2.patch.patches.create_patch_graph(graph: TGraph, max_num_patches: int, min_overlap: int, target_overlap: int, clustering_function: Callable[[TGraph, int], Tensor]) → tuple[list, TGraph]#

Create a patch graph from a graph

Parameters:

graph (TGraph) – input graph
max_num_patches (int) – maximum number of patches
min_overlap (int) – minimum overlap for connected patches
target_overlap (int) – maximum overlap during expansion

Returns:

list of patch data, patch graph

l2gv2.patch.patches.geodesic_expand_overlap(subgraph, seed_mask, min_overlap, target_overlap, reseed_samples=10)#

Expand patch

Parameters:

subgraph – graph containing patch nodes and all target nodes for potential expansion
seed_mask – [description]
min_overlap – minimum overlap before stopping expansion
target_overlap – maximum overlap (if expansion step results in more overlap, the nodes added are sampled at random)
reseed_samples – [description] default is 10

Returns:

index tensor of new nodes to add to patch

l2gv2.patch.patches.merge_small_clusters(graph: TGraph, partition_tensor: LongTensor, min_size: int)#

Merge small clusters with adjacent clusters such that all clusters satisfy a minimum size constraint.

This function iteratively merges the smallest cluster with its neighbouring cluster with the maximum normalized cut.

Parameters:

graph (TGraph) – Input graph
partition_tensor (torch.LongTensor) – input partition vector mapping nodes to clusters
min_size (int) – desired minimum size of clusters

Returns:

new partition tensor where small clusters are merged.

l2gv2.patch.patches.rolling_window_graph(n_patches: int, w)#

Generate patch edges for a rolling window

Parameters:

n_patches (int) – Number of patches
() (w) – window width (patches connected to the w nearest neighbours on either side)

l2gv2.patch.sparsify module#

Graph sparsification

l2gv2.patch.sparsify.conductance_weighted_graph(graph: TGraph)#

l2gv2.patch.sparsify.edge_sampling_sparsify(graph: TGraph, target_degree, ensure_connected=True)#

l2gv2.patch.sparsify.effective_resistances(graph: TGraph, **args)#

compute effective resistances

Parameters:

graph – input graph
epsilon – tolerance for effective resistance computation (default: 1e-2)

Returns:

effective resistance for each edge

l2gv2.patch.sparsify.hierarchical_sparsify(graph: ~l2gv2.graphs.tgraph.TGraph, clusters, target_level_degree, ensure_connected=True, sparsifier=<function edge_sampling_sparsify>)#

l2gv2.patch.sparsify.nearest_neighbor_sparsify(graph: TGraph, target_degree, ensure_connected=True)#

l2gv2.patch.sparsify.relaxed_spanning_tree(graph: TGraph, maximise=False, gamma=1)#

compute relaxed minimum or maximum spanning tree

This implements the relaxed minimum spanning tree algorithm of

M. Beguerisse-Díaz, B. Vangelov, and M. Barahona. “Finding role communities in directed networks using Role-Based Similarity, Markov Stability and the Relaxed Minimum Spanning Tree”. In: 2013 IEEE Global Conference on Signal and Information Processing (GlobalSIP). IEEE, 2013, pp. 937–940. isbn: 978-1-4799-0248-4.

Parameters:

graph – input graph
maximise – if True start with maximum spanning tree
gamma – \(\gamma\) value for adding edges

l2gv2.patch.sparsify.resistance_sparsify(graph: TGraph, target_mean_degree, ensure_connected=True, epsilon=0.01)#

Sparsify a graph to have a target mean degree using effective resistance based sampling

Parameters:

graph – input graph
target_mean_degree – desired mean degree after sparsification
ensure_connected – if True, first add edges of a maximum spanning tree based on the resistance weights to ensure that the sparsified graph remains connected if the input graph is connected
epsilon – tolerance for effective resistance computation

Returns:

sparsified graph

This algorithm is based on the method of

D. A. Spielman and N. Srivastava. “Graph sparsification by effective resistances”. SIAM Journal on Computing 40.6 (2011), pp. 1913–1926.

However, a fixed number of edges are sampled without replacement, and optionally a maximum spanning tree is kept to ensure the connectedness of the sparsified graph.

l2gv2.patch.sparsify.resistance_weighted_graph(graph: TGraph, **args)#

modify the edge weights of a graph by multiplying by their effective resistance

Parameters:

graph – input graph
epsilon – tolerance for effective resistance computation (default: 1e-2)

Returns:

copy of input graph with reweighted edges

l2gv2.patch.utils module#

Implementation of local2global algorithm

class l2gv2.patch.utils.AlignmentProblem(patches: list[Patch], patch_edges=None, min_overlap=None, copy_data: bool = True, self_loops: bool = False, verbose: bool = False)#

Bases: object

Implements the standard local2global algorithm using an unweighted patch graph

align_patches(scale: bool = False)#

TODO: docstring for align_patches

Parameters:: scale – if True, rescale patches (default: False)

calc_synchronised_rotations()#: Compute the orthogonal transformations that best align the patches

calc_synchronised_scales(max_scale: float = 100000000.0)#

Compute the scaling transformations that best align the patches

Parameters:

max_scale – maximum allowed scale, default is 1e8
[1/max_scale ((all scales are clipped to the range)
max_scale])

Returns:

list of scales

calc_synchronised_translations()#: Compute translations that best align the patches

dim: int#: embedding dimension

get_aligned_embedding(scale: bool = False, realign: bool = False, out=None)#

Return the aligned embedding

Parameters:

scale – if True, rescale patches (default: False)
realign – if True, recompute aligned embedding even if it already exists (default: False)
out – numpy array to write results to, default is None

Returns:

n_nodes x dim numpy array of embedding coordinates

classmethod load(fname: str)#

Restore AlignmentProblem from patch file

Parameters:: fname – path to patch file

mean_embedding(out: ndarray | None = None)#

Compute node embeddings as the centroid over patch embeddings

Parameters:: out – numpy array to write results to, default is None (supply a memmap for large-scale problems that do not fit in ram)

median_embedding(out=None)#

TODO: docstring for median_embedding

Parameters:: out – [description], defaults to None

n_nodes: int#: total number of nodes

n_patches: int#: number of patches

rotate_patches(rotations=None)#

Align the rotation/reflection of all patches

Parameters:: rotations – If provided, apply the given transformations instead of synchronizing patch rotations; default is None

rotations = None#: tracks orthogonal transformations applied to patches (updated by rotate_patches())

save_embedding(fname: str)#

Save aligned embedding to json file

Parameters:: fname – output filename

save_patches(fname: str)#

Save patch embeddings to json file

Parameters:: fname – path to output file

scale_patches(scale_factors=None)#

Synchronise scales of the embeddings for each patch

Parameters:: scale_factors – if provided apply the given scales instead of synchronising, default is None

scales = None#: tracks scale transformations applied to patches (updated by scale_patches())

shifts = None#: tracks translation transformations applied to patches (updated by scale_patches(), rotate_patches(), and translate_patches())

translate_patches(translations=None)#

Align the patches by translation

Parameters:: translations – If provided, apply the given translations instead of synchronizing; default is None

verbose = False#: print debug output if True

weight(i: int, j: int) → int#

Compute the weighting factor for a pair of patches

Parameters:

i (int) – First patch index
j (int) – Second patch index

Returns:

1

Override this in subclasses for weighted alignment

class l2gv2.patch.utils.SVDAlignmentProblem(*args, tol=1e-06, **kwargs)#

Bases: WeightedAlignmentProblem

TODO: docstring for SVDAlignmentProblem

calc_synchronised_rotations()#: Compute the orthogonal transformations that best align the patches

class l2gv2.patch.utils.WeightedAlignmentProblem(patches: list[Patch], patch_edges=None, min_overlap=None, copy_data: bool = True, self_loops: bool = False, verbose: bool = False)#

Bases: AlignmentProblem

Variant of the local2global algorithm where patch edges are weighted according to the number of nodes in the overlap.

weight(i: int, j: int)#

Compute weight for pair of patches

Parameters:

i – first patch index
j – second patch index

Returns:

number of shared nodes between patches i and j

l2gv2.patch.utils.ensure_extension(filename, extension)#

check filename for extension and add it if necessary

Parameters:

filename – input filename
extension – desired extension (including .)

Returns:

filename with extension added

Raises:

ValueError – if filename has the wrong extension

l2gv2.patch.utils.local_error(patch: Patch, reference_coordinates) → ndarray#

Compute the euclidean distance between patch coordinate: and reference coordinate for each node in patch

Parameters:

patch
reference_coordinates

Returns:

vector of error values

l2gv2.patch.utils.nearest_orthogonal(mat) → ndarray#

Compute nearest orthogonal matrix to a given input matrix

Parameters:: mat – input matrix

l2gv2.patch.utils.orthogonal_mse_error(rots1, rots2) → float#

Compute the MSE between two sets of orthogonal transformations up to a global transformation

Parameters:

rots1 – First list of orthogonal matrices
rots2 – Second list of orthogonal matrices

l2gv2.patch.utils.procrustes_error(coordinates1: ndarray, coordinates2: ndarray) → float#

Compute the procrustes alignment error between two sets of coordinates

Parameters:

coordinates1 – First set of coordinates (array-like)
coordinates2 – Second set of coordinates (array-like)

Note that the two sets of coordinates need to have the same shape.

l2gv2.patch.utils.random_gen(new_seed=None) → Generator#

Change seed of random number generator.

Parameters:: new_seed – New seed value, defaults to None

l2gv2.patch.utils.relative_orthogonal_transform(coordinates1: ndarray, coordinates2: ndarray) → ndarray#

Find the best orthogonal transformation aligning two sets of coordinates for the same nodes

Note that the two sets of coordinates need to have the same shape.

Parameters:

coordinates1 – First set of coordinates (array-like)
coordinates2 – Second set of coordinates (array-like)

Returns:

l2gv2.patch.utils.relative_scale(coordinates1: ndarray, coordinates2: ndarray, clamp=100000000.0)#

compute relative scale of two sets of coordinates for the same nodes

Note that the two sets of coordinates need to have the same shape.

Parameters:

coordinates1 – First set of coordinates (array-like)
coordinates2 – Second set of coordinates (array-like)
clamp – maximum allowed scale, default is 1e8

l2gv2.patch.utils.seed(new_seed)#

Change seed of random number generator.

Parameters:: new_seed – New seed value

l2gv2.patch.utils.transform_error(transforms) → float#

Compute the recovery error based on tracked transformations.

After recovery, all transformations should be constant across patches as we can recover the embedding only up to a global scaling/rotation/translation.

The error is computed as the mean over transformation elements of the standard deviation over patches.

Parameters:: transforms – list of transforms

Module contents#

class l2gv2.patch.BaseLazyCoordinates#

Bases: ABC

TODO: docstring for BaseLazyCoordinates

abstract property shape#: Shape of the coordinates

class l2gv2.patch.FilePatch(nodes, filename, shift=None, scale=None, rot=None)#

Bases: Patch

Patch class that loads coordinates from a file with optional transformations (shift, scale, rotation).

class l2gv2.patch.LazyFileCoordinates(filename, *args, **kwargs)#

Bases: LazyCoordinates

TODO: doctring for LazyFileCoordinates

property shape#: Shape of the coordinates

class l2gv2.patch.LazyMeanAggregatorCoordinates(patches)#

Bases: BaseLazyCoordinates

TODO: doctring for LazyMeanAggregatorCoordinates

as_array(out=None)#

TODO: docstring for as_array

Parameters:: out – [description], defaults to None

get_coordinates(nodes, out=None)#

TODO: docstring for get_coordinates

Parameters:

nodes – [description]
out – [description], default is None

property shape#: Shape of the coordinates

class l2gv2.patch.MeanAggregatorPatch(patches)#

Bases: Patch

Patch class that aggregates multiple patches by taking their mean coordinates.

get_coordinate(node)#

get coordinate for a single node

Parameters:: node – Integer node index

get_coordinates(nodes)#

get coordinates for a list of nodes

Parameters:: nodes – Iterable of node indices

property patches#

class l2gv2.patch.Patch(nodes, coordinates=None)#

Bases: object

Class for patch embedding

coordinates = None#: patch embedding coordinates

get_coordinate(node)#

get coordinate for a single node

Parameters:: node – Integer node index

get_coordinates(nodes)#

get coordinates for a list of nodes

Parameters:: nodes – Iterable of node indices

index = None#: mapping of node index to patch coordinate index

property shape#

shape of patch coordinates

(shape[0] is the number of nodes in the patch and shape[1] is the embedding dimension)

l2gv2.patch.create_overlapping_patches(graph: TGraph, partition_tensor: Tensor, patch_graph: TGraph, min_overlap: int, target_overlap: int)#

Create overlapping patches from a hard partition of an input graph

Parameters:

graph – Input graph
partition_tensor – partition of input graph
patch_graph – graph where nodes are clusters of partition and an edge indicates that the corresponding patches in the output should have at least min_overlap nodes in common
min_overlap – minimum overlap for connected patches
target_overlap – maximum overlap during expansion for an edge (additional overlap may result from expansion of other edges)

Returns:

list of node-index tensors for patches

l2gv2.patch.create_patch_data(graph: TGraph, partition_tensor: LongTensor, min_overlap: int, target_overlap: int, min_patch_size: int | None = None, sparsify_method: Literal['resistance', 'rmst', 'none'] = 'resistance', target_patch_degree: int = 4, gamma: int = 0, verbose: bool = False) → tuple[list, object]#

Divide data into overlapping patches

Parameters:

graph (TGraph) – input data
partition_tensor (torch.LongTensor) – starting partition for creating patches
min_overlap – minimum patch overlap for connected patches
target_overlap – maximum patch overlap during expansion of an edge of the patch graph
min_patch_size – minimum size of patches, defauls is None
sparsify_method – method for sparsifying patch graph (one of 'resistance', 'rmst', 'none'), default is 'resistance'
target_patch_degree – target patch degree for sparsify_method='resistance', default is 4
gamma – gamma value for use with sparsify_method='rmst', default is 0
verbose – if true, print some info about created patches, default is False

Returns:

list of patch data, patch graph

l2gv2.patch package

Contents

l2gv2.patch package#

Submodules#

l2gv2.patch.clustering module#

l2gv2.patch.lazy module#

l2gv2.patch.patches module#

l2gv2.patch.sparsify module#

l2gv2.patch.utils module#

Module contents#