l2gv2.patch package

Contents

l2gv2.patch package#

Submodules#

l2gv2.patch.clustering module#

Graph clustering algorithms

This module contains functions for clustering graphs.

TODO: - Make the algorithms work with Graph objects

class l2gv2.patch.clustering.Partition(partition_tensor)#

Bases: Sequence

Defines a partition of a graph

l2gv2.patch.clustering.fennel_clustering(graph: Graph, num_clusters, load_limit=1.1, alpha=None, gamma=1.5, num_iters=1, clusters=None)#
l2gv2.patch.clustering.hierarchical_aglomerative_clustering(graph: ~raphtory.Graph, method=<function spread_clustering>, levels=None, branch_factors=None)#

Hierarchical agglomerative clustering

Implements a hierarchical agglomerative clustering algorithm.

Parameters:
  • graph – input graph

  • method – clustering method

  • levels – number of levels

  • branch_factors – branch factors

Returns:

list of clusters

l2gv2.patch.clustering.hierarchical_clustering(data: Data, m: int, k: int, clustering_function: Callable[[Data, int], Tensor]) list[Tensor]#

Perform hierarchical clustering on a PyTorch Geometric graph.

Parameters:
  • data (Data) – The input PyTorch Geometric graph.

  • m (int) – Target number of clusters.

  • k (int) – Target maximum cluster size.

  • clustering_function (Callable) – A function that takes a Data object and the number of clusters m and returns a cluster assignment tensor.

Returns:

A list of cluster assignment tensors for all levels of the hierarchy.

Return type:

List[torch.Tensor]

l2gv2.patch.clustering.louvain_clustering(graph: Graph, *args, **kwargs)#

Implements clustering using the Louvain [1] algorithm for modularity optimisation

Parameters:

graph – input graph

Returns:

partition tensor

This is a minimal wrapper around community.best_partition() from the python-louvain package. Any other arguments provided are passed through.

References

l2gv2.patch.clustering.metis_clustering(graph: TGraph, num_clusters)#

Implements clustering using metis

Parameters:
  • graph – input graph

  • num_clusters – number of cluster

Returns:

partition tensor

This uses the pymetis package

References

l2gv2.patch.clustering.spread_clustering(graph: Graph, num_clusters, max_degree_init=True)#

l2gv2.patch.lazy module#

Lazy coordinates for patch embeddings

class l2gv2.patch.lazy.BaseLazyCoordinates#

Bases: ABC

TODO: docstring for BaseLazyCoordinates

abstract property shape#

Shape of the coordinates

class l2gv2.patch.lazy.LazyCoordinates(x, shift=None, scale=None, rot=None)#

Bases: BaseLazyCoordinates

TODO: doctring for LazyCoordinates

save_transform(filename)#

Save the transformation to a file

Parameters:

filename – filename to save the transformation to

property shape#

Shape of the coordinates

class l2gv2.patch.lazy.LazyFileCoordinates(filename, *args, **kwargs)#

Bases: LazyCoordinates

TODO: doctring for LazyFileCoordinates

property shape#

Shape of the coordinates

class l2gv2.patch.lazy.LazyMeanAggregatorCoordinates(patches)#

Bases: BaseLazyCoordinates

TODO: doctring for LazyMeanAggregatorCoordinates

as_array(out=None)#

TODO: docstring for as_array

Parameters:

out – [description], defaults to None

get_coordinates(nodes, out=None)#

TODO: docstring for get_coordinates

Parameters:
  • nodes – [description]

  • out – [description], default is None

property shape#

Shape of the coordinates

l2gv2.patch.patches module#

Dividing input data into overlapping patches

This module contains functions for dividing input data into overlapping patches.

The main function is create_patch_data, which takes a graph and a partition of the graph and returns a list of patches.

class l2gv2.patch.patches.FilePatch(nodes, filename, shift=None, scale=None, rot=None)#

Bases: Patch

Patch class that loads coordinates from a file with optional transformations (shift, scale, rotation).

class l2gv2.patch.patches.MeanAggregatorPatch(patches)#

Bases: Patch

Patch class that aggregates multiple patches by taking their mean coordinates.

get_coordinate(node)#

get coordinate for a single node

Parameters:

node – Integer node index

get_coordinates(nodes)#

get coordinates for a list of nodes

Parameters:

nodes – Iterable of node indices

property patches#
class l2gv2.patch.patches.Patch(nodes, coordinates=None)#

Bases: object

Class for patch embedding

coordinates = None#

patch embedding coordinates

get_coordinate(node)#

get coordinate for a single node

Parameters:

node – Integer node index

get_coordinates(nodes)#

get coordinates for a list of nodes

Parameters:

nodes – Iterable of node indices

index = None#

mapping of node index to patch coordinate index

property shape#

shape of patch coordinates

(shape[0] is the number of nodes in the patch and shape[1] is the embedding dimension)

l2gv2.patch.patches.create_overlapping_patches(graph: TGraph, partition_tensor: Tensor, patch_graph: TGraph, min_overlap: int, target_overlap: int)#

Create overlapping patches from a hard partition of an input graph

Parameters:
  • graph – Input graph

  • partition_tensor – partition of input graph

  • patch_graph – graph where nodes are clusters of partition and an edge indicates that the corresponding patches in the output should have at least min_overlap nodes in common

  • min_overlap – minimum overlap for connected patches

  • target_overlap – maximum overlap during expansion for an edge (additional overlap may result from expansion of other edges)

Returns:

list of node-index tensors for patches

l2gv2.patch.patches.create_patch_data(graph: TGraph, partition_tensor: LongTensor, min_overlap: int, target_overlap: int, min_patch_size: int | None = None, sparsify_method: Literal['resistance', 'rmst', 'none'] = 'resistance', target_patch_degree: int = 4, gamma: int = 0, verbose: bool = False) tuple[list, object]#

Divide data into overlapping patches

Parameters:
  • graph (TGraph) – input data

  • partition_tensor (torch.LongTensor) – starting partition for creating patches

  • min_overlap – minimum patch overlap for connected patches

  • target_overlap – maximum patch overlap during expansion of an edge of the patch graph

  • min_patch_size – minimum size of patches, defauls is None

  • sparsify_method – method for sparsifying patch graph (one of 'resistance', 'rmst', 'none'), default is 'resistance'

  • target_patch_degree – target patch degree for sparsify_method='resistance', default is 4

  • gammagamma value for use with sparsify_method='rmst', default is 0

  • verbose – if true, print some info about created patches, default is False

Returns:

list of patch data, patch graph

l2gv2.patch.patches.create_patch_graph(graph: TGraph, max_num_patches: int, min_overlap: int, target_overlap: int, clustering_function: Callable[[TGraph, int], Tensor]) tuple[list, TGraph]#

Create a patch graph from a graph

Parameters:
  • graph (TGraph) – input graph

  • max_num_patches (int) – maximum number of patches

  • min_overlap (int) – minimum overlap for connected patches

  • target_overlap (int) – maximum overlap during expansion

Returns:

list of patch data, patch graph

l2gv2.patch.patches.geodesic_expand_overlap(subgraph, seed_mask, min_overlap, target_overlap, reseed_samples=10)#

Expand patch

Parameters:
  • subgraph – graph containing patch nodes and all target nodes for potential expansion

  • seed_mask – [description]

  • min_overlap – minimum overlap before stopping expansion

  • target_overlap – maximum overlap (if expansion step results in more overlap, the nodes added are sampled at random)

  • reseed_samples – [description] default is 10

Returns:

index tensor of new nodes to add to patch

l2gv2.patch.patches.merge_small_clusters(graph: TGraph, partition_tensor: LongTensor, min_size: int)#

Merge small clusters with adjacent clusters such that all clusters satisfy a minimum size constraint.

This function iteratively merges the smallest cluster with its neighbouring cluster with the maximum normalized cut.

Parameters:
  • graph (TGraph) – Input graph

  • partition_tensor (torch.LongTensor) – input partition vector mapping nodes to clusters

  • min_size (int) – desired minimum size of clusters

Returns:

new partition tensor where small clusters are merged.

l2gv2.patch.patches.rolling_window_graph(n_patches: int, w)#

Generate patch edges for a rolling window

Parameters:
  • n_patches (int) – Number of patches

  • () (w) – window width (patches connected to the w nearest neighbours on either side)

l2gv2.patch.sparsify module#

Graph sparsification

l2gv2.patch.sparsify.conductance_weighted_graph(graph: TGraph)#
l2gv2.patch.sparsify.edge_sampling_sparsify(graph: TGraph, target_degree, ensure_connected=True)#
l2gv2.patch.sparsify.effective_resistances(graph: TGraph, **args)#

compute effective resistances

Parameters:
  • graph – input graph

  • epsilon – tolerance for effective resistance computation (default: 1e-2)

Returns:

effective resistance for each edge

l2gv2.patch.sparsify.hierarchical_sparsify(graph: ~l2gv2.graphs.tgraph.TGraph, clusters, target_level_degree, ensure_connected=True, sparsifier=<function edge_sampling_sparsify>)#
l2gv2.patch.sparsify.nearest_neighbor_sparsify(graph: TGraph, target_degree, ensure_connected=True)#
l2gv2.patch.sparsify.relaxed_spanning_tree(graph: TGraph, maximise=False, gamma=1)#

compute relaxed minimum or maximum spanning tree

This implements the relaxed minimum spanning tree algorithm of

M. Beguerisse-Díaz, B. Vangelov, and M. Barahona. “Finding role communities in directed networks using Role-Based Similarity, Markov Stability and the Relaxed Minimum Spanning Tree”. In: 2013 IEEE Global Conference on Signal and Information Processing (GlobalSIP). IEEE, 2013, pp. 937–940. isbn: 978-1-4799-0248-4.

Parameters:
  • graph – input graph

  • maximise – if True start with maximum spanning tree

  • gamma\(\gamma\) value for adding edges

l2gv2.patch.sparsify.resistance_sparsify(graph: TGraph, target_mean_degree, ensure_connected=True, epsilon=0.01)#

Sparsify a graph to have a target mean degree using effective resistance based sampling

Parameters:
  • graph – input graph

  • target_mean_degree – desired mean degree after sparsification

  • ensure_connected – if True, first add edges of a maximum spanning tree based on the resistance weights to ensure that the sparsified graph remains connected if the input graph is connected

  • epsilon – tolerance for effective resistance computation

Returns:

sparsified graph

This algorithm is based on the method of

D. A. Spielman and N. Srivastava. “Graph sparsification by effective resistances”. SIAM Journal on Computing 40.6 (2011), pp. 1913–1926.

However, a fixed number of edges are sampled without replacement, and optionally a maximum spanning tree is kept to ensure the connectedness of the sparsified graph.

l2gv2.patch.sparsify.resistance_weighted_graph(graph: TGraph, **args)#

modify the edge weights of a graph by multiplying by their effective resistance

Parameters:
  • graph – input graph

  • epsilon – tolerance for effective resistance computation (default: 1e-2)

Returns:

copy of input graph with reweighted edges

l2gv2.patch.utils module#

Implementation of local2global algorithm

class l2gv2.patch.utils.AlignmentProblem(patches: list[Patch], patch_edges=None, min_overlap=None, copy_data: bool = True, self_loops: bool = False, verbose: bool = False)#

Bases: object

Implements the standard local2global algorithm using an unweighted patch graph

align_patches(scale: bool = False)#

TODO: docstring for align_patches

Parameters:

scale – if True, rescale patches (default: False)

calc_synchronised_rotations()#

Compute the orthogonal transformations that best align the patches

calc_synchronised_scales(max_scale: float = 100000000.0)#

Compute the scaling transformations that best align the patches

Parameters:
  • max_scale – maximum allowed scale, default is 1e8

  • [1/max_scale ((all scales are clipped to the range)

  • max_scale])

Returns:

list of scales

calc_synchronised_translations()#

Compute translations that best align the patches

dim: int#

embedding dimension

get_aligned_embedding(scale: bool = False, realign: bool = False, out=None)#

Return the aligned embedding

Parameters:
  • scale – if True, rescale patches (default: False)

  • realign – if True, recompute aligned embedding even if it already exists (default: False)

  • out – numpy array to write results to, default is None

Returns:

n_nodes x dim numpy array of embedding coordinates

classmethod load(fname: str)#

Restore AlignmentProblem from patch file

Parameters:

fname – path to patch file

mean_embedding(out: ndarray | None = None)#

Compute node embeddings as the centroid over patch embeddings

Parameters:

out – numpy array to write results to, default is None (supply a memmap for large-scale problems that do not fit in ram)

median_embedding(out=None)#

TODO: docstring for median_embedding

Parameters:

out – [description], defaults to None

n_nodes: int#

total number of nodes

n_patches: int#

number of patches

rotate_patches(rotations=None)#

Align the rotation/reflection of all patches

Parameters:

rotations – If provided, apply the given transformations instead of synchronizing patch rotations; default is None

rotations = None#

tracks orthogonal transformations applied to patches (updated by rotate_patches())

save_embedding(fname: str)#

Save aligned embedding to json file

Parameters:

fname – output filename

save_patches(fname: str)#

Save patch embeddings to json file

Parameters:

fname – path to output file

scale_patches(scale_factors=None)#

Synchronise scales of the embeddings for each patch

Parameters:

scale_factors – if provided apply the given scales instead of synchronising, default is None

scales = None#

tracks scale transformations applied to patches (updated by scale_patches())

shifts = None#

tracks translation transformations applied to patches (updated by scale_patches(), rotate_patches(), and translate_patches())

translate_patches(translations=None)#

Align the patches by translation

Parameters:

translations – If provided, apply the given translations instead of synchronizing; default is None

verbose = False#

print debug output if True

weight(i: int, j: int) int#

Compute the weighting factor for a pair of patches

Parameters:
  • i (int) – First patch index

  • j (int) – Second patch index

Returns:

1

Override this in subclasses for weighted alignment

class l2gv2.patch.utils.SVDAlignmentProblem(*args, tol=1e-06, **kwargs)#

Bases: WeightedAlignmentProblem

TODO: docstring for SVDAlignmentProblem

calc_synchronised_rotations()#

Compute the orthogonal transformations that best align the patches

class l2gv2.patch.utils.WeightedAlignmentProblem(patches: list[Patch], patch_edges=None, min_overlap=None, copy_data: bool = True, self_loops: bool = False, verbose: bool = False)#

Bases: AlignmentProblem

Variant of the local2global algorithm where patch edges are weighted according to the number of nodes in the overlap.

weight(i: int, j: int)#

Compute weight for pair of patches

Parameters:
  • i – first patch index

  • j – second patch index

Returns:

number of shared nodes between patches i and j

l2gv2.patch.utils.ensure_extension(filename, extension)#

check filename for extension and add it if necessary

Parameters:
  • filename – input filename

  • extension – desired extension (including .)

Returns:

filename with extension added

Raises:

ValueError – if filename has the wrong extension

l2gv2.patch.utils.local_error(patch: Patch, reference_coordinates) ndarray#
Compute the euclidean distance between patch coordinate

and reference coordinate for each node in patch

Parameters:
  • patch

  • reference_coordinates

Returns:

vector of error values

l2gv2.patch.utils.nearest_orthogonal(mat) ndarray#

Compute nearest orthogonal matrix to a given input matrix

Parameters:

mat – input matrix

l2gv2.patch.utils.orthogonal_mse_error(rots1, rots2) float#

Compute the MSE between two sets of orthogonal transformations up to a global transformation

Parameters:
  • rots1 – First list of orthogonal matrices

  • rots2 – Second list of orthogonal matrices

l2gv2.patch.utils.procrustes_error(coordinates1: ndarray, coordinates2: ndarray) float#

Compute the procrustes alignment error between two sets of coordinates

Parameters:
  • coordinates1 – First set of coordinates (array-like)

  • coordinates2 – Second set of coordinates (array-like)

Note that the two sets of coordinates need to have the same shape.

l2gv2.patch.utils.random_gen(new_seed=None) Generator#

Change seed of random number generator.

Parameters:

new_seed – New seed value, defaults to None

l2gv2.patch.utils.relative_orthogonal_transform(coordinates1: ndarray, coordinates2: ndarray) ndarray#

Find the best orthogonal transformation aligning two sets of coordinates for the same nodes

Note that the two sets of coordinates need to have the same shape.

Parameters:
  • coordinates1 – First set of coordinates (array-like)

  • coordinates2 – Second set of coordinates (array-like)

Returns:

l2gv2.patch.utils.relative_scale(coordinates1: ndarray, coordinates2: ndarray, clamp=100000000.0)#

compute relative scale of two sets of coordinates for the same nodes

Note that the two sets of coordinates need to have the same shape.

Parameters:
  • coordinates1 – First set of coordinates (array-like)

  • coordinates2 – Second set of coordinates (array-like)

  • clamp – maximum allowed scale, default is 1e8

l2gv2.patch.utils.seed(new_seed)#

Change seed of random number generator.

Parameters:

new_seed – New seed value

l2gv2.patch.utils.transform_error(transforms) float#

Compute the recovery error based on tracked transformations.

After recovery, all transformations should be constant across patches as we can recover the embedding only up to a global scaling/rotation/translation.

The error is computed as the mean over transformation elements of the standard deviation over patches.

Parameters:

transforms – list of transforms

Module contents#

class l2gv2.patch.BaseLazyCoordinates#

Bases: ABC

TODO: docstring for BaseLazyCoordinates

abstract property shape#

Shape of the coordinates

class l2gv2.patch.FilePatch(nodes, filename, shift=None, scale=None, rot=None)#

Bases: Patch

Patch class that loads coordinates from a file with optional transformations (shift, scale, rotation).

class l2gv2.patch.LazyFileCoordinates(filename, *args, **kwargs)#

Bases: LazyCoordinates

TODO: doctring for LazyFileCoordinates

property shape#

Shape of the coordinates

class l2gv2.patch.LazyMeanAggregatorCoordinates(patches)#

Bases: BaseLazyCoordinates

TODO: doctring for LazyMeanAggregatorCoordinates

as_array(out=None)#

TODO: docstring for as_array

Parameters:

out – [description], defaults to None

get_coordinates(nodes, out=None)#

TODO: docstring for get_coordinates

Parameters:
  • nodes – [description]

  • out – [description], default is None

property shape#

Shape of the coordinates

class l2gv2.patch.MeanAggregatorPatch(patches)#

Bases: Patch

Patch class that aggregates multiple patches by taking their mean coordinates.

get_coordinate(node)#

get coordinate for a single node

Parameters:

node – Integer node index

get_coordinates(nodes)#

get coordinates for a list of nodes

Parameters:

nodes – Iterable of node indices

property patches#
class l2gv2.patch.Patch(nodes, coordinates=None)#

Bases: object

Class for patch embedding

coordinates = None#

patch embedding coordinates

get_coordinate(node)#

get coordinate for a single node

Parameters:

node – Integer node index

get_coordinates(nodes)#

get coordinates for a list of nodes

Parameters:

nodes – Iterable of node indices

index = None#

mapping of node index to patch coordinate index

property shape#

shape of patch coordinates

(shape[0] is the number of nodes in the patch and shape[1] is the embedding dimension)

l2gv2.patch.create_overlapping_patches(graph: TGraph, partition_tensor: Tensor, patch_graph: TGraph, min_overlap: int, target_overlap: int)#

Create overlapping patches from a hard partition of an input graph

Parameters:
  • graph – Input graph

  • partition_tensor – partition of input graph

  • patch_graph – graph where nodes are clusters of partition and an edge indicates that the corresponding patches in the output should have at least min_overlap nodes in common

  • min_overlap – minimum overlap for connected patches

  • target_overlap – maximum overlap during expansion for an edge (additional overlap may result from expansion of other edges)

Returns:

list of node-index tensors for patches

l2gv2.patch.create_patch_data(graph: TGraph, partition_tensor: LongTensor, min_overlap: int, target_overlap: int, min_patch_size: int | None = None, sparsify_method: Literal['resistance', 'rmst', 'none'] = 'resistance', target_patch_degree: int = 4, gamma: int = 0, verbose: bool = False) tuple[list, object]#

Divide data into overlapping patches

Parameters:
  • graph (TGraph) – input data

  • partition_tensor (torch.LongTensor) – starting partition for creating patches

  • min_overlap – minimum patch overlap for connected patches

  • target_overlap – maximum patch overlap during expansion of an edge of the patch graph

  • min_patch_size – minimum size of patches, defauls is None

  • sparsify_method – method for sparsifying patch graph (one of 'resistance', 'rmst', 'none'), default is 'resistance'

  • target_patch_degree – target patch degree for sparsify_method='resistance', default is 4

  • gammagamma value for use with sparsify_method='rmst', default is 0

  • verbose – if true, print some info about created patches, default is False

Returns:

list of patch data, patch graph