l2gv2.datasets package#

Submodules#

l2gv2.datasets.as733 module#

AS-733 dataset from SNAP.

class l2gv2.datasets.as733.AS733Dataset(root: str | None = None, transform: Callable | None = None, pre_transform: Callable | None = None)#

Bases: BaseDataset

A PyTorch Dataset for the AS-733 dataset from SNAP.

download()#

Download the dataset tarball and extract it into the raw_dir.
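
As a rough illustration of what a download step of this kind involves, the sketch below fetches a tarball and unpacks it into a raw directory using only the standard library. The helpers `fetch_tarball` and `extract_tarball` are hypothetical names introduced here and are not part of l2gv2; the actual method may be implemented differently.

```python
import os
import tarfile
import urllib.request

URL = "https://snap.stanford.edu/data/as-733.tar.gz"  # value of AS733Dataset.url


def fetch_tarball(url: str, raw_dir: str) -> str:
    """Download `url` into `raw_dir` and return the archive path (hypothetical helper)."""
    os.makedirs(raw_dir, exist_ok=True)
    archive_path = os.path.join(raw_dir, os.path.basename(url))
    urllib.request.urlretrieve(url, archive_path)
    return archive_path


def extract_tarball(archive_path: str, raw_dir: str) -> list[str]:
    """Extract a .tar.gz archive into `raw_dir` and return its member names (hypothetical helper)."""
    os.makedirs(raw_dir, exist_ok=True)
    root = os.path.realpath(raw_dir)
    with tarfile.open(archive_path, "r:gz") as tar:
        members = tar.getmembers()
        for member in members:
            # Refuse members that would be written outside raw_dir.
            target = os.path.realpath(os.path.join(root, member.name))
            if os.path.commonpath([root, target]) != root:
                raise ValueError(f"unsafe path in archive: {member.name}")
        tar.extractall(raw_dir)
    return [m.name for m in members]
```

Calling `extract_tarball(fetch_tarball(URL, raw_dir), raw_dir)` would chain the two steps; they are kept separate so the extraction can be exercised without network access.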

process()#

Process the raw text files into a combined Polars DataFrame and save it as a Parquet file.
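
The parsing half of such a processing step can be sketched in plain Python. This is an illustration under stated assumptions, not the library's code: it assumes each raw file name embeds the snapshot date as YYYYMMDD (for example `as19971108.txt`), that header lines start with `#`, and that each data line holds a whitespace-separated pair of node ids. The function names and the column names `source`, `dest`, and `timestamp` are hypothetical.

```python
import re
from datetime import date
from pathlib import Path


def parse_snapshot(path: Path) -> list[dict]:
    """Parse one edge-list file into rows tagged with the snapshot date."""
    # Assumption: the file name embeds the snapshot date as YYYYMMDD.
    match = re.search(r"(\d{4})(\d{2})(\d{2})", path.name)
    if match is None:
        raise ValueError(f"no date found in file name: {path.name}")
    snapshot = date(*map(int, match.groups()))
    rows = []
    for line in path.read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and header comments
        source, dest = line.split()[:2]
        rows.append({"source": int(source), "dest": int(dest), "timestamp": snapshot})
    return rows


def combine_snapshots(raw_dir: Path) -> list[dict]:
    """Concatenate the rows of every .txt snapshot in `raw_dir`, in file-name order."""
    rows = []
    for path in sorted(raw_dir.glob("*.txt")):
        rows.extend(parse_snapshot(path))
    return rows
```

A list of row dictionaries like this can be passed to the `polars.DataFrame` constructor and written out with `DataFrame.write_parquet`; that step is left out here so the sketch runs with the standard library alone.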

property processed_file_names: list[str]#

The processed file names for the AS-733 dataset.

property raw_file_names: list[str]#

The raw file names for the AS-733 dataset.

url = 'https://snap.stanford.edu/data/as-733.tar.gz'#

l2gv2.datasets.base module#

Utilities for loading graph datasets.

The module provides a DataLoader class that can load graph datasets (torch_geometric.data.Dataset) and return a Polars DataFrame of the edges and nodes. It also contains methods to convert the graph into a Raphtory graph or a NetworkX graph.

class l2gv2.datasets.base.BaseDataset(root: str | None = None, transform: Callable | None = None, pre_transform: Callable | None = None)#

Bases: InMemoryDataset

Wrapper for a PyTorch Geometric Dataset.

property processed_dir: str#

property processed_file_names: str | list[str] | tuple[str, ...]#

The processed file names.

property raw_dir: str#

property raw_file_names: list[str]#

The raw file names.

to(fmt: str)#

Convert the dataset to a different format.
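
One common way to implement a format-switching method such as `to` is a lookup table from format name to converter function. The sketch below is a generic illustration of that idea and not l2gv2's implementation: the class `EdgeStore`, the format names `"edge_list"` and `"adjacency"`, and both converters are hypothetical. The formats that `to` actually accepts should be checked against the source.

```python
from typing import Callable

Edge = tuple[int, int]


def to_edge_list(edges: list[Edge]) -> list[Edge]:
    """Return a sorted copy of the edges."""
    return sorted(edges)


def to_adjacency(edges: list[Edge]) -> dict[int, list[int]]:
    """Group destination nodes by source node."""
    adjacency: dict[int, list[int]] = {}
    for source, dest in sorted(edges):
        adjacency.setdefault(source, []).append(dest)
    return adjacency


# Hypothetical format names mapped to converter functions.
CONVERTERS: dict[str, Callable[[list[Edge]], object]] = {
    "edge_list": to_edge_list,
    "adjacency": to_adjacency,
}


class EdgeStore:
    """Hypothetical stand-in for a dataset that holds a list of edges."""

    def __init__(self, edges: list[Edge]):
        self.edges = edges

    def to(self, fmt: str):
        """Convert the stored edges to the format named by `fmt`."""
        try:
            converter = CONVERTERS[fmt]
        except KeyError:
            raise ValueError(
                f"unknown format {fmt!r}; expected one of {sorted(CONVERTERS)}"
            ) from None
        return converter(self.edges)
```

Keeping the converters in a table means a new target format only needs one new function and one new entry, and an unsupported name fails with a message listing the valid choices.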

l2gv2.datasets.cora module#

class l2gv2.datasets.cora.CoraDataset(root: str | None = None, **kwargs)#

Bases: BaseDataset

Cora dataset from the Planetoid dataset collection.

l2gv2.datasets.dgraph module#

class l2gv2.datasets.dgraph.DGraphDataset(root: str | None = None, source_file: str | None = None, **kwargs)#

Bases: BaseDataset

DGraph dataset (DGraphXinye/DGraphFin_baseline).

download()#

Download the DGraph dataset.

property raw_paths#

The absolute filepaths that must be present in order to skip downloading.
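
In PyTorch Geometric, `raw_paths` joins `raw_dir` with each entry of `raw_file_names`, and the download step is skipped when all of those files already exist. The standalone sketch below mimics that check with the standard library. Both function names are hypothetical, and this is not the code used by `DGraphDataset`.

```python
import os


def raw_paths(raw_dir: str, raw_file_names: list[str]) -> list[str]:
    """Absolute paths of the expected raw files."""
    return [os.path.abspath(os.path.join(raw_dir, name)) for name in raw_file_names]


def needs_download(raw_dir: str, raw_file_names: list[str]) -> bool:
    """True if at least one expected raw file is missing, or if none are listed."""
    paths = raw_paths(raw_dir, raw_file_names)
    return len(paths) == 0 or not all(os.path.exists(p) for p in paths)
```

With a check like this, placing the expected files in the raw directory ahead of time is enough to avoid a download.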

l2gv2.datasets.registry module#

Registry for datasets.

l2gv2.datasets.registry.get_dataset(name, **kwargs)#

Factory function to instantiate a dataset based on its registered name.

Parameters:
  • name (str) – The registered name of the dataset (e.g., “as-733”).

  • **kwargs – Additional keyword arguments that will be passed to the dataset’s constructor.

Returns:

An instance of the dataset class corresponding to the given name.

Raises:

ValueError – If the dataset name is not found in the registry.

l2gv2.datasets.registry.register_dataset(name)#

Decorator to register a dataset class under a specified name.

Parameters:

name (str) – The key under which the dataset class will be registered.

Returns:

A decorator that registers the class.

l2gv2.datasets.utils module#

Utilities for converting between different graph formats.

l2gv2.datasets.utils.polars_to_raphtory(edge_df: DataFrame, node_df: DataFrame = None) → Graph#

Convert a Polars DataFrame to a Raphtory graph.

l2gv2.datasets.utils.polars_to_tg(edge_df: DataFrame, node_df: DataFrame = None, pre_transform: Callable | None = None) → Tuple[Data, Dict[str, Tensor] | None]#

Convert a pair of Polars DataFrames (edge and node) to a list of PyTorch Geometric Data objects.

l2gv2.datasets.utils.tg_to_polars(data_list: List[Data]) → Tuple[DataFrame, DataFrame]#

Convert a list of PyTorch Geometric Data objects to a pair of Polars DataFrames.
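
PyTorch Geometric stores edges in `edge_index`, a tensor of shape `[2, num_edges]` whose first row holds source nodes and whose second row holds target nodes, whereas an edge DataFrame holds one edge per row. The sketch below shows that reshaping with plain Python lists standing in for tensors and DataFrames. The function names and the column names `src` and `dst` are hypothetical, and the library's converters may also handle node features, edge attributes, and timestamps that are ignored here.

```python
def edge_index_to_rows(edge_index: list[list[int]]) -> list[dict[str, int]]:
    """Turn a 2 x num_edges layout into one record per edge."""
    sources, targets = edge_index
    return [{"src": s, "dst": t} for s, t in zip(sources, targets)]


def rows_to_edge_index(rows: list[dict[str, int]]) -> list[list[int]]:
    """Turn per-edge records back into the 2 x num_edges layout."""
    return [[row["src"] for row in rows], [row["dst"] for row in rows]]


edge_index = [[0, 1, 2], [1, 2, 0]]  # edges 0->1, 1->2, 2->0
rows = edge_index_to_rows(edge_index)
```

The two functions are inverses of each other, which is the property a pair of converters like `polars_to_tg` and `tg_to_polars` would be expected to preserve for the edge structure.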

Module contents#

class l2gv2.datasets.AS733Dataset(root: str | None = None, transform: Callable | None = None, pre_transform: Callable | None = None)#

Bases: BaseDataset

A PyTorch Dataset for the AS-733 dataset from SNAP.

download()#

Download the dataset tarball and extract it into the raw_dir.

process()#

Process the raw text files into a combined Polars DataFrame and save it as a Parquet file.

property processed_file_names: list[str]#

The processed file names for the AS-733 dataset.

property raw_file_names: list[str]#

The raw file names for the AS-733 dataset.

slices: Dict[str, Tensor] | None#

url = 'https://snap.stanford.edu/data/as-733.tar.gz'#

class l2gv2.datasets.BaseDataset(root: str | None = None, transform: Callable | None = None, pre_transform: Callable | None = None)#

Bases: InMemoryDataset

Wrapper for a PyTorch Geometric Dataset.

property processed_dir: str#

property processed_file_names: str | list[str] | tuple[str, ...]#

The processed file names.

property raw_dir: str#

property raw_file_names: list[str]#

The raw file names.

to(fmt: str)#

Convert the dataset to a different format.

class l2gv2.datasets.CoraDataset(root: str | None = None, **kwargs)#

Bases: BaseDataset

Cora dataset from the Planetoid dataset collection.

class l2gv2.datasets.DGraphDataset(root: str | None = None, source_file: str | None = None, **kwargs)#

Bases: BaseDataset

DGraph dataset (DGraphXinye/DGraphFin_baseline).

download()#

Download the DGraph dataset.

property raw_paths#

The absolute filepaths that must be present in order to skip downloading.

l2gv2.datasets.get_dataset(name, **kwargs)#

Factory function to instantiate a dataset based on its registered name.

Parameters:
  • name (str) – The registered name of the dataset (e.g., “as-733”).

  • **kwargs – Additional keyword arguments that will be passed to the dataset’s constructor.

Returns:

An instance of the dataset class corresponding to the given name.

Raises:

ValueError – If the dataset name is not found in the registry.

l2gv2.datasets.list_available_datasets()#

Returns a list of registered dataset names.

l2gv2.datasets.register_dataset(name)#

Decorator to register a dataset class under a specified name.

Parameters:

name (str) – The key under which the dataset class will be registered.

Returns:

A decorator that registers the class.