l2gv2.datasets package
Submodules
l2gv2.datasets.as733 module
AS-733 dataset from SNAP.
- class l2gv2.datasets.as733.AS733Dataset(root: str | None = None, transform: Callable | None = None, pre_transform: Callable | None = None)
Bases: BaseDataset
A PyTorch Dataset for the AS-733 dataset from SNAP.
- download()
Download the dataset tarball and extract it into the raw_dir.
- process()
Process the raw text files into a combined Polars DataFrame and save it as a Parquet file.
- property processed_file_names: list[str]
The processed file names for the AS-733 dataset.
- property raw_file_names: list[str]
The raw file names for the AS-733 dataset.
- url = 'https://snap.stanford.edu/data/as-733.tar.gz'
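The process() step described above can be pictured with a small standalone sketch. It uses only the standard library and is not the package's implementation. It assumes each raw snapshot is a SNAP-style edge list (comment lines starting with "#", one whitespace-separated node pair per line) and that the snapshot date is encoded in the file name as asYYYYMMDD.txt. The helpers parse_snapshot and combine_snapshots are hypothetical names, and the combined table is a plain list of rows rather than a Polars DataFrame written to Parquet.

```python
import re
from datetime import date

# Assumed file-name pattern for one daily snapshot, e.g. "as20000102.txt".
SNAPSHOT_NAME = re.compile(r"as(\d{4})(\d{2})(\d{2})\.txt$")


def parse_snapshot(name, text):
    """Parse one edge-list file into (source, dest, snapshot_date) rows."""
    match = SNAPSHOT_NAME.search(name)
    if match is None:
        raise ValueError(f"unexpected snapshot file name: {name!r}")
    year, month, day = (int(part) for part in match.groups())
    stamp = date(year, month, day)
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks and header comments
            continue
        source, dest = line.split()[:2]
        rows.append((int(source), int(dest), stamp))
    return rows


def combine_snapshots(files):
    """Concatenate all snapshots, ordered by file name (and therefore by date)."""
    combined = []
    for name in sorted(files):
        combined.extend(parse_snapshot(name, files[name]))
    return combined


# Two tiny in-memory "files" standing in for the extracted tarball contents.
raw_files = {
    "as20000102.txt": "# Nodes: 3 Edges: 2\n1\t2\n2\t3\n",
    "as19971108.txt": "# Nodes: 2 Edges: 1\n1\t2\n",
}
edges = combine_snapshots(raw_files)
```

Keeping the snapshot date as a column is what lets a single combined table represent all daily graphs at once; a temporal graph library can then filter or window on that column.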
l2gv2.datasets.base module
Utilities for loading graph datasets.
The module provides a DataLoader class that can load graph datasets from torch-geometric.data.Dataset and return Polars DataFrames of the edges and nodes. It also contains methods to convert the graph into a Raphtory graph and a NetworkX graph.
- class l2gv2.datasets.base.BaseDataset(root: str | None = None, transform: Callable | None = None, pre_transform: Callable | None = None)
Bases: InMemoryDataset
Wrapper for a PyTorch Geometric Dataset.
- property processed_dir: str
- property processed_file_names: str | list[str] | tuple[str, ...]
The processed file names.
- property raw_dir: str
- property raw_file_names: list[str]
The raw file names.
- to(fmt: str)
Convert the dataset to a different format.
l2gv2.datasets.cora module
- class l2gv2.datasets.cora.CoraDataset(root: str | None = None, **kwargs)
Bases: BaseDataset
Cora dataset from the Planetoid dataset collection.
l2gv2.datasets.dgraph module
- class l2gv2.datasets.dgraph.DGraphDataset(root: str | None = None, source_file: str | None = None, **kwargs)
Bases: BaseDataset
DGraph dataset (DGraphXinye/DGraphFin_baseline).
- download()
Download the DGraph dataset.
- property raw_paths
The absolute filepaths that must be present in order to skip downloading.
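The role of raw_paths can be illustrated with a small standalone sketch. In PyTorch Geometric, a dataset skips download() when every path in raw_paths already exists; the class below mimics that check with the standard library only. SketchDataset, its file name, and the download counter are hypothetical and are not part of l2gv2.

```python
import os
import tempfile


class SketchDataset:
    """Hypothetical stand-in that mimics the download-skipping check."""

    raw_file_names = ["edges.csv"]  # assumed file name, for illustration only

    def __init__(self, root):
        self.raw_dir = os.path.join(root, "raw")
        self.download_calls = 0
        # Download only if at least one expected raw file is missing.
        if not all(os.path.exists(path) for path in self.raw_paths):
            os.makedirs(self.raw_dir, exist_ok=True)
            self.download()

    @property
    def raw_paths(self):
        """Absolute paths that must exist in order to skip downloading."""
        return [
            os.path.abspath(os.path.join(self.raw_dir, name))
            for name in self.raw_file_names
        ]

    def download(self):
        """Pretend to fetch the data by writing a placeholder file."""
        self.download_calls += 1
        for path in self.raw_paths:
            with open(path, "w", encoding="utf-8") as handle:
                handle.write("source,dest\n")


with tempfile.TemporaryDirectory() as root:
    first = SketchDataset(root)   # raw files missing, so download() runs
    second = SketchDataset(root)  # raw files present, so download() is skipped
    first_calls, second_calls = first.download_calls, second.download_calls
```

Overriding raw_paths, as DGraphDataset does, changes which files count as "already present", which is useful when the raw data comes from a location outside the default raw_dir.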
l2gv2.datasets.registry module
Registry for datasets.
- l2gv2.datasets.registry.get_dataset(name, **kwargs)
Factory function to instantiate a dataset based on its registered name.
- Parameters:
name (str) – The registered name of the dataset (e.g., “as-733”).
**kwargs – Additional keyword arguments that will be passed to the dataset’s constructor.
- Returns:
An instance of the dataset class corresponding to the given name.
- Raises:
ValueError – If the dataset name is not found in the registry.
- l2gv2.datasets.registry.register_dataset(name)
Decorator to register a dataset class under a specified name.
- Parameters:
name (str) – The key under which the dataset class will be registered.
- Returns:
A decorator that registers the class.
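The decorator-plus-factory pattern described above can be sketched as follows. This is a minimal illustration of the documented behaviour (register a class under a name, instantiate it by name, raise ValueError for unknown names), not the package's source. The _REGISTRY dictionary, the ToyDataset class, and the "toy" key are assumptions made for the example.

```python
from typing import Callable, Dict, Optional

# Hypothetical module-level registry mapping names to dataset classes.
_REGISTRY: Dict[str, type] = {}


def register_dataset(name: str) -> Callable[[type], type]:
    """Return a decorator that stores the decorated class under `name`."""
    def decorator(cls: type) -> type:
        _REGISTRY[name] = cls
        return cls  # hand the class back unchanged so it stays usable directly
    return decorator


def get_dataset(name: str, **kwargs):
    """Instantiate the class registered under `name`, forwarding kwargs."""
    if name not in _REGISTRY:
        raise ValueError(
            f"Dataset {name!r} is not registered; available: {sorted(_REGISTRY)}"
        )
    return _REGISTRY[name](**kwargs)


@register_dataset("toy")
class ToyDataset:
    """Hypothetical dataset class used only to exercise the registry."""

    def __init__(self, root: Optional[str] = None):
        self.root = root


dataset = get_dataset("toy", root="/tmp/toy")
```

Because the decorator returns the class unchanged, registered classes can still be imported and constructed directly; the registry only adds lookup by name.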
l2gv2.datasets.utils module
Utilities for converting between different graph formats.
- l2gv2.datasets.utils.polars_to_raphtory(edge_df: DataFrame, node_df: DataFrame = None) → Graph
Convert Polars DataFrames (edges and, optionally, nodes) to a Raphtory graph.
- l2gv2.datasets.utils.polars_to_tg(edge_df: DataFrame, node_df: DataFrame = None, pre_transform: Callable | None = None) → Tuple[Data, Dict[str, Tensor] | None]
Convert a pair of Polars DataFrames (edges and nodes) to PyTorch Geometric Data objects, returned as a tuple of a Data object and an optional dictionary of slice tensors.
- l2gv2.datasets.utils.tg_to_polars(data_list: List[Data]) → Tuple[DataFrame, DataFrame]
Convert a list of PyTorch Geometric Data objects to a pair of Polars DataFrames.
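To make the shape of these conversions concrete, the sketch below shows the core idea in plain Python: an edge table with arbitrary node identifiers is relabelled to contiguous indices and stored in the 2 x E COO layout that PyTorch Geometric uses for Data.edge_index, then converted back. It is an illustration under stated assumptions, not the library's code: edges_to_coo and coo_to_edges are hypothetical helpers, lists stand in for DataFrames and tensors, and whether l2gv2 relabels node identifiers in this way is not stated in the source.

```python
def edges_to_coo(edges):
    """Map arbitrary node ids to 0..N-1 and build a 2 x E COO edge index."""
    node_ids = sorted({node for edge in edges for node in edge})
    index_of = {node: i for i, node in enumerate(node_ids)}
    sources = [index_of[source] for source, _ in edges]
    targets = [index_of[target] for _, target in edges]
    return node_ids, [sources, targets]


def coo_to_edges(node_ids, edge_index):
    """Invert edges_to_coo: recover the original (source, target) pairs."""
    sources, targets = edge_index
    return [(node_ids[s], node_ids[t]) for s, t in zip(sources, targets)]


# Rows of a small edge table; the node ids are arbitrary integers.
edge_rows = [(701, 7018), (7018, 3356), (701, 3356)]
node_ids, edge_index = edges_to_coo(edge_rows)
round_trip = coo_to_edges(node_ids, edge_index)
```

Keeping the node_ids list alongside the edge index is what makes the round trip lossless, which is the same reason a node table accompanies the edge table in the function signatures above.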
Module contents
- class l2gv2.datasets.AS733Dataset(root: str | None = None, transform: Callable | None = None, pre_transform: Callable | None = None)
Bases: BaseDataset
A PyTorch Dataset for the AS-733 dataset from SNAP.
- download()
Download the dataset tarball and extract it into the raw_dir.
- process()
Process the raw text files into a combined Polars DataFrame and save it as a Parquet file.
- property processed_file_names: list[str]
The processed file names for the AS-733 dataset.
- property raw_file_names: list[str]
The raw file names for the AS-733 dataset.
- slices: Dict[str, Tensor] | None
- url = 'https://snap.stanford.edu/data/as-733.tar.gz'
- class l2gv2.datasets.BaseDataset(root: str | None = None, transform: Callable | None = None, pre_transform: Callable | None = None)
Bases: InMemoryDataset
Wrapper for a PyTorch Geometric Dataset.
- property processed_dir: str
- property processed_file_names: str | list[str] | tuple[str, ...]
The processed file names.
- property raw_dir: str
- property raw_file_names: list[str]
The raw file names.
- to(fmt: str)
Convert the dataset to a different format.
- class l2gv2.datasets.CoraDataset(root: str | None = None, **kwargs)
Bases: BaseDataset
Cora dataset from the Planetoid dataset collection.
- class l2gv2.datasets.DGraphDataset(root: str | None = None, source_file: str | None = None, **kwargs)
Bases: BaseDataset
DGraph dataset (DGraphXinye/DGraphFin_baseline).
- download()
Download the DGraph dataset.
- property raw_paths
The absolute filepaths that must be present in order to skip downloading.
- l2gv2.datasets.get_dataset(name, **kwargs)
Factory function to instantiate a dataset based on its registered name.
- Parameters:
name (str) – The registered name of the dataset (e.g., “as-733”).
**kwargs – Additional keyword arguments that will be passed to the dataset’s constructor.
- Returns:
An instance of the dataset class corresponding to the given name.
- Raises:
ValueError – If the dataset name is not found in the registry.
- l2gv2.datasets.list_available_datasets()
Returns a list of registered dataset names.
- l2gv2.datasets.register_dataset(name)
Decorator to register a dataset class under a specified name.
- Parameters:
name (str) – The key under which the dataset class will be registered.
- Returns:
A decorator that registers the class.