DGL Utils

class graphistry.dgl_utils.DGLGraphMixin(*args, **kwargs)

Bases: FeatureMixin

Automagic DGL models from Graphistry Instances.

build_gnn(X_nodes=None, X_edges=None, y_nodes=None, y_edges=None, weight_column=None, reuse_if_existing=True, featurize_edges=True, use_node_scaler=None, use_node_scaler_target=None, use_edge_scaler=None, use_edge_scaler_target=None, train_split=0.8, device='cpu', inplace=False, *args, **kwargs)

Builds a GNN model using DGL (https://www.dgl.ai/).

Auto-featurizes the data and, if no explicit edges are found, automatically runs UMAP to produce implicit edges.

Parameters:

X_nodes (List[str] | str | DataFrame | None) – Node dataframe columns to featurize. If None, all columns are used. If a dataframe is passed explicitly, its contents are set as attributes.
X_edges (List[str] | str | DataFrame | None) – Edge dataframe columns to featurize. If None, all columns are used. If a dataframe is passed explicitly, its contents are set as attributes.
y_nodes (List[str] | str | DataFrame | None) – Optional target column from the nodes dataframe.
y_edges (List[str] | str | DataFrame | None) – Optional target column from the edges dataframe.
weight_column (str | None) – Optional weight column, used if an explicit edges table exists with said weights. Otherwise, weight_column is inherited from UMAP.
reuse_if_existing (bool)
featurize_edges (bool)
use_node_scaler (str | None) – Selects which scaling to use on the featurized nodes dataframe. Default None.
use_node_scaler_target (str | None)
use_edge_scaler (str | None) – Selects which scaling to use on the featurized edges dataframe. Default None.
use_edge_scaler_target (str | None)
train_split (float) – Randomly assigns train and test masks according to the split value. Default 0.8 (80%).
device (str) – Device on which to run the model: cpu (default) or gpu. Can also be handled in the outer scope.
inplace (bool) – Whether to return the Graphistry instance in place. Default False.

convert_kwargs(*args, **kwargs)

dgl_lazy_init(train_split=0.8, device='cpu')

Initializes the DGL graph lazily.

Parameters:

train_split (float)
device (str)

graphistry.dgl_utils.convert_to_torch(X_enc, y_enc)

Converts X and y to torch tensors compatible with the ndata/edata of a DGL graph.

Returns:

Dictionary of torch encoded arrays

Parameters:

X_enc (DataFrame)
y_enc (DataFrame | None)

graphistry.dgl_utils.get_available_devices()

Gets the IDs of all available GPUs.

Returns:

device (torch.device) – Main device (GPU 0 or CPU).
gpu_ids (list) – List of IDs of all GPUs that are available.

graphistry.dgl_utils.get_torch_train_test_mask(n, ratio=0.8)

Generates random train and test torch tensor masks.

Parameters:

n (int) – size of mask
ratio (float) – mimics train/test split. ratio sets number of True vs False mask entries.

Returns:

train and test torch tensor masks
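
The idea of complementary train and test masks can be sketched without torch. The function below is a hypothetical, standard-library illustration (random_train_test_mask is not part of graphistry). It assumes each entry is drawn independently with probability ratio, which this page does not confirm as the library's actual behavior.

```python
import random


def random_train_test_mask(n, ratio=0.8, seed=None):
    """Return complementary boolean train/test masks of length n.

    Hypothetical helper: plain Python lists stand in for torch tensors.
    """
    rng = random.Random(seed)
    # Assumption: each position is marked as training with probability `ratio`.
    train_mask = [rng.random() < ratio for _ in range(n)]
    # The test mask is the element-wise negation of the training mask.
    test_mask = [not is_train for is_train in train_mask]
    return train_mask, test_mask


train_mask, test_mask = random_train_test_mask(10, ratio=0.8, seed=0)
```

Every position belongs to exactly one of the two masks, so they can be used to index node or edge features into disjoint training and test sets.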

graphistry.dgl_utils.pandas_to_dgl_graph(df, src, dst, weight_col=None, device='cpu')

Turns an edge DataFrame with named src and dst columns into a DGL graph. Example:

g, sp_mat, ordered_nodes_dict = pandas_to_dgl_graph(df, 'to_node', 'from_node')

Parameters:

df (DataFrame) – DataFrame with source and destination and optionally weight column
src (str) – source column of DataFrame for coo matrix
dst (str) – destination column of DataFrame for coo matrix
weight_col (str | None) – optional weight column when constructing coo matrix
device (str) – whether to put dgl graph on cpu or gpu

Return type:

Tuple[dgl.DGLGraph, scipy.sparse.coo_matrix, Dict]

Returns:

g – DGL graph
sp_mat – sparse SciPy matrix
ordered_nodes_dict – dict ordered from most common src and dst nodes

graphistry.dgl_utils.pandas_to_sparse_adjacency(df, src, dst, weight_col)

Converts a pandas DataFrame with named src and dst columns into a sparse adjacency matrix in COO format. Needed for DGL utils.

Parameters:

df – edges dataframe
src – source column
dst – destination column
weight_col – optional weight column

Returns:

COO sparse matrix, dictionary of src, dst nodes to index
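
For readers unfamiliar with COO (coordinate) format, it stores a sparse matrix as three parallel sequences: row indices, column indices, and values. The sketch below is a hypothetical, library-free illustration (edges_to_coo is not part of graphistry, and it uses plain lists rather than pandas or SciPy). It assigns indices in order of first appearance, which is an assumption and may differ from the ordering the library uses.

```python
def edges_to_coo(edges, weights=None):
    """Build COO triplets (rows, cols, data) from (src, dst) pairs.

    Hypothetical helper for illustration only.
    """
    node_to_index = {}
    rows, cols = [], []
    for src, dst in edges:
        # Assumption: nodes get integer ids in order of first appearance.
        for node in (src, dst):
            if node not in node_to_index:
                node_to_index[node] = len(node_to_index)
        rows.append(node_to_index[src])
        cols.append(node_to_index[dst])
    # Without explicit weights, every edge gets weight 1.0.
    data = list(weights) if weights is not None else [1.0] * len(rows)
    return rows, cols, data, node_to_index


rows, cols, data, node_to_index = edges_to_coo(
    [("a", "b"), ("b", "c")], weights=[0.5, 2.0]
)
```

Entry k of the matrix sits at position (rows[k], cols[k]) with value data[k]. This is the triplet layout that a constructor such as scipy.sparse.coo_matrix accepts.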

graphistry.dgl_utils.reindex_edgelist(df, src, dst)

Since DGL needs contiguous integer node labels, this relabels the nodes as a preprocessing step. Example:

df, ordered_nodes_dict = reindex_edgelist(df, 'to_node', 'from_node')

Creates new columns given by config.SRC and config.DST.

Parameters:

df – edge DataFrame
src – source column of dataframe
dst – destination column of dataframe

Returns:

df – pandas DataFrame with new edges
ordered_nodes_dict – dict ordered from most common src and dst nodes
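
The relabeling step can be sketched in plain Python. The helper below is hypothetical (relabel_edges is not part of graphistry) and works on a list of (src, dst) pairs rather than a DataFrame. It assumes that "ordered from most common" means nodes are ranked by how often they appear as either endpoint, which is a reading of the description above rather than confirmed library behavior.

```python
from collections import Counter


def relabel_edges(edges):
    """Map arbitrary node names to contiguous integers 0..N-1.

    Hypothetical helper for illustration only.
    """
    counts = Counter()
    for src, dst in edges:
        counts[src] += 1
        counts[dst] += 1
    # Assumption: the most frequent node gets index 0, the next gets 1, etc.
    node_to_index = {
        node: index for index, (node, _) in enumerate(counts.most_common())
    }
    relabeled = [(node_to_index[s], node_to_index[d]) for s, d in edges]
    return relabeled, node_to_index


relabeled, node_to_index = relabel_edges([("a", "b"), ("b", "c"), ("b", "d")])
```

Here node "b" appears in every edge, so it receives index 0. The returned dictionary lets the integer labels be mapped back to the original node names.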
