DGL Utils

class graphistry.dgl_utils.DGLGraphMixin(*args, **kwargs)

Bases: FeatureMixin

Automagic DGL models from Graphistry Instances.

build_gnn(X_nodes=None, X_edges=None, y_nodes=None, y_edges=None, weight_column=None, reuse_if_existing=True, featurize_edges=True, use_node_scaler=None, use_node_scaler_target=None, use_edge_scaler=None, use_edge_scaler_target=None, train_split=0.8, device='cpu', inplace=False, *args, **kwargs)

Builds a GNN model using DGL (https://www.dgl.ai/).

Auto-featurizes the data and, if no explicit edges are found, automatically runs UMAP to produce implicit edges.

param X_nodes:

Which node dataframe columns to featurize. If None, all columns are used. If an explicit dataframe is passed in, its columns are set as attributes.

param X_edges:

Which edge dataframe columns to featurize. If None, all columns are used. If an explicit dataframe is passed in, its columns are set as attributes.

param y_nodes:

Optional target column from nodes dataframe.

param y_edges:

Optional target column from edges dataframe.

param weight_column:

Optional weight column, used if an explicit edges table exists with those weights. Otherwise, weight_column is inherited by UMAP.

param train_split:

Randomly assigns a train and test mask according to the split value, default 80%.

param use_node_scaler:

Selects which scaling to use on the featurized nodes dataframe. Default None.

param use_edge_scaler:

Selects which scaling to use on the featurized edges dataframe. Default None.

param device:

Device on which to run the model. Default is cpu, with gpu as the other choice. Can be handled in the outer scope.

param inplace:

Whether to return the Graphistry instance in place or not. Default False.

Parameters:
  • X_nodes (List[str] | str | DataFrame | None)

  • X_edges (List[str] | str | DataFrame | None)

  • y_nodes (List[str] | str | DataFrame | None)

  • y_edges (List[str] | str | DataFrame | None)

  • weight_column (str | None)

  • reuse_if_existing (bool)

  • featurize_edges (bool)

  • use_node_scaler (str | None)

  • use_node_scaler_target (str | None)

  • use_edge_scaler (str | None)

  • use_edge_scaler_target (str | None)

  • train_split (float)

  • device (str)

  • inplace (bool)
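
A call might look like the following sketch. It assumes `g` is a Graphistry object with nodes and edges already bound. The helper name `build_node_gnn` and the column names passed to it are hypothetical, and only keyword arguments that appear in the signature above are used.

```python
def build_node_gnn(g, feature_cols, target_col):
    """Hypothetical helper: featurize nodes and build a DGL graph with a node target.

    `g` is assumed to be a Graphistry object with nodes and edges already bound,
    for example via graphistry.edges(edges_df, "src", "dst").nodes(nodes_df, "id").
    """
    return g.build_gnn(
        X_nodes=feature_cols,   # node columns to featurize
        y_nodes=target_col,     # node target column
        featurize_edges=False,  # skip edge featurization in this sketch
        train_split=0.8,        # the documented default
        device="cpu",           # the documented default
        inplace=False,          # the documented default
    )
```

For example, `g2 = build_node_gnn(g, ["feature_a", "feature_b"], "label")`, where the column names are placeholders for columns in your own nodes dataframe.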

convert_kwargs(*args, **kwargs)

dgl_lazy_init(train_split=0.8, device='cpu')

Initializes the DGL graph lazily.

Parameters:
  • train_split (float)

  • device (str)

graphistry.dgl_utils.convert_to_torch(X_enc, y_enc)

Converts X, y to torch tensors compatible with ndata/edata of DGL graph

Returns:

Dictionary of torch encoded arrays

Parameters:
  • X_enc (DataFrame)

  • y_enc (DataFrame | None)

graphistry.dgl_utils.get_available_devices()

Get IDs of all available GPUs.

Returns:

  • device (torch.device) – main device (GPU 0 or CPU)

  • gpu_ids (list) – list of IDs of all GPUs that are available

graphistry.dgl_utils.get_torch_train_test_mask(n, ratio=0.8)

Generates random torch tensor mask

Parameters:
  • n (int) – size of mask

  • ratio (float) – mimics train/test split. ratio sets number of True vs False mask entries.

Returns:

train and test torch tensor masks
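
The idea of a random train/test mask can be sketched in plain Python. This is an illustration only, not the library's implementation: it uses boolean lists in place of torch tensors, the helper name `random_train_test_mask` and its `seed` parameter are hypothetical, and drawing each entry independently with probability `ratio` is an assumption about how the split is sampled.

```python
import random

def random_train_test_mask(n, ratio=0.8, seed=None):
    """Return complementary boolean lists (train_mask, test_mask) of length n."""
    rng = random.Random(seed)
    # Assumption: each entry joins the training set independently with probability `ratio`.
    train_mask = [rng.random() < ratio for _ in range(n)]
    # The test mask is the element-wise negation of the training mask.
    test_mask = [not flag for flag in train_mask]
    return train_mask, test_mask

train_mask, test_mask = random_train_test_mask(10, ratio=0.8, seed=0)
```

Every position is True in exactly one of the two masks, so the masks partition the `n` nodes or edges between training and testing.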

graphistry.dgl_utils.pandas_to_dgl_graph(df, src, dst, weight_col=None, device='cpu')

Turns an edge DataFrame with named src and dst nodes into a DGL graph. Example:

g, sp_mat, ordered_nodes_dict = pandas_to_dgl_graph(df, 'to_node', 'from_node')

Parameters:
  • df (DataFrame) – DataFrame with source and destination and optionally weight column

  • src (str) – source column of DataFrame for coo matrix

  • dst (str) – destination column of DataFrame for coo matrix

  • weight_col (str | None) – optional weight column when constructing coo matrix

  • device (str) – whether to put dgl graph on cpu or gpu

Return type:

Tuple[dgl.DGLGraph, scipy.sparse.coo_matrix, Dict]

Returns:

  • g – DGL graph

  • sp_mat – sparse SciPy matrix

  • ordered_nodes_dict – dict ordered from most common src and dst nodes

graphistry.dgl_utils.pandas_to_sparse_adjacency(df, src, dst, weight_col)

Converts a Pandas DataFrame with named src and dst columns into a sparse adjacency matrix in COO format. Needed for DGL utils.

Parameters:
  • df – edges dataframe

  • src – source column

  • dst – destination column

  • weight_col – optional weight column

Returns:

COO sparse matrix, dictionary of src, dst nodes to index
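
COO (coordinate) format stores a sparse matrix as three parallel arrays: row indices, column indices, and values. The sketch below shows the idea in plain Python under stated assumptions: the helper name `edges_to_coo` is hypothetical, lists stand in for a SciPy matrix, unweighted edges default to 1.0, and node indices are assigned in first-seen order, which may differ from the ordering the library uses.

```python
def edges_to_coo(edges, weights=None):
    """Build COO triplets (rows, cols, data) plus a node -> index mapping."""
    node_index = {}
    rows, cols, data = [], [], []
    for i, (src, dst) in enumerate(edges):
        # Assign the next free integer to any node not seen before.
        for node in (src, dst):
            node_index.setdefault(node, len(node_index))
        rows.append(node_index[src])
        cols.append(node_index[dst])
        # Assumption: edges without an explicit weight get weight 1.0.
        data.append(1.0 if weights is None else weights[i])
    return rows, cols, data, node_index

rows, cols, data, node_index = edges_to_coo(
    [("a", "b"), ("b", "c"), ("a", "c")], weights=[0.5, 2.0, 1.0]
)
```

Entry `k` of the three lists says that the adjacency matrix holds `data[k]` at position `(rows[k], cols[k])`; every position not listed is zero.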

graphistry.dgl_utils.reindex_edgelist(df, src, dst)

Relabels nodes as a pre-processing step, since DGL needs contiguous integer node labels. Creates new columns named by config.SRC and config.DST.

Example:

df, ordered_nodes_dict = reindex_edgelist(df, 'to_node', 'from_node')

Parameters:
  • df – edge dataFrame

  • src – source column of dataframe

  • dst – destination column of dataframe

Returns:

  • df – pandas DataFrame with new edges

  • ordered_nodes_dict – dict ordered from most common src and dst nodes
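
The relabeling step can be illustrated with a small standard-library sketch. It is not the library's code: the helper name `reindex_edges` is hypothetical, a list of `(src, dst)` tuples stands in for the DataFrame, and "most common" is read as frequency across the src and dst columns combined, which is an assumption.

```python
from collections import Counter

def reindex_edges(edges):
    """Map node labels to contiguous integers, most frequent node first."""
    # Count how often each node appears as a source or a destination.
    counts = Counter([s for s, _ in edges] + [d for _, d in edges])
    # most_common() yields nodes by descending frequency; enumerate gives 0..N-1.
    ordered_nodes = {node: i for i, (node, _) in enumerate(counts.most_common())}
    relabeled = [(ordered_nodes[s], ordered_nodes[d]) for s, d in edges]
    return relabeled, ordered_nodes

edges = [("a", "b"), ("a", "c"), ("b", "c"), ("a", "d")]
relabeled, ordered_nodes = reindex_edges(edges)
```

Here `"a"` appears most often and receives index 0, while `"d"` appears once and receives the last index, so the relabeled edge list uses only the integers 0 through 3.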