DGL Utils
- class graphistry.dgl_utils.DGLGraphMixin(*args, **kwargs)
Bases: FeatureMixin
Automagic DGL models from Graphistry Instances.
- build_gnn(X_nodes=None, X_edges=None, y_nodes=None, y_edges=None, weight_column=None, reuse_if_existing=True, featurize_edges=True, use_node_scaler=None, use_node_scaler_target=None, use_edge_scaler=None, use_edge_scaler_target=None, train_split=0.8, device='cpu', inplace=False, *args, **kwargs)
Builds a GNN model using DGL (https://www.dgl.ai/).
Auto-featurizes the data and, if no explicit edges are found, automatically runs UMAP to produce implicit edges.
- param X_nodes:
Which node dataframe columns to featurize. If None, all columns are used. If an explicit dataframe is passed, it is set as an attribute.
- param X_edges:
Which edge dataframe columns to featurize. If None, all columns are used. If an explicit dataframe is passed, it is set as an attribute.
- param y_nodes:
Optional target column from the nodes dataframe.
- param y_edges:
Optional target column from the edges dataframe.
- param weight_column:
Optional weight column, used if an explicit edges table exists with said weights. Otherwise, weight_column is inherited from UMAP.
- param train_split:
Randomly assigns a train and test mask according to the split value. Default 0.8 (80%).
- param use_node_scaler:
Selects which scaling to use on the featurized nodes dataframe. Default None.
- param use_edge_scaler:
Selects which scaling to use on the featurized edges dataframe. Default None.
- param device:
Device to run the model on. Default cpu, with gpu the other choice. Can be handled in outer scope.
- param inplace:
Whether to return the Graphistry instance in place or not. Default False.
- Parameters:
X_nodes (List[str] | str | DataFrame | None)
X_edges (List[str] | str | DataFrame | None)
y_nodes (List[str] | str | DataFrame | None)
y_edges (List[str] | str | DataFrame | None)
weight_column (str | None)
reuse_if_existing (bool)
featurize_edges (bool)
use_node_scaler (str | None)
use_node_scaler_target (str | None)
use_edge_scaler (str | None)
use_edge_scaler_target (str | None)
train_split (float)
device (str)
inplace (bool)
- convert_kwargs(*args, **kwargs)
- dgl_lazy_init(train_split=0.8, device='cpu')
Initialize the DGL graph lazily.
- Parameters:
train_split (float)
device (str)
- graphistry.dgl_utils.convert_to_torch(X_enc, y_enc)
Converts X, y to torch tensors compatible with the ndata/edata of a DGL graph.
- Returns:
Dictionary of torch encoded arrays
- Parameters:
X_enc (DataFrame)
y_enc (DataFrame | None)
- graphistry.dgl_utils.get_available_devices()
Get IDs of all available GPUs.
- Returns:
device (torch.device): Main device (GPU 0 or CPU).
gpu_ids (list): List of IDs of all GPUs that are available.
- graphistry.dgl_utils.get_torch_train_test_mask(n, ratio=0.8)
Generates random torch tensor masks.
- Parameters:
n (int) – size of mask
ratio (float) – mimics a train/test split: sets the number of True vs. False mask entries.
- Returns:
train and test torch tensor masks
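To illustrate what a pair of complementary train/test masks looks like, here is a minimal plain-Python sketch. It is not the library's implementation: the helper name `make_train_test_mask` and the `seed` argument are hypothetical, it returns lists of booleans rather than torch tensors, and it assumes an exact count of `round(n * ratio)` True entries, which the description above does not confirm.

```python
import random


def make_train_test_mask(n, ratio=0.8, seed=None):
    """Hypothetical helper: return (train_mask, test_mask) as lists of bools.

    Assumption: exactly round(n * ratio) entries are True in the train mask;
    the real function may instead sample each entry independently.
    """
    rng = random.Random(seed)
    n_train = round(n * ratio)
    train_mask = [True] * n_train + [False] * (n - n_train)
    rng.shuffle(train_mask)  # randomize which positions are selected for training
    test_mask = [not flag for flag in train_mask]  # complement of the train mask
    return train_mask, test_mask


train_mask, test_mask = make_train_test_mask(10, ratio=0.8, seed=0)
print(sum(train_mask), sum(test_mask))
```

Every position belongs to exactly one of the two masks, so a node or edge is never used for both training and testing.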
- graphistry.dgl_utils.pandas_to_dgl_graph(df, src, dst, weight_col=None, device='cpu')
Turns an edge DataFrame with named src and dst columns into a DGL graph. Example:
g, sp_mat, ordered_nodes_dict = pandas_to_dgl_graph(df, 'to_node', 'from_node')
- Parameters:
df (DataFrame) – DataFrame with source and destination and optionally weight column
src (str) – source column of DataFrame for coo matrix
dst (str) – destination column of DataFrame for coo matrix
weight_col (str | None) – optional weight column when constructing coo matrix
device (str) – whether to put dgl graph on cpu or gpu
- Return type:
Tuple[dgl.DGLGraph, scipy.sparse.coo_matrix, Dict]
- Returns:
g: DGL graph
sp_mat: sparse SciPy matrix
ordered_nodes_dict: dict ordered from most common src and dst nodes
- graphistry.dgl_utils.pandas_to_sparse_adjacency(df, src, dst, weight_col)
Converts a pandas DataFrame with named src and dst columns into a sparse adjacency matrix in COO format. Needed for DGL utils.
- Parameters:
df – edges dataframe
src – source column
dst – destination column
weight_col – optional weight column
- Returns:
COO sparse matrix, dictionary of src, dst nodes to index
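For readers unfamiliar with the COO (coordinate) format, the sketch below shows the three parallel arrays it consists of: row indices, column indices, and values. It is a plain-Python illustration under stated assumptions, not the library's code: the helper name `edges_to_coo` is hypothetical, it takes a list of `(src, dst)` pairs instead of a DataFrame, it indexes nodes in order of first appearance, and it defaults missing weights to 1.0.

```python
def edges_to_coo(edges, weights=None):
    """Hypothetical helper: build COO triplets from (src, dst) label pairs.

    Returns (rows, cols, data, node_to_index).
    Assumption: nodes are indexed in order of first appearance.
    """
    node_to_index = {}
    for src, dst in edges:
        for node in (src, dst):
            if node not in node_to_index:
                node_to_index[node] = len(node_to_index)
    rows = [node_to_index[src] for src, _ in edges]  # row index = source node
    cols = [node_to_index[dst] for _, dst in edges]  # column index = destination node
    data = list(weights) if weights is not None else [1.0] * len(edges)
    return rows, cols, data, node_to_index


edges = [("a", "b"), ("b", "c"), ("a", "c")]
rows, cols, data, node_to_index = edges_to_coo(edges, weights=[0.5, 2.0, 1.0])
print(rows, cols, data)
```

These triplets are the form that `scipy.sparse.coo_matrix((data, (rows, cols)), shape=(n, n))` accepts, with `n` equal to the number of distinct nodes.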
- graphistry.dgl_utils.reindex_edgelist(df, src, dst)
Since DGL needs contiguous integer node labels, this relabels the nodes as a pre-processing step.
- Example:
df, ordered_nodes_dict = reindex_edgelist(df, 'to_node', 'from_node')
Creates new columns given by config.SRC and config.DST.
- Parameters:
df – edge DataFrame
src – source column of dataframe
dst – destination column of dataframe
- Returns:
df: pandas DataFrame with new edges
ordered_nodes_dict: dict ordered from most common src and dst nodes
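The relabeling idea can be sketched in plain Python as follows. This is an illustration under assumptions, not the library's implementation: the helper name `reindex_edges` is hypothetical, it works on a list of `(src, dst)` pairs rather than a DataFrame with config.SRC and config.DST columns, and it reads "ordered from most common" as "index 0 goes to the node that appears most often across both columns".

```python
from collections import Counter


def reindex_edges(edges):
    """Hypothetical helper: relabel node names as contiguous integers 0..N-1.

    Assumption: nodes are ordered by how often they appear as src or dst,
    most frequent first.
    """
    counts = Counter()
    for src, dst in edges:
        counts[src] += 1
        counts[dst] += 1
    # most_common() sorts by descending count, so the busiest node gets index 0
    ordered_nodes = {node: i for i, (node, _) in enumerate(counts.most_common())}
    new_edges = [(ordered_nodes[src], ordered_nodes[dst]) for src, dst in edges]
    return new_edges, ordered_nodes


edges = [("x", "hub"), ("y", "hub"), ("z", "hub"), ("x", "y")]
new_edges, ordered_nodes = reindex_edges(edges)
print(new_edges)
print(ordered_nodes)
```

The returned mapping makes it possible to translate integer node IDs back to the original labels after training.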