# Your first graph neural network: Detecting suspicious logins with link prediction

[Graphistry](http://github.com/graphistry/pygraphistry) - Leo Meyerovich, Alex Morrise, Tanmoy Sarkar

[Infosec Jupyterthon 2022](https://infosecjupyterthon.com/2022/agenda.html), December 2022

**Alert on & visualize anomalous identity events**
* Demo dataset: 1.6B Windows events over 58 days => logins by 12K users over 14K systems
  * adapt to any identity system with logins
  * => Can we identify accounts & computers acting anomalously? Resources being oddly accessed?
  * => Can we spot the red team?
  * => Operations: Identity incident alerting + identity data investigations
  * Community/contact for help handling bigger-than-memory & additional features
* Techniques explored: Graph AI
  * RGCN (primary) - powerful with tweaking and in a pipeline
  * UMAP (secondary) - surprisingly effective with little tweaking
* Runs on both CPU + multi-GPU
* Tools: [PyGraphistry[AI]](http://github.com/graphistry/pygraphistry), [DGL](https://www.dgl.ai/) + [PyTorch](https://pytorch.org/), and [NVIDIA RAPIDS](https://rapids.ai/) / [umap-learn](https://github.com/lmcinnes/umap)
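
As a minimal sketch of how login events become an identity graph, the snippet below collapses raw authentication records into weighted `(user, relation, computer)` edges. The event tuples, field order, and logon type names are made-up placeholders, not the demo dataset's actual schema.

```python
from collections import Counter

# Hypothetical authentication events: (user, computer, logon_type).
# Real logs would carry timestamps, domains, and more fields.
events = [
    ("alice", "ws-01", "interactive"),
    ("alice", "ws-01", "interactive"),
    ("alice", "srv-db", "remote_desktop"),
    ("bob", "ws-02", "interactive"),
    ("bob", "srv-db", "network"),
]

# Collapse repeated events into weighted edges keyed by (user, relation, computer)
edge_counts = Counter(
    (user, logon_type, computer) for user, computer, logon_type in events
)

# The two node sets of the user-to-computer login graph
users = sorted({user for user, _, _ in events})
computers = sorted({computer for _, computer, _ in events})

for (user, relation, computer), count in sorted(edge_counts.items()):
    print(f"{user} -[{relation} x{count}]-> {computer}")
```

Keeping the logon type on each edge preserves the relationship types that a relational model can later treat separately.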

---


## 1. Graphs are awesome

- [Defenders think in lists, Attackers think in graphs. As long as this is true, attackers win.](https://github.com/JohnLaTwC/Shared/blob/master/Defenders%20think%20in%20lists.%20Attackers%20think%20in%20graphs.%20As%20long%20as%20this%20is%20true%2C%20attackers%20win.md)
- Network graphs & event graphs & kill chains & ..: [Honeypot](https://hub.graphistry.com/graph/graph.html?dataset=7c2234aa98274bdbb460630a760d3a90&play=1000)
- **Today:** Two techniques for the graph AI era, focusing on **identity graphs**
* **=> Caught 96% of red team's logins (400+ out of millions) with only 10% FPs**
* Graph neural networks (GNNs) + UMAP

## 2. Graphs for identity data

Sample attacks
  * Fake account
  * Account takeover: Malware, credential stuffing, ...
  * Insider threat: Helpdesk, rogue admin, ...
  * Abnormal resource access patterns

Data & user activities (UEBA):
- Entity resolution: You, your assets, your contexts, ..
- Authentication
- Authorization
- 💰💰💰 Did I mention zero-trust identity protection? 💰💰💰💰

Goals: Empower -
  * Identity detection
  * Identity investigation


## 3. AI era of graph: GNNs + UMAP
  - GNNs: [Science's Breakthrough of 2021](https://www.science.org/content/article/breakthrough-2021) - [example](https://news.mit.edu/sites/default/files/images/202112/MIT-Molecular-Shapes-01-press.jpg)
  - Combines network thinking (interesting connectivity) with tabular (time, $, etc. features)
  - Primitives: 
      * Classify nodes ("bot")
      * Predict links ("recommendation", "violation") <-- **TODAY**
      * Classify graphs ("motif mining")
  - Compose into tools:
      * Anomaly detection <-- **today**
      * Abuse scoring
      * Feeding into combined methods: today we look for graph shapes, but temporal models (RNNs) are useful too
      * If a model does well at one task, there is a good chance it can be reused for others
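
To illustrate how link prediction turns into anomaly detection, here is a minimal sketch that scores a candidate login edge with a DistMult-style decoder and treats low-probability links as suspicious. The embeddings are made-up numbers standing in for what a trained model would learn, and the names `link_probability` and `anomaly_score` are hypothetical helpers, not a library API.

```python
import math

# Made-up 3-dimensional embeddings; a trained GNN would learn these values
node_emb = {
    "alice": [0.9, 0.1, 0.0],
    "ws-01": [1.0, 0.2, 0.1],
    "srv-dc": [-0.8, 0.1, 0.9],
}
rel_emb = {"login": [1.0, 1.0, 1.0]}


def link_probability(src, rel, dst):
    """DistMult-style score: sigmoid of sum_k src_k * rel_k * dst_k."""
    score = sum(
        s * r * d for s, r, d in zip(node_emb[src], rel_emb[rel], node_emb[dst])
    )
    return 1.0 / (1.0 + math.exp(-score))


def anomaly_score(src, rel, dst):
    """An unlikely link is an anomalous link."""
    return 1.0 - link_probability(src, rel, dst)


for target in ("ws-01", "srv-dc"):
    print(f"alice -> {target}: anomaly {anomaly_score('alice', 'login', target):.2f}")
```

With these toy values, the login from `alice` to `srv-dc` scores as more anomalous than her login to `ws-01`, because her embedding aligns with the workstation and not with the server.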


## 4. RGCNs - Relational graph convolutional networks

[Twitter botnet example](https://hub.graphistry.com/graph/graph.html?dataset=Twitter&play=5000)

  * GNN - Graph neural network: Label prop 
    - "if all their friends are bots, ..."
    - multiple dimensions: bytes, region, ...
  * GCNs - Graph convolutional network: Multiple layers
    - "even if we know little about them, we know their friends..."
    - shallow!
  * RGCNs - Relational GCNs: 
    - multiple relationship types - follow vs block vs ...
    - ex: remote desktop vs regular login
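
The layering above can be sketched as a single R-GCN message-passing step in plain Python: each node combines its own features with the per-relation mean of its incoming neighbors' features, using a separate weight matrix per relationship type. The graph, features, and weight values below are toy assumptions; a real model would learn the weights with a framework such as DGL + PyTorch.

```python
def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def rgcn_layer(features, edges, rel_weights, self_weight):
    """One layer: h_i' = ReLU(W_0 h_i + sum_r mean_{j in N_r(i)} W_r h_j)."""
    out = {}
    for node, h in features.items():
        total = matvec(self_weight, h)  # self-connection term W_0 h_i
        for rel, W_r in rel_weights.items():
            # Incoming neighbors of `node` under this relationship type
            neighbors = [src for src, r, dst in edges if r == rel and dst == node]
            for src in neighbors:
                msg = matvec(W_r, features[src])
                total = [t + m / len(neighbors) for t, m in zip(total, msg)]
        out[node] = [max(0.0, t) for t in total]  # ReLU
    return out


# Toy graph: two users reach one server through different relationship types
features = {"alice": [1.0, 0.0], "bob": [0.0, 1.0], "srv": [0.0, 0.0]}
edges = [("alice", "login", "srv"), ("bob", "remote_desktop", "srv")]
identity = [[1.0, 0.0], [0.0, 1.0]]
rel_weights = {
    "login": [[1.0, 0.0], [0.0, 1.0]],
    "remote_desktop": [[2.0, 0.0], [0.0, 2.0]],
}

hidden = rgcn_layer(features, edges, rel_weights, identity)
print(hidden["srv"])
```

Because `remote_desktop` has its own weight matrix, the same neighbor feature contributes differently than it would over a regular `login` edge, which is the point of the relational variant. Stacking a second call on `hidden` would let information travel two hops.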

Watch the 2 YouTube videos at the end for theoretical intuitions

## 5. Try it yourself

See:

* SSH logs RGCN anomaly detector in a few cells: [simple-ssh-logs-rgcn-anomaly-detector.ipynb](../../../more_examples/graphistry_features/embed/simple-ssh-logs-rgcn-anomaly-detector.ipynb)
* In-depth RGCN: [advanced-identity-protection-40m.ipynb](advanced-identity-protection-40m.ipynb)


## 6. Taking it to production

Watch the repo / contact to join us on:

  - Daily batch / real-time alerting => Splunk
  - Scaling & autonomous operation
  - Tuning: Time data, common FPs (new IPs, ..), ...
  - Use for correlation ID generation for investigation context (see tomorrow's UMAP talk!)
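
One way to approach the tuning step is to sweep an alert threshold over anomaly scores and compare how many known-bad logins are caught against how many benign ones fire. The sketch below assumes labeled scores are available (for example from a red team exercise); the scores, labels, and the `alert_metrics` helper are hypothetical.

```python
def alert_metrics(scored_events, threshold):
    """Return (recall, false_positive_rate) when alerting on score >= threshold."""
    tp = sum(1 for s, is_attack in scored_events if is_attack and s >= threshold)
    fn = sum(1 for s, is_attack in scored_events if is_attack and s < threshold)
    fp = sum(1 for s, is_attack in scored_events if not is_attack and s >= threshold)
    tn = sum(1 for s, is_attack in scored_events if not is_attack and s < threshold)
    recall = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return recall, fpr


# Made-up (anomaly_score, is_attack) pairs for illustration only
scored = [
    (0.95, True), (0.80, True), (0.40, True),
    (0.85, False), (0.30, False), (0.20, False), (0.10, False),
]

for threshold in (0.9, 0.5, 0.25):
    recall, fpr = alert_metrics(scored, threshold)
    print(f"threshold={threshold}: recall={recall:.2f}, fpr={fpr:.2f}")
```

Lowering the threshold catches more attacks at the cost of more false positives, so a daily batch job would pick the operating point that the alert queue can absorb.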

## Next steps
- SSH logs RGCN anomaly detector in a few cells: [simple-ssh-logs-rgcn-anomaly-detector.ipynb](../../../more_examples/graphistry_features/embed/simple-ssh-logs-rgcn-anomaly-detector.ipynb)
- In-depth RGCN: [advanced-identity-protection-40m.ipynb](advanced-identity-protection-40m.ipynb)
- UMAP demo for 97% alert volume reduction & alert correlation
- [PyGraphistry](http://github.com/graphistry/pygraphistry) (py, oss) + [Graphistry Hub](https://hub.graphistry.com/) (free)
  - Dashboarding with [graph-app-kit (containerized, gpu, graph Streamlit)](https://github.com/graphistry/graph-app-kit)
- Happy to help:
  - [Join our Slack](https://join.slack.com/t/graphistry-community/shared_invite/zt-53ik36w2-fpP0Ibjbk7IJuVFIRSnr6g)
  - Email us and let's chat: info@graphistry.com


## Resources

* [PyGraphistry[AI]](http://github.com/graphistry/pygraphistry)
* [What is graph intelligence](https://gradientflow.com/what-is-graph-intelligence/)
* GNN Videos:
  * GCN - https://www.youtube.com/watch?v=2KRAOZIULzw
  * RGCN - https://www.youtube.com/watch?v=wJQQFUcHO5U
  * Euler (combining RNN + GNN) - https://www.youtube.com/watch?v=1t124vguwJ8