GFQL DateTime Filtering Examples#
This notebook shows how to filter graph data by dates and times using GFQL predicates.
Table of Contents#
Key Temporal Filtering Concepts:
Basic DateTime Filtering - Filter by specific dates and times
Date-Only Filtering - Ignore time components
Time-of-Day Filtering - Filter by time patterns
Complex Temporal Queries - Combine with other predicates
Temporal Value Classes - Explicit temporal objects
Timezone-Aware Filtering - Handle timezone conversions
Chain Operations - Multi-hop temporal queries
Wire Protocol Dicts - JSON-compatible configuration
Quick Reference:
gt(), lt(), ge(), le() - Greater/less than comparisons
between() - Range queries
is_in() - Match specific values
DateTimeValue, DateValue, TimeValue - Explicit temporal types
[ ]:
# Standard Python datetime imports
import pandas as pd
import numpy as np
from datetime import datetime, date, time, timedelta
import pytz
# Graphistry imports
import graphistry
from graphistry import n, e_forward, e_reverse, e_undirected
# Temporal predicates
from graphistry.compute import (
gt, lt, ge, le, eq, ne, between, is_in,
DateTimeValue, DateValue, TimeValue
)
Setup: Create Sample Data#
Let’s create a sample dataset representing a transaction network with temporal data.
[2]:
# Generate sample transaction data
np.random.seed(42)
# Create nodes (accounts)
n_accounts = 100
accounts_df = pd.DataFrame({
'account_id': [f'ACC_{i:04d}' for i in range(n_accounts)],
'account_type': np.random.choice(['checking', 'savings', 'business'], n_accounts),
'created_date': pd.date_range('2020-01-01', periods=n_accounts, freq='W'),
'last_active': pd.date_range('2023-01-01', periods=n_accounts, freq='D') +
pd.to_timedelta(np.random.randint(0, 365, n_accounts), unit='D')
})
# Create edges (transactions)
n_transactions = 500
transactions_df = pd.DataFrame({
'transaction_id': [f'TXN_{i:06d}' for i in range(n_transactions)],
'source': np.random.choice(accounts_df['account_id'], n_transactions),
'target': np.random.choice(accounts_df['account_id'], n_transactions),
'amount': np.random.exponential(100, n_transactions).round(2),
'timestamp': pd.date_range('2023-01-01', periods=n_transactions, freq='H') +
pd.to_timedelta(np.random.randint(0, 8760, n_transactions), unit='H'),
'transaction_time': [time(np.random.randint(0, 24), np.random.randint(0, 60))
for _ in range(n_transactions)],
'transaction_type': np.random.choice(['transfer', 'payment', 'deposit'], n_transactions)
})
print(f"Created {len(accounts_df)} accounts and {len(transactions_df)} transactions")
print(f"\nTransaction date range: {transactions_df['timestamp'].min()} to {transactions_df['timestamp'].max()}")
[3]:
# Create graphistry instance
g = graphistry.edges(transactions_df, 'source', 'target').nodes(accounts_df, 'account_id')
print(f"Graph: {len(g._nodes)} nodes, {len(g._edges)} edges")
1. Basic DateTime Filtering#
Filter transactions based on datetime values using edge predicates.
[4]:
# Filter transactions after a specific date
# First, filter the edges directly
cutoff_date = datetime(2023, 7, 1)
recent_edges = g._edges[gt(pd.Timestamp(cutoff_date))(g._edges['timestamp'])]
recent_g = g.edges(recent_edges)
print(f"Transactions after {cutoff_date}: {len(recent_g._edges)}")
recent_g._edges[['transaction_id', 'timestamp', 'amount']].head()
[5]:
# Alternative: Use chain with edge operations
# Start from all nodes, then follow edges with temporal filter
recent_chain = g.chain([
n(), # Start with all nodes
e_forward({
"timestamp": gt(pd.Timestamp(cutoff_date))
})
])
print(f"Transactions after {cutoff_date} (chain): {len(recent_chain._edges)}")
[6]:
# Filter transactions in a specific month
march_edges = g._edges[
between(
datetime(2023, 3, 1),
datetime(2023, 3, 31, 23, 59, 59)
)(g._edges['timestamp'])
]
march_g = g.edges(march_edges)
print(f"Transactions in March 2023: {len(march_g._edges)}")
march_g._edges[['transaction_id', 'timestamp', 'amount']].head()
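The March range above ends at 23:59:59, so a timestamp with a fractional second inside the final second of the month would fall outside it. A minimal sketch in plain pandas (not GFQL; the toy series `ts` is an assumption for illustration) compares the closed range with a half-open range that ends at the start of April:

```python
import pandas as pd

# Toy timestamps; the third one falls inside the final second of March.
ts = pd.Series([
    pd.Timestamp("2023-02-28 23:59:59"),
    pd.Timestamp("2023-03-01 00:00:00"),
    pd.Timestamp("2023-03-31 23:59:59.500"),
    pd.Timestamp("2023-04-01 00:00:00"),
])

# Closed range ending at 23:59:59, mirroring the bounds used in the cell above.
closed_mask = ts.between(
    pd.Timestamp("2023-03-01"),
    pd.Timestamp("2023-03-31 23:59:59"),
)

# Half-open range [March 1, April 1): start inclusive, end exclusive.
half_open_mask = (ts >= pd.Timestamp("2023-03-01")) & (ts < pd.Timestamp("2023-04-01"))

print(closed_mask.tolist())
print(half_open_mask.tolist())
```

With hourly sample data the two masks agree; the difference only matters when timestamps carry sub-second precision.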
2. Date-Only Filtering#
Filter nodes based on dates, ignoring time components.
[7]:
# Filter accounts created on or after a specific date
new_accounts = g.chain([
n(filter_dict={
"created_date": ge(date(2021, 1, 1))
})
])
print(f"Accounts created on or after 2021-01-01: {len(new_accounts._nodes)}")
new_accounts._nodes[['account_id', 'created_date', 'account_type']].head()
[8]:
# Find accounts active in the last 90 days
ninety_days_ago = datetime.now().date() - timedelta(days=90)
active_accounts = g.chain([
n(filter_dict={
"last_active": gt(pd.Timestamp(ninety_days_ago))
})
])
print(f"Recently active accounts: {len(active_accounts._nodes)}")
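To make "ignoring time components" concrete, here is a small sketch in plain pandas (not GFQL; the `last_active` series below is toy data, not the notebook's dataframe). It shows a day-level equality test via `dt.normalize()` and an "on or after" test that only needs the day's midnight as lower bound:

```python
import pandas as pd
from datetime import date

# Toy activity timestamps around the target day.
last_active = pd.Series([
    pd.Timestamp("2023-06-14 23:59:59"),
    pd.Timestamp("2023-06-15 00:00:01"),
    pd.Timestamp("2023-06-15 18:30:00"),
    pd.Timestamp("2023-06-16 08:00:00"),
])

target_day = date(2023, 6, 15)

# normalize() resets the time to midnight, so equality becomes a day-level test.
same_day = last_active.dt.normalize() == pd.Timestamp(target_day)

# "On or after the day" only needs the lower bound at midnight.
on_or_after = last_active >= pd.Timestamp(target_day)

print(same_day.tolist())
print(on_or_after.tolist())
```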
3. Time-of-Day Filtering#
Filter transactions based on time of day.
[9]:
# Find transactions during business hours (9 AM - 5 PM)
business_hours_edges = g._edges[
between(
time(9, 0, 0),
time(17, 0, 0)
)(g._edges['transaction_time'])
]
business_hours_g = g.edges(business_hours_edges)
print(f"Business hour transactions: {len(business_hours_g._edges)}")
print(f"Percentage of total: {len(business_hours_g._edges) / len(g._edges) * 100:.1f}%")
[10]:
# Find transactions at specific times (e.g., on the hour)
on_the_hour_times = [time(h, 0, 0) for h in range(24)]
on_hour_edges = g._edges[
is_in(on_the_hour_times)(g._edges['transaction_time'])
]
on_hour_g = g.edges(on_hour_edges)
print(f"Transactions on the hour: {len(on_hour_g._edges)}")
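A single start-to-end range cannot describe a window that wraps past midnight, such as 22:00 to 06:00. The sketch below is plain Python and pandas rather than a GFQL predicate; `in_time_window` is a hypothetical helper introduced only for illustration:

```python
from datetime import time

import pandas as pd


def in_time_window(t, start, end):
    """Return True if t lies in [start, end], allowing windows that wrap past midnight."""
    if start <= end:
        # Ordinary window, e.g. 09:00 to 17:00.
        return start <= t <= end
    # Wrapping window, e.g. 22:00 to 06:00: late evening OR early morning.
    return t >= start or t <= end


times = pd.Series([time(8, 59), time(9, 0), time(17, 0), time(23, 15), time(2, 30)])

business = times.map(lambda t: in_time_window(t, time(9, 0), time(17, 0)))
overnight = times.map(lambda t: in_time_window(t, time(22, 0), time(6, 0)))

print(business.tolist())
print(overnight.tolist())
```

The resulting boolean series can index an edge dataframe the same way the masks in the cells above do.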
4. Complex Temporal Queries#
Combine temporal predicates with other filters for complex queries.
[11]:
# Find large transactions (>$500) in Q4 2023
q4_mask = between(
datetime(2023, 10, 1),
datetime(2023, 12, 31, 23, 59, 59)
)(g._edges['timestamp'])
large_mask = gt(500)(g._edges['amount'])
q4_large_edges = g._edges[q4_mask & large_mask]
q4_large_g = g.edges(q4_large_edges)
print(f"Large Q4 2023 transactions: {len(q4_large_g._edges)}")
if len(q4_large_g._edges) > 0:
    print(f"Total value: ${q4_large_g._edges['amount'].sum():,.2f}")
    print(f"Average: ${q4_large_g._edges['amount'].mean():,.2f}")
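As a cross-check on the quarter boundaries, the same Q4 selection can be written in plain pandas with the `dt.year` and `dt.quarter` accessors, which avoids spelling out the end-of-quarter timestamp. This is a sketch on toy data, not a GFQL predicate:

```python
import pandas as pd

# Toy edge table; column names follow the notebook's transactions_df.
edges = pd.DataFrame({
    "timestamp": [
        pd.Timestamp("2023-09-30 23:00"),
        pd.Timestamp("2023-10-01 00:00"),
        pd.Timestamp("2023-11-15 12:00"),
        pd.Timestamp("2023-12-31 23:59:59.9"),
        pd.Timestamp("2024-10-05 00:00"),
    ],
    "amount": [900.0, 650.0, 120.0, 501.0, 700.0],
})

# Quarter membership from the datetime accessors, combined with an amount filter.
q4_2023 = (edges["timestamp"].dt.year == 2023) & (edges["timestamp"].dt.quarter == 4)
large = edges["amount"] > 500

result = edges[q4_2023 & large]
print(result)
```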
[12]:
# Multi-hop query: Find accounts that received money recently
# and then sent money to business accounts
thirty_days_ago = datetime.now() - timedelta(days=30)
# First, find recent transactions
recent_edges = g._edges[gt(pd.Timestamp(thirty_days_ago))(g._edges['timestamp'])]
recent_g = g.edges(recent_edges)
# Use chain to find money flow pattern
money_flow = recent_g.chain([
# Start with any node
n(),
# Follow incoming edges (as destination)
e_reverse(),
# Go to source nodes
n(),
# Follow outgoing edges
e_forward(),
# To business accounts
n(filter_dict={"account_type": "business"})
])
print(f"Money flow pattern found: {len(money_flow._nodes)} nodes on matching paths")
5. Using Temporal Value Classes#
Use explicit temporal value classes for more control.
[13]:
# Create temporal values with specific properties
dt_value = DateTimeValue("2023-06-15T14:30:00", "UTC")
date_value = DateValue("2023-06-15")
time_value = TimeValue("14:30:00")
# Use in predicates
specific_edges = g._edges[gt(dt_value)(g._edges['timestamp'])]
specific_g = g.edges(specific_edges)
print(f"Transactions after {dt_value.value}: {len(specific_g._edges)}")
6. Timezone-Aware Filtering#
Handle timezone-aware datetime comparisons.
[14]:
# Add timezone info to our data for this example
transactions_df_tz = transactions_df.copy()
transactions_df_tz['timestamp_utc'] = pd.to_datetime(transactions_df_tz['timestamp']).dt.tz_localize('UTC')
transactions_df_tz['timestamp_eastern'] = transactions_df_tz['timestamp_utc'].dt.tz_convert('US/Eastern')
g_tz = graphistry.edges(transactions_df_tz, 'source', 'target')
# Filter using Eastern time
eastern = pytz.timezone('US/Eastern')
eastern_cutoff = eastern.localize(datetime(2023, 7, 1, 9, 0, 0)) # 9 AM Eastern
eastern_morning_edges = g_tz._edges[
gt(pd.Timestamp(eastern_cutoff))(g_tz._edges['timestamp_eastern'])
]
eastern_morning_g = g_tz.edges(eastern_morning_edges)
print(f"Transactions after 9 AM Eastern on July 1, 2023: {len(eastern_morning_g._edges)}")
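To see what the Eastern cutoff means in UTC, the following plain-pandas sketch converts it and applies it to a UTC column. It uses the canonical zone name `America/New_York` in place of the `US/Eastern` alias; the toy series `ts_utc` is an assumption for illustration:

```python
import pandas as pd

# 9 AM Eastern on July 1, 2023 (daylight saving time is in effect, UTC-4).
cutoff_eastern = pd.Timestamp("2023-07-01 09:00", tz="America/New_York")
cutoff_utc = cutoff_eastern.tz_convert("UTC")

# Toy UTC timestamps just before and just after the cutoff.
ts_utc = pd.Series([
    pd.Timestamp("2023-07-01 12:59", tz="UTC"),
    pd.Timestamp("2023-07-01 13:01", tz="UTC"),
])

# Aware values compare by instant, so either form of the cutoff gives the same mask.
mask_vs_eastern = ts_utc > cutoff_eastern
mask_vs_utc = ts_utc > cutoff_utc

print(cutoff_utc)
print(mask_vs_eastern.tolist(), mask_vs_utc.tolist())
```

In plain pandas, ordering comparisons between a timezone-naive column and a timezone-aware value raise a `TypeError`, so it helps to localize the column first, as the cell above does with `tz_localize('UTC')`.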
7. Chain Operations with Temporal Edge Filters#
Demonstrate using temporal predicates in chain operations with proper edge filtering.
[15]:
# Find paths through recent high-value transactions
recent_high_value = g.chain([
# Start from all nodes
n(),
# Follow edges with temporal and amount filters
e_forward({
"timestamp": gt(datetime.now() - timedelta(days=7)),
"amount": gt(200)
}),
# Reach destination nodes
n()
])
print("Recent high-value transaction paths:")
print(f" Nodes: {len(recent_high_value._nodes)}")
print(f" Edges: {len(recent_high_value._edges)}")
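Cutoffs derived from `datetime.now()` depend on when the notebook is run, and the sample timestamps start in 2023, so such filters may return few or no rows. One option, sketched here in plain pandas on toy data, is to anchor the window to the latest timestamp in the data instead:

```python
import pandas as pd

# Toy transaction timestamps.
ts = pd.Series([
    pd.Timestamp("2023-12-20 10:00"),
    pd.Timestamp("2023-12-28 09:00"),
    pd.Timestamp("2023-12-31 18:00"),
])

# Use the newest timestamp in the data as the reference point instead of the wall clock.
reference = ts.max()
cutoff = reference - pd.Timedelta(days=7)

recent = ts[ts > cutoff]
print(f"Cutoff: {cutoff}, rows in window: {len(recent)}")
```

The resulting `cutoff` can be passed to a comparison predicate in place of `datetime.now() - timedelta(days=7)`, which keeps the example reproducible.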
[ ]:
# Wire protocol dicts in is_in predicates
# Useful for checking against multiple specific timestamps
important_dates = [
{"type": "datetime", "value": "2023-01-01T00:00:00", "timezone": "UTC"}, # New Year
{"type": "datetime", "value": "2023-07-04T00:00:00", "timezone": "UTC"}, # July 4th
{"type": "datetime", "value": "2023-12-25T00:00:00", "timezone": "UTC"}, # Christmas
]
# Note: This checks for exact timestamp matches
# For date matching, you'd need to extract the date portion
holiday_pred = is_in(important_dates)
# For demonstration, let's check if any transactions happened exactly at midnight on these days
# (In real data, you'd probably want to check date ranges instead)
print("Checking for transactions at midnight on holidays...")
print("(This is likely 0 unless transactions were specifically created at midnight)")
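The note above about extracting the date portion can be illustrated in plain pandas (not GFQL; `ts` and `holidays` are toy values). Exact matching only catches midnight timestamps, while matching on the normalized day catches any time on the holiday:

```python
import pandas as pd

# Toy transaction timestamps.
ts = pd.Series([
    pd.Timestamp("2023-07-04 00:00:00"),
    pd.Timestamp("2023-07-04 15:42:10"),
    pd.Timestamp("2023-07-05 00:00:00"),
    pd.Timestamp("2023-12-25 08:05:00"),
])

holidays = [
    pd.Timestamp("2023-01-01"),
    pd.Timestamp("2023-07-04"),
    pd.Timestamp("2023-12-25"),
]

# Exact match: only timestamps exactly at midnight on a holiday.
exact = ts.isin(holidays)

# Day-level match: drop the time component before testing membership.
by_day = ts.dt.normalize().isin(holidays)

print(exact.tolist())
print(by_day.tolist())
```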
Summary#
This notebook demonstrated:
DateTime filtering with gt, lt, between predicates on edges
Date-only filtering for day-level granularity on nodes
Time-of-day filtering for patterns like business hours
Complex queries combining temporal and non-temporal predicates
Multi-hop queries with temporal constraints using chain operations
Temporal value classes for explicit control
Timezone-aware filtering
Wire protocol dictionaries for JSON-compatible predicate configuration
Proper chain syntax with edge filters in e_forward() and node filters in n()
Key takeaways:
- Temporal predicates work seamlessly with pandas datetime types
- Wire protocol dicts enable configuration-driven filtering: gt({"type": "datetime", "value": "2023-01-01T00:00:00", "timezone": "UTC"})
- Timezone awareness is built-in for accurate cross-timezone comparisons
- Complex temporal patterns can be expressed through chain operations
Temporal predicates in GFQL provide a powerful way to analyze time-series aspects of graph data, enabling complex temporal queries while maintaining the expressiveness of graph traversals.
[ ]:
# Build predicates programmatically with wire protocol dicts
def create_date_filter(year, month, day, comparison="gt"):
    """Create a date filter using wire protocol format"""
    date_dict = {
        "type": "date",
        "value": f"{year:04d}-{month:02d}-{day:02d}"
    }
    if comparison == "gt":
        return gt(date_dict)
    elif comparison == "lt":
        return lt(date_dict)
    elif comparison == "ge":
        return ge(date_dict)
    elif comparison == "le":
        return le(date_dict)
    else:
        raise ValueError(f"Unknown comparison: {comparison}")
# Use the programmatic filter
filter_2023 = create_date_filter(2023, 1, 1, "ge")
accounts_2023 = g.chain([
n(filter_dict={
"created_date": filter_2023
})
])
print(f"Accounts created in 2023 or later: {len(accounts_2023._nodes)}")
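String formatting alone accepts impossible dates such as February 30. A small variation, sketched below with the standard library only, builds the same `{"type": "date", "value": ...}` dict through `datetime.date` so invalid input fails early. `date_wire_dict` is a hypothetical helper, not part of graphistry:

```python
from datetime import date


def date_wire_dict(year, month, day):
    """Hypothetical helper: build a date dict in the wire format shown above.

    datetime.date validates the calendar date and isoformat() yields YYYY-MM-DD.
    """
    return {"type": "date", "value": date(year, month, day).isoformat()}


print(date_wire_dict(2023, 1, 1))

try:
    date_wire_dict(2023, 2, 30)
except ValueError as exc:
    # date() rejects days that do not exist in the given month.
    print(f"Rejected: {exc}")
```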
[ ]:
# Example: Load predicate configuration from JSON
import json
# Simulate loading from a JSON config file
config_json = '''
{
"filters": {
"recent_transactions": {
"timestamp": {
"type": "gt",
"value": {
"type": "datetime",
"value": "2023-10-01T00:00:00",
"timezone": "UTC"
}
}
},
"business_hours": {
"transaction_time": {
"type": "between",
"start": {"type": "time", "value": "09:00:00"},
"end": {"type": "time", "value": "17:00:00"}
}
}
}
}
'''
config = json.loads(config_json)
# Use the wire protocol dict directly
recent_filter = config["filters"]["recent_transactions"]["timestamp"]["value"]
recent_edges = g._edges[gt(recent_filter)(g._edges['timestamp'])]
recent_g = g.edges(recent_edges)
print(f"Recent transactions (from JSON config): {len(recent_g._edges)}")
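The cell above reads only the nested `value` from the config. To show how the whole config could drive filtering, here is a sketch of a tiny interpreter in plain pandas. The outer schema (`"type": "gt"` / `"between"` with `value`, `start`, `end`) is the one used in this notebook's example config, and `parse_value` and `build_mask` are hypothetical helpers rather than GFQL APIs:

```python
import json
from datetime import time

import pandas as pd

config = json.loads('''
{
  "recent_transactions": {
    "timestamp": {
      "type": "gt",
      "value": {"type": "datetime", "value": "2023-10-01T00:00:00", "timezone": "UTC"}
    }
  },
  "business_hours": {
    "transaction_time": {
      "type": "between",
      "start": {"type": "time", "value": "09:00:00"},
      "end": {"type": "time", "value": "17:00:00"}
    }
  }
}
''')


def parse_value(wire):
    """Hypothetical helper: turn a temporal dict into a pandas/Python value."""
    if wire["type"] == "datetime":
        return pd.Timestamp(wire["value"], tz=wire.get("timezone"))
    if wire["type"] == "time":
        return time.fromisoformat(wire["value"])
    raise ValueError(f"Unsupported value type: {wire['type']}")


def build_mask(series, spec):
    """Hypothetical helper: evaluate one comparison spec against a column."""
    if spec["type"] == "gt":
        return series > parse_value(spec["value"])
    if spec["type"] == "between":
        lo, hi = parse_value(spec["start"]), parse_value(spec["end"])
        return series.map(lambda v: lo <= v <= hi)
    raise ValueError(f"Unsupported comparison: {spec['type']}")


# Toy edges with a UTC-aware timestamp column and a time-of-day column.
edges = pd.DataFrame({
    "timestamp": [
        pd.Timestamp("2023-09-30 12:00", tz="UTC"),
        pd.Timestamp("2023-10-02 10:00", tz="UTC"),
        pd.Timestamp("2023-11-01 20:00", tz="UTC"),
    ],
    "transaction_time": [time(12, 0), time(10, 0), time(20, 0)],
})

masks = {
    name: build_mask(edges[column], spec)
    for name, columns in config.items()
    for column, spec in columns.items()
}
print({name: mask.tolist() for name, mask in masks.items()})
```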
[ ]:
# Wire protocol dictionaries work directly in Python
# These are equivalent:
pred1 = gt(pd.Timestamp("2023-07-01"))
pred2 = gt({"type": "datetime", "value": "2023-07-01T00:00:00", "timezone": "UTC"})
# Test they produce the same results
result1 = pred1(g._edges['timestamp'])
result2 = pred2(g._edges['timestamp'])
print(f"Results are identical: {result1.equals(result2)}")
print(f"Transactions after July 1, 2023: {result1.sum()}")
8. Using Wire Protocol Dictionaries#
You can pass wire protocol dictionaries directly to temporal predicates. This is useful for:
- Loading predicate configurations from JSON files
- Building predicates programmatically
- Sharing predicate definitions between systems
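Because these dictionaries contain only strings, they survive a JSON round trip unchanged, which is what makes them shareable between systems. The sketch below uses only the standard library; `to_wire` and `from_wire` are hypothetical helpers that handle just the UTC datetime shape shown in this notebook, at whole-second precision:

```python
import json
from datetime import datetime, timezone


def to_wire(dt):
    """Hypothetical helper: encode an aware datetime as a UTC datetime dict."""
    if dt.tzinfo is None:
        raise ValueError("Expected a timezone-aware datetime")
    utc = dt.astimezone(timezone.utc)
    return {
        "type": "datetime",
        "value": utc.strftime("%Y-%m-%dT%H:%M:%S"),
        "timezone": "UTC",
    }


def from_wire(wire):
    """Hypothetical helper: decode a UTC datetime dict back to an aware datetime."""
    if wire.get("type") != "datetime" or wire.get("timezone") != "UTC":
        raise ValueError("This sketch only handles UTC datetime dicts")
    parsed = datetime.strptime(wire["value"], "%Y-%m-%dT%H:%M:%S")
    return parsed.replace(tzinfo=timezone.utc)


original = datetime(2023, 7, 1, 9, 30, tzinfo=timezone.utc)

# Serialize to a JSON string, as another system or a config file would store it.
payload = json.dumps(to_wire(original))
restored = from_wire(json.loads(payload))

print(payload)
print(restored == original)
```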
What’s Next?#
Datetime Filtering Guide - Full temporal predicate reference
Wire Protocol Reference - JSON serialization examples
GFQL Documentation - Complete GFQL reference
[16]:
# Complex multi-hop with temporal constraints
# Find 2-hop paths through recent transactions
two_hop_recent = g.chain([
# Start from business accounts
n(filter_dict={"account_type": "business"}),
# First hop: recent outgoing transactions
e_forward({
"timestamp": gt(datetime.now() - timedelta(days=30))
}, name="hop1"),
# Intermediate nodes
n(),
# Second hop: any transaction
e_forward(name="hop2"),
# Final nodes
n()
])
print("2-hop paths from business accounts through recent transactions:")
print(f" Total edges: {len(two_hop_recent._edges)}")
print(f" Hop 1 edges: {two_hop_recent._edges['hop1'].sum()}")
print(f" Hop 2 edges: {two_hop_recent._edges['hop2'].sum()}")
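For intuition about what a two-hop pattern with a temporal constraint on the first hop selects, here is a rough plain-pandas analogue using a self-join on a toy edge table. It is only an illustration of the path logic, it omits the business-account node filter, and it is not how GFQL evaluates chains:

```python
import pandas as pd

# Toy edge table with the notebook's column names.
edges = pd.DataFrame({
    "source": ["A", "A", "B", "C"],
    "target": ["B", "C", "D", "D"],
    "timestamp": [
        pd.Timestamp("2023-12-20"),
        pd.Timestamp("2023-06-01"),
        pd.Timestamp("2023-03-01"),
        pd.Timestamp("2023-12-25"),
    ],
})

cutoff = pd.Timestamp("2023-12-01")

# Hop 1: only edges newer than the cutoff.
hop1 = edges[edges["timestamp"] > cutoff]

# Hop 2: any edge whose source is the target of a hop-1 edge.
paths = hop1.merge(
    edges,
    left_on="target",
    right_on="source",
    suffixes=("_hop1", "_hop2"),
)

print(paths[["source_hop1", "target_hop1", "target_hop2"]])
```

Here only the path A to B to D survives: the edge C to D is recent, but D has no outgoing edge to serve as a second hop.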